2023-11-18 01:38:34,528 INFO [train_asr.py:1183] (1/4) Training started
2023-11-18 01:38:34,528 INFO [train_asr.py:1193] (1/4) Device: cuda:1
2023-11-18 01:38:34,532 INFO [train_asr.py:1205] (1/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': '025f11fd-dirty', 'icefall-git-date': 'Fri Nov 17 16:19:07 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-10-1113160712-78bc8d8bd8-pw6cd', 'IP address': '10.177.94.17'}, 'world_size': 4, 'master_port': 13454, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
2023-11-18 01:38:34,533 INFO [train_asr.py:1207] (1/4) About to create model
2023-11-18 01:38:35,350 INFO [train_asr.py:1211] (1/4) Number of model parameters: 65819362
2023-11-18 01:38:38,258 INFO [train_asr.py:1227] (1/4) Using DDP
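The records below build the training CutSet by multiplexing full LibriSpeech with the unbalanced AudioSet split ("Using mux to combine Librispeech with audioset"). A minimal sketch of that step with lhotse's CutSet.mux follows; the manifest filenames and weights are hypothetical placeholders, not values taken from this run:

```python
# Sketch only: lazily interleave two lhotse CutSets, as the mux step below does.
from lhotse import CutSet

# Hypothetical manifest names; the recipe reads its own files under data/fbank.
libri_cuts = CutSet.from_jsonl_lazy("data/fbank/librispeech_cuts_train-all-shuf.jsonl.gz")
audioset_cuts = CutSet.from_jsonl_lazy("data/fbank/cuts_audioset_unbalanced.jsonl.gz")

# mux() draws from both sources on the fly; weights bias the sampling ratio.
train_cuts = CutSet.mux(libri_cuts, audioset_cuts, weights=[0.5, 0.5])
print(train_cuts)  # prints CutSet(len=...) over a lazy underlying iterator
```

Because the result is lazy, the SimpleCutSampler seen below can stream batches of up to max_duration=1000 seconds without materializing the 2.7M cuts in memory.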
2023-11-18 01:38:39,398 INFO [train_asr.py:1271] (1/4) Getting audioset cuts
2023-11-18 01:38:39,398 INFO [kd_datamodule.py:796] (1/4) About to get the audioset cuts.
2023-11-18 01:38:39,462 INFO [train_asr.py:1277] (1/4) Using mux to combine Librispeech with audioset
2023-11-18 01:38:39,462 INFO [train_asr.py:1287] (1/4) CutSet(len=2748469) [underlying data type: ]
2023-11-18 01:38:48,654 INFO [kd_datamodule.py:396] (1/4) Enable MUSAN
2023-11-18 01:38:48,654 INFO [kd_datamodule.py:397] (1/4) About to get Musan cuts
2023-11-18 01:38:51,055 INFO [kd_datamodule.py:427] (1/4) Enable SpecAugment
2023-11-18 01:38:51,055 INFO [kd_datamodule.py:428] (1/4) Time warp factor: 80
2023-11-18 01:38:51,056 INFO [kd_datamodule.py:438] (1/4) Num frame mask: 10
2023-11-18 01:38:51,056 INFO [kd_datamodule.py:451] (1/4) About to create train dataset
2023-11-18 01:38:51,057 INFO [kd_datamodule.py:487] (1/4) Using SimpleCutSampler
2023-11-18 01:38:51,057 INFO [kd_datamodule.py:495] (1/4) About to create train dataloader
2023-11-18 01:38:51,085 INFO [kd_datamodule.py:814] (1/4) About to get the audioset eval cuts.
2023-11-18 01:38:51,122 INFO [kd_datamodule.py:529] (1/4) About to create dev dataset
2023-11-18 01:38:51,569 INFO [kd_datamodule.py:550] (1/4) About to create dev dataloader
2023-11-18 01:39:26,873 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 0, loss[loss=3.792, simple_loss=2.328, pruned_loss=2.302, audio_tagging_loss=1.234, over 15592.00 frames. ], tot_loss[loss=3.792, simple_loss=2.328, pruned_loss=2.302, audio_tagging_loss=1.234, over 15592.00 frames. ], batch size: 61, lr: 2.25e-02, grad_scale: 2.0
2023-11-18 01:39:26,873 INFO [train_asr.py:1138] (1/4) Computing validation loss
2023-11-18 01:39:49,503 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.0641, 4.0998, 4.0191, 4.0995, 4.0589, 4.1034, 4.0077, 4.0865], device='cuda:1')
2023-11-18 01:40:00,446 INFO [train_asr.py:1147] (1/4) Epoch 1, validation: loss=2.927, simple_loss=1.349, pruned_loss=1.339, audio_tagging_loss=1.444, over 4681554.00 frames.
2023-11-18 01:40:00,446 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB
2023-11-18 01:40:00,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=55.57 vs. limit=7.5
2023-11-18 01:40:02,674 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=31.49 vs. limit=4.0
2023-11-18 01:40:19,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=432.53 vs. limit=5.033333333333333
2023-11-18 01:40:24,803 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=365.99 vs. limit=7.525
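The batch 0 numbers above decompose exactly as the configured scales suggest, assuming the usual icefall-style warm-up in which the simple-loss weight ramps from 1.0 down to simple_loss_scale=0.5 and the pruned-loss weight ramps from 0.1 up to 1.0 over warm_step=2000 batches. A sketch of that recombination (a reconstruction for checking the log, not the literal train_asr.py code):

```python
# Sketch: recombine logged loss components under an assumed warm-up schedule.
def combined_loss(simple, pruned, audio_tagging, batch_idx, warm_step=2000,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    warm = min(batch_idx / warm_step, 1.0)
    simple_scale = 1.0 - warm * (1.0 - simple_loss_scale)  # ramps 1.0 -> 0.5
    pruned_scale = 0.1 + 0.9 * warm                        # ramps 0.1 -> 1.0
    return (simple_scale * simple + pruned_scale * pruned
            + audio_tagging_loss_scale * audio_tagging)

# Batch 0: 1.0*2.328 + 0.1*2.302 + 1.0*1.234 = 3.792, matching the log above.
print(round(combined_loss(2.328, 2.302, 1.234, batch_idx=0), 3))
# Batch 50 (logged below): 0.9875*0.5429 + 0.1225*0.6067 + 0.04064 = 0.651.
print(round(combined_loss(0.5429, 0.6067, 0.04064, batch_idx=50), 3))
```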
2023-11-18 01:40:25,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=66.66666666666667, ans=0.49166666666666664
2023-11-18 01:40:28,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=133.33333333333334, ans=0.8953333333333333
2023-11-18 01:40:36,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=133.33333333333334, ans=0.49375
2023-11-18 01:40:38,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=510.27 vs. limit=7.6
2023-11-18 01:40:41,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=37.34 vs. limit=4.053333333333334
2023-11-18 01:40:49,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=200.0, ans=0.490625
2023-11-18 01:41:06,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=48.52 vs. limit=7.7
2023-11-18 01:41:09,568 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 50, loss[loss=0.6512, simple_loss=0.5429, pruned_loss=0.6067, audio_tagging_loss=0.04064, over 15705.00 frames. ], tot_loss[loss=1.327, simple_loss=0.9788, pruned_loss=0.8374, audio_tagging_loss=0.2623, over 680401.64 frames. ], batch size: 56, lr: 2.48e-02, grad_scale: 1.0
2023-11-18 01:41:15,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=333.3333333333333, ans=0.484375
2023-11-18 01:41:15,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=298.78 vs. limit=7.75
2023-11-18 01:41:15,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=40.16 vs. limit=7.75
2023-11-18 01:41:17,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=158.64 vs. limit=7.625
2023-11-18 01:41:18,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=349.00 vs. limit=7.75
2023-11-18 01:41:24,467 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=449.93 vs. limit=7.65
2023-11-18 01:41:30,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=400.0, ans=0.0975
2023-11-18 01:41:32,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=243.34 vs. limit=7.65
2023-11-18 01:41:46,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=154.16 vs. limit=7.675
2023-11-18 01:41:49,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=462.82 vs. limit=7.7
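The logged learning rates (2.25e-02 at batch 0 and 2.48e-02 at batch 50 above, 2.70e-02 at batch 100 and a plateau near 4.49e-02 by batch 500 below) are consistent with base_lr=0.045 under an Eden-style schedule plus a linear warm-up from 0.5x to 1.0x over roughly 500 batches. A sketch under those assumptions (not the literal icefall optim.py implementation):

```python
# Sketch: Eden-like LR with lr_batches/lr_epochs decay and an assumed
# 500-batch linear warm-up starting at half the base LR.
def eden_lr(batch, epoch=0.0, base_lr=0.045, lr_batches=7500, lr_epochs=3.5,
            warmup_batches=500):
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    warmup = min(0.5 * (1.0 + batch / warmup_batches), 1.0)
    return base_lr * batch_factor * epoch_factor * warmup

for b in (0, 50, 100, 500):
    print(b, f"{eden_lr(b):.3e}")
# -> 2.250e-02, 2.475e-02, 2.700e-02, 4.495e-02: the logged values to within
#    rounding. Decay from lr_batches=7500 only bites much later in training.
```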
2023-11-18 01:41:55,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=533.3333333333334, ans=0.8813333333333333
2023-11-18 01:41:57,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=68.20 vs. limit=7.7
2023-11-18 01:42:01,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=263.83 vs. limit=7.7
2023-11-18 01:42:10,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=116.03 vs. limit=7.725
2023-11-18 01:42:18,213 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 100, loss[loss=0.4834, simple_loss=0.3894, pruned_loss=0.4721, audio_tagging_loss=0.03513, over 15850.00 frames. ], tot_loss[loss=0.8645, simple_loss=0.657, pruned_loss=0.6325, audio_tagging_loss=0.1384, over 1209815.89 frames. ], batch size: 57, lr: 2.70e-02, grad_scale: 2.0
2023-11-18 01:42:19,526 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 4.039e+01 1.213e+02 5.684e+02 1.606e+03 1.428e+04, threshold=1.137e+03, percent-clipped=0.0
2023-11-18 01:42:19,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=16.27 vs. limit=5.166666666666667
2023-11-18 01:42:32,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=287.43 vs. limit=5.366666666666667
2023-11-18 01:42:33,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=231.10 vs. limit=7.775
2023-11-18 01:42:34,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=733.3333333333334, ans=0.465625
2023-11-18 01:42:35,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=183.46 vs. limit=5.366666666666667
2023-11-18 01:42:36,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=136.63 vs. limit=5.366666666666667
2023-11-18 01:42:36,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=733.3333333333334, ans=0.09541666666666668
2023-11-18 01:42:44,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=800.0, ans=0.4625
2023-11-18 01:42:46,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=54.34 vs. limit=8.1
2023-11-18 01:42:50,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=55.31 vs. limit=7.8
2023-11-18 01:42:50,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=86.62 vs. limit=7.8
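The optim.py:476 diagnostics above report five quantiles (min, 25%, median, 75%, max) of recent parameter-gradient norms, and the clipping threshold is the median scaled by Clipping_scale: 2.0 x 5.684e+02 = 1.137e+03 in the first report, and likewise in every later report. A sketch of that bookkeeping, assumed rather than copied from the ScaledAdam optimizer:

```python
# Sketch: grad-norm quantile report with a median-based clipping threshold.
import torch

def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0) -> str:
    qs = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * qs[2]  # scale the median grad norm
    pct = (grad_norms > threshold).float().mean().item() * 100.0
    quartiles = " ".join(f"{v:.3e}" for v in qs.tolist())
    return (f"Clipping_scale={clipping_scale}, grad-norm quartiles {quartiles}, "
            f"threshold={threshold:.3e}, percent-clipped={pct:.1f}")

# Toy norms chosen to mimic the quartiles in the first report above.
print(clipping_report(torch.tensor([40.39, 121.3, 568.4, 1606.0, 14280.0])))
```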
2023-11-18 01:42:53,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=52.46 vs. limit=7.8
2023-11-18 01:42:55,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=251.90 vs. limit=7.8
2023-11-18 01:43:04,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=66.82 vs. limit=5.0
2023-11-18 01:43:05,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=244.70 vs. limit=7.825
2023-11-18 01:43:22,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=141.15 vs. limit=7.85
2023-11-18 01:43:23,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=8.2
2023-11-18 01:43:24,585 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 150, loss[loss=0.4407, simple_loss=0.3579, pruned_loss=0.4444, audio_tagging_loss=0.02164, over 15293.00 frames. ], tot_loss[loss=0.6774, simple_loss=0.5215, pruned_loss=0.5402, audio_tagging_loss=0.09307, over 1614385.89 frames. ], batch size: 56, lr: 2.93e-02, grad_scale: 2.0
2023-11-18 01:43:34,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=429.60 vs. limit=7.875
2023-11-18 01:43:39,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=1066.6666666666667, ans=0.16
2023-11-18 01:43:42,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=160.22 vs. limit=8.3
2023-11-18 01:43:57,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=1133.3333333333333, ans=0.8603333333333334
2023-11-18 01:43:57,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=118.05 vs. limit=7.925
2023-11-18 01:43:57,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=144.25 vs. limit=8.35
2023-11-18 01:43:58,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=1133.3333333333333, ans=0.446875
2023-11-18 01:44:07,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=376.43 vs. limit=7.95
2023-11-18 01:44:09,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=215.15 vs. limit=7.95
2023-11-18 01:44:12,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=228.04 vs. limit=7.95
2023-11-18 01:44:12,447 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=58.42 vs. limit=5.6
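Note that tot_loss is not a plain per-epoch mean: the frame counts it carries (15592.00, then 680401.64, 1209815.89, 1614385.89 above) climb toward a plateau around reset_interval times the typical frames-per-batch, which is the fixed point of an exponentially-decayed running sum with decay 1 - 1/reset_interval = 0.995. A sketch under that assumption, using a constant frames-per-batch for illustration:

```python
# Sketch: decayed running sum; real batch sizes vary, so these numbers only
# approximate the logged tot_loss frame counts.
def running_frames(frames_per_batch=15592.0, reset_interval=200, num_batches=51):
    decay, total = 1.0 - 1.0 / reset_interval, 0.0
    for _ in range(num_batches):
        total = total * decay + frames_per_batch
    return total

print(f"{running_frames(num_batches=51):,.0f}")    # ~703k vs. logged 680,401.64
print(f"{running_frames(num_batches=1001):,.0f}")  # ~3.1M vs. logged 3,028,310.69
```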
2023-11-18 01:44:18,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=12.97 vs. limit=5.316666666666666
2023-11-18 01:44:19,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=15.95 vs. limit=5.316666666666666
2023-11-18 01:44:20,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.73 vs. limit=4.506666666666667
2023-11-18 01:44:22,851 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=229.49 vs. limit=7.975
2023-11-18 01:44:23,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=64.75 vs. limit=7.975
2023-11-18 01:44:29,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=164.66 vs. limit=7.975
2023-11-18 01:44:32,268 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 200, loss[loss=0.3782, simple_loss=0.3042, pruned_loss=0.368, audio_tagging_loss=0.0192, over 14137.00 frames. ], tot_loss[loss=0.5758, simple_loss=0.4474, pruned_loss=0.4834, audio_tagging_loss=0.06864, over 1930571.72 frames. ], batch size: 54, lr: 3.15e-02, grad_scale: 4.0
2023-11-18 01:44:33,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=40.79 vs. limit=4.266666666666667
2023-11-18 01:44:33,546 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.394e+01 4.484e+01 5.110e+01 6.274e+01 1.485e+02, threshold=1.022e+02, percent-clipped=0.0
2023-11-18 01:44:37,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=110.30 vs. limit=8.0
2023-11-18 01:44:37,502 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=79.32 vs. limit=8.0
2023-11-18 01:44:41,964 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=92.75 vs. limit=8.0
2023-11-18 01:44:45,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.87 vs. limit=4.533333333333333
2023-11-18 01:44:46,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=201.20 vs. limit=8.55
2023-11-18 01:44:53,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=73.95 vs. limit=5.0
2023-11-18 01:45:07,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=140.34 vs. limit=8.6
2023-11-18 01:45:08,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=4.586666666666667
2023-11-18 01:45:08,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.75 vs. limit=5.366666666666667
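grad_scale in the batch reports follows standard torch.cuda.amp.GradScaler dynamics under use_fp16=True: halved when a step produces inf/nan gradients, doubled after a run of clean steps, hence the progression 2.0 -> 1.0 -> 2.0 -> 4.0 so far and onward to 16.0 below. Generic AMP boilerplate for reference (not this recipe's training loop; the model, data, and growth_interval here are arbitrary placeholders):

```python
# Sketch: standard mixed-precision step; the scaler adapts grad_scale itself.
import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=500)

for step in range(1000):
    x = torch.randn(16, 80, device="cuda")
    y = torch.randint(0, 500, (16,), device="cuda")
    opt.zero_grad()
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(opt)   # skipped (and scale halved) on inf/nan gradients
    scaler.update()    # doubled after growth_interval consecutive clean steps
```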
2023-11-18 01:45:16,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=219.31 vs. limit=8.65
2023-11-18 01:45:16,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=245.82 vs. limit=8.075
2023-11-18 01:45:18,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=1533.3333333333333, ans=5.958333333333333
2023-11-18 01:45:18,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1533.3333333333333, ans=0.8463333333333334
2023-11-18 01:45:18,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=46.53 vs. limit=8.075
2023-11-18 01:45:33,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=75.57 vs. limit=8.1
2023-11-18 01:45:34,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=126.49 vs. limit=8.1
2023-11-18 01:45:37,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1600.0, ans=0.425
2023-11-18 01:45:40,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=15.48 vs. limit=5.833333333333333
2023-11-18 01:45:41,046 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 250, loss[loss=0.2731, simple_loss=0.213, pruned_loss=0.2482, audio_tagging_loss=0.02062, over 14757.00 frames. ], tot_loss[loss=0.5143, simple_loss=0.4023, pruned_loss=0.4448, audio_tagging_loss=0.05349, over 2175349.71 frames. ], batch size: 57, lr: 3.38e-02, grad_scale: 4.0
2023-11-18 01:45:41,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=189.30 vs. limit=8.125
2023-11-18 01:45:45,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=80.40 vs. limit=8.125
2023-11-18 01:45:57,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=285.47 vs. limit=8.15
2023-11-18 01:45:57,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=1733.3333333333333, ans=6.083333333333333
2023-11-18 01:45:58,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=287.80 vs. limit=8.15
2023-11-18 01:46:02,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=1733.3333333333333, ans=0.5
2023-11-18 01:46:04,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=288.53 vs. limit=8.15
2023-11-18 01:46:08,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=110.57 vs. limit=8.85
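The scaling.py:213 lines are ScheduledFloat hyper-parameters: module constants interpolated piecewise-linearly in batch_count. The pos_emb_skip_rate readings fit a (0, 0.5) -> (4000, 0.0) schedule: ans=0.4917 at batch_count=66.67 above, and ans=0.25 at batch_count=2000.0 just below. A sketch of such an interpolator; the schedule points are inferred from the logged values, not copied from scaling.py:

```python
# Sketch: piecewise-linear value schedule over batch_count, clamped at the ends.
import bisect

class ScheduledFloatSketch:
    def __init__(self, *points):  # points: sorted (batch_count, value) pairs
        self.xs, self.ys = zip(*points)

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

skip_rate = ScheduledFloatSketch((0, 0.5), (4000, 0.0))
print(skip_rate.value(66.6667))  # ~0.49167, as logged at batch_count=66.67
print(skip_rate.value(2000.0))   # 0.25, as logged at batch_count=2000.0
```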
2023-11-18 01:46:09,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=1800.0, ans=8.175
2023-11-18 01:46:11,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=1800.0, ans=0.768
2023-11-18 01:46:16,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=147.75 vs. limit=8.175
2023-11-18 01:46:17,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=1800.0, ans=0.1325
2023-11-18 01:46:19,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=85.39 vs. limit=8.2
2023-11-18 01:46:37,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=63.45 vs. limit=8.225
2023-11-18 01:46:46,677 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 300, loss[loss=0.3414, simple_loss=0.2693, pruned_loss=0.3151, audio_tagging_loss=0.01815, over 15047.00 frames. ], tot_loss[loss=0.4755, simple_loss=0.3734, pruned_loss=0.4184, audio_tagging_loss=0.04373, over 2362630.24 frames. ], batch size: 58, lr: 3.60e-02, grad_scale: 8.0
2023-11-18 01:46:47,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=51.30 vs. limit=9.0
2023-11-18 01:46:47,927 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.566e+01 4.754e+01 5.461e+01 6.771e+01 2.069e+02, threshold=1.092e+02, percent-clipped=3.0
2023-11-18 01:46:48,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2000.0, ans=0.25
2023-11-18 01:46:49,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=42.62 vs. limit=9.0
2023-11-18 01:47:02,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=60.59 vs. limit=8.275
2023-11-18 01:47:05,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=80.52 vs. limit=8.275
2023-11-18 01:47:05,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.54 vs. limit=4.413333333333333
2023-11-18 01:47:12,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.94 vs. limit=9.1
2023-11-18 01:47:16,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=75.29 vs. limit=8.3
2023-11-18 01:47:18,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2133.3333333333335, ans=0.12
2023-11-18 01:47:24,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=108.72 vs. limit=9.1
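Each scaling.py:1022 line compares a per-module whitening metric of the feature covariance against a limit (itself often a ScheduledFloat, per the *.whitening_limit entries above); a corrective gradient is only applied while metric exceeds limit. One standard way to define such a metric is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, which is 1.0 for perfectly white features and grows with correlation. A sketch under that assumption; the actual scaling.py formula may differ in detail:

```python
# Sketch: a whiteness statistic -- 1.0 means fully decorrelated channels.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

white = torch.randn(4000, 384)
low_rank = torch.randn(4000, 8) @ torch.randn(8, 384)  # strongly correlated
print(whitening_metric(white))     # close to 1: comfortably under a small limit
print(whitening_metric(low_rank))  # far above 1, like the large metrics logged
```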
2023-11-18 01:47:25,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=172.42 vs. limit=8.325
2023-11-18 01:47:26,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=36.54 vs. limit=6.1
2023-11-18 01:47:32,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2200.0, ans=0.27799999999999997
2023-11-18 01:47:34,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=69.46 vs. limit=8.325
2023-11-18 01:47:34,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.56 vs. limit=5.55
2023-11-18 01:47:34,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.58 vs. limit=9.15
2023-11-18 01:47:36,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=9.15
2023-11-18 01:47:44,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=49.45 vs. limit=6.133333333333333
2023-11-18 01:47:47,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2266.6666666666665, ans=6.416666666666666
2023-11-18 01:47:51,178 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 350, loss[loss=0.3538, simple_loss=0.2758, pruned_loss=0.3208, audio_tagging_loss=0.01937, over 14339.00 frames. ], tot_loss[loss=0.4494, simple_loss=0.3531, pruned_loss=0.3994, audio_tagging_loss=0.0371, over 2524537.59 frames. ], batch size: 55, lr: 3.83e-02, grad_scale: 8.0
2023-11-18 01:48:09,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2400.0, ans=0.27599999999999997
2023-11-18 01:48:12,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=77.07 vs. limit=8.4
2023-11-18 01:48:17,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2466.6666666666665, ans=8.425
2023-11-18 01:48:19,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2466.6666666666665, ans=0.237
2023-11-18 01:48:26,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=42.84 vs. limit=9.35
2023-11-18 01:48:38,143 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=5.955e+00
2023-11-18 01:48:53,534 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.123e+00
2023-11-18 01:48:53,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=62.06 vs. limit=9.45
2023-11-18 01:48:54,067 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=9.45
2023-11-18 01:48:57,487 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 400, loss[loss=0.2731, simple_loss=0.2049, pruned_loss=0.2342, audio_tagging_loss=0.0231, over 13829.00 frames. ], tot_loss[loss=0.4327, simple_loss=0.3397, pruned_loss=0.3863, audio_tagging_loss=0.03211, over 2643948.51 frames. ], batch size: 55, lr: 4.05e-02, grad_scale: 16.0
2023-11-18 01:48:58,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=39.55 vs. limit=8.5
2023-11-18 01:48:58,707 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.943e+01 5.270e+01 6.183e+01 8.354e+01 3.927e+02, threshold=1.237e+02, percent-clipped=8.0
2023-11-18 01:49:08,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=54.73 vs. limit=8.5
2023-11-18 01:49:09,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.18 vs. limit=5.683333333333334
2023-11-18 01:49:14,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. limit=5.683333333333334
2023-11-18 01:49:17,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=6.88 vs. limit=5.0
2023-11-18 01:49:23,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=47.99 vs. limit=8.55
2023-11-18 01:49:27,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2800.0, ans=0.095
2023-11-18 01:49:43,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=72.59 vs. limit=8.575
2023-11-18 01:49:46,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2866.6666666666665, ans=0.365625
2023-11-18 01:49:47,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.00 vs. limit=5.733333333333333
2023-11-18 01:49:54,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2933.3333333333335, ans=0.3625
2023-11-18 01:49:55,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=59.87 vs. limit=8.6
2023-11-18 01:50:00,767 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 450, loss[loss=0.3074, simple_loss=0.2327, pruned_loss=0.2721, audio_tagging_loss=0.01844, over 14383.00 frames. ], tot_loss[loss=0.4144, simple_loss=0.3243, pruned_loss=0.3697, audio_tagging_loss=0.02861, over 2721329.64 frames. ], batch size: 55, lr: 4.28e-02, grad_scale: 16.0
2023-11-18 01:50:04,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3000.0, ans=0.795
2023-11-18 01:50:04,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3000.0, ans=0.795
2023-11-18 01:50:08,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3000.0, ans=0.03249999999999999
2023-11-18 01:50:10,954 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.134e+00
2023-11-18 01:50:24,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3066.6666666666665, ans=0.031
2023-11-18 01:50:37,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=39.59 vs. limit=8.675
2023-11-18 01:50:46,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3200.0, ans=0.218
2023-11-18 01:50:52,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3266.6666666666665, ans=0.2673333333333333
2023-11-18 01:50:52,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=34.90 vs. limit=8.725
2023-11-18 01:50:55,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=36.35 vs. limit=8.725
2023-11-18 01:51:04,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.11 vs. limit=5.833333333333333
2023-11-18 01:51:05,773 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 500, loss[loss=0.4072, simple_loss=0.3137, pruned_loss=0.3527, audio_tagging_loss=0.01809, over 14824.00 frames. ], tot_loss[loss=0.4026, simple_loss=0.3137, pruned_loss=0.3574, audio_tagging_loss=0.02611, over 2795605.47 frames. ], batch size: 56, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:51:06,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 3.968e+01 4.922e+01 5.274e+01 6.306e+01 1.338e+02, threshold=1.055e+02, percent-clipped=1.0
2023-11-18 01:51:09,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3333.3333333333335, ans=0.024999999999999994
2023-11-18 01:51:12,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=10.0
2023-11-18 01:51:17,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3400.0, ans=0.340625
2023-11-18 01:51:17,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.04 vs. limit=5.85
2023-11-18 01:51:20,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=38.06 vs. limit=8.775
2023-11-18 01:51:20,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.07 vs. limit=10.05
2023-11-18 01:51:24,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.68 vs. limit=5.36
2023-11-18 01:51:26,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.75 vs. limit=8.775
2023-11-18 01:51:38,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.06 vs. limit=10.1
2023-11-18 01:51:46,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=101.21 vs. limit=8.825
2023-11-18 01:51:47,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.52 vs. limit=10.15
2023-11-18 01:51:50,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3533.3333333333335, ans=8.825
2023-11-18 01:52:09,203 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 550, loss[loss=0.3209, simple_loss=0.2386, pruned_loss=0.2684, audio_tagging_loss=0.0218, over 16575.00 frames. ], tot_loss[loss=0.3965, simple_loss=0.3077, pruned_loss=0.3494, audio_tagging_loss=0.02406, over 2848270.07 frames. ], batch size: 62, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:52:09,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3666.6666666666665, ans=0.2633333333333333
2023-11-18 01:52:09,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=17.42 vs. limit=6.833333333333333
2023-11-18 01:52:10,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=27.40 vs. limit=8.875
2023-11-18 01:52:14,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.94 vs. limit=5.916666666666667
2023-11-18 01:52:20,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=24.40 vs. limit=10.3
2023-11-18 01:52:27,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3733.3333333333335, ans=0.26266666666666666
2023-11-18 01:52:38,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=35.73 vs. limit=8.925
2023-11-18 01:52:41,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=32.13 vs. limit=10.35
2023-11-18 01:52:41,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.45 vs. limit=10.35
2023-11-18 01:52:44,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=21.87 vs. limit=6.9
2023-11-18 01:52:44,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3800.0, ans=0.262
2023-11-18 01:52:47,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.58 vs. limit=8.95
2023-11-18 01:52:47,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=17.76 vs. limit=8.95
2023-11-18 01:52:52,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.58 vs. limit=5.966666666666667
2023-11-18 01:53:01,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3933.3333333333335, ans=8.975
2023-11-18 01:53:02,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3933.3333333333335, ans=0.05249999999999999
2023-11-18 01:53:07,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3933.3333333333335, ans=0.011499999999999996
2023-11-18 01:53:12,514 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 600, loss[loss=0.4066, simple_loss=0.3127, pruned_loss=0.3421, audio_tagging_loss=0.01417, over 15791.00 frames. ], tot_loss[loss=0.3898, simple_loss=0.3009, pruned_loss=0.3399, audio_tagging_loss=0.02268, over 2890640.43 frames. ], batch size: 57, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:53:12,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4000.0, ans=0.3125
2023-11-18 01:53:13,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 4.102e+01 5.824e+01 6.784e+01 8.267e+01 3.333e+02, threshold=1.357e+02, percent-clipped=4.0
2023-11-18 01:53:13,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4000.0, ans=0.3125
2023-11-18 01:53:15,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.84 vs. limit=6.0
2023-11-18 01:53:29,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.76 vs. limit=10.55
2023-11-18 01:53:30,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.57 vs. limit=10.55
2023-11-18 01:53:33,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=35.93 vs. limit=9.025
2023-11-18 01:53:36,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=18.12 vs. limit=9.025
2023-11-18 01:53:47,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4133.333333333333, ans=0.7553333333333334
2023-11-18 01:53:47,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4133.333333333333, ans=0.30625
2023-11-18 01:54:12,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4266.666666666667, ans=0.04888888888888889
2023-11-18 01:54:14,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=4266.666666666667, ans=9.1
2023-11-18 01:54:16,215 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=59.59 vs. limit=9.125
2023-11-18 01:54:16,777 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 650, loss[loss=0.3727, simple_loss=0.2808, pruned_loss=0.3022, audio_tagging_loss=0.01887, over 15064.00 frames. ], tot_loss[loss=0.384, simple_loss=0.295, pruned_loss=0.3303, audio_tagging_loss=0.02169, over 2927678.58 frames. ], batch size: 57, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:54:19,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4333.333333333333, ans=0.296875
2023-11-18 01:54:22,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=58.16 vs. limit=9.125
2023-11-18 01:54:29,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=19.08 vs. limit=9.15
2023-11-18 01:54:46,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=9.175
2023-11-18 01:54:54,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4533.333333333333, ans=0.25466666666666665
2023-11-18 01:55:10,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=34.63 vs. limit=9.225
2023-11-18 01:55:14,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4600.0, ans=0.254
2023-11-18 01:55:19,015 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 700, loss[loss=0.3551, simple_loss=0.2683, pruned_loss=0.2804, audio_tagging_loss=0.01741, over 15739.00 frames. ], tot_loss[loss=0.3789, simple_loss=0.2898, pruned_loss=0.3213, audio_tagging_loss=0.02083, over 2961244.34 frames. ], batch size: 58, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:55:19,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=24.56 vs. limit=9.25
2023-11-18 01:55:20,175 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.747e+01 8.200e+01 9.584e+01 1.192e+02 3.813e+02, threshold=1.917e+02, percent-clipped=10.0
2023-11-18 01:55:23,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4666.666666666667, ans=0.009855072463768115
2023-11-18 01:55:26,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=43.54 vs. limit=9.25
2023-11-18 01:55:29,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4666.666666666667, ans=0.28125
2023-11-18 01:55:35,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4733.333333333333, ans=7.958333333333333
2023-11-18 01:55:44,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=18.02 vs. limit=9.3
2023-11-18 01:55:57,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4866.666666666667, ans=0.271875
2023-11-18 01:56:04,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4866.666666666667, ans=0.00981159420289855
2023-11-18 01:56:09,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=35.61 vs. limit=9.35
2023-11-18 01:56:13,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=36.32 vs. limit=9.35
2023-11-18 01:56:14,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4933.333333333333, ans=0.7273333333333334
2023-11-18 01:56:18,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=4933.333333333333, ans=11.2
2023-11-18 01:56:21,537 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 750, loss[loss=0.3482, simple_loss=0.2601, pruned_loss=0.2727, audio_tagging_loss=0.01749, over 16090.00 frames. ], tot_loss[loss=0.3761, simple_loss=0.2866, pruned_loss=0.3139, audio_tagging_loss=0.0201, over 2983459.82 frames. ], batch size: 59, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:56:32,736 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=33.01 vs. limit=9.375
2023-11-18 01:56:32,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.80 vs. limit=9.375
2023-11-18 01:56:43,829 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=19.78 vs. limit=9.4
2023-11-18 01:56:55,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.05 vs. limit=11.35
2023-11-18 01:56:56,919 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=19.02 vs. limit=9.425
2023-11-18 01:57:00,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5200.0, ans=0.25625
2023-11-18 01:57:02,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.75 vs. limit=11.4
2023-11-18 01:57:05,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.64 vs. limit=11.4
2023-11-18 01:57:06,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=5200.0, ans=0.7180000000000001
2023-11-18 01:57:11,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=5266.666666666667, ans=0.09899494936611666
2023-11-18 01:57:24,936 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 800, loss[loss=0.4339, simple_loss=0.3348, pruned_loss=0.3317, audio_tagging_loss=0.01342, over 15041.00 frames. ], tot_loss[loss=0.3754, simple_loss=0.2853, pruned_loss=0.3077, audio_tagging_loss=0.01957, over 2996172.19 frames. ], batch size: 54, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:57:26,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=35.10 vs. limit=9.5
2023-11-18 01:57:27,267 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.496e+01 8.780e+01 1.132e+02 1.440e+02 3.329e+02, threshold=2.265e+02, percent-clipped=7.0
2023-11-18 01:57:29,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5333.333333333333, ans=0.25
2023-11-18 01:57:41,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=5400.0, ans=0.281
2023-11-18 01:57:42,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.17 vs. limit=6.16
2023-11-18 01:57:48,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.44 vs. limit=11.6
2023-11-18 01:57:49,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=30.02 vs. limit=9.55
2023-11-18 01:57:49,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=9.55
2023-11-18 01:57:53,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.84 vs. limit=11.6
2023-11-18 01:58:02,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=5533.333333333333, ans=0.7063333333333334
2023-11-18 01:58:02,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5533.333333333333, ans=0.24466666666666667
2023-11-18 01:58:14,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.53 vs. limit=11.7
2023-11-18 01:58:16,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.65 vs. limit=11.7
2023-11-18 01:58:25,500 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 850, loss[loss=0.3662, simple_loss=0.2804, pruned_loss=0.2715, audio_tagging_loss=0.01439, over 15919.00 frames. ], tot_loss[loss=0.3733, simple_loss=0.2835, pruned_loss=0.2997, audio_tagging_loss=0.01902, over 3014608.06 frames. ], batch size: 59, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:58:31,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5666.666666666667, ans=0.234375
2023-11-18 01:58:36,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=25.00 vs. limit=9.625
2023-11-18 01:58:36,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=5733.333333333333, ans=0.04277777777777778
2023-11-18 01:58:40,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=27.42 vs. limit=9.65
2023-11-18 01:58:54,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.17 vs. limit=11.85
2023-11-18 01:59:03,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=25.04 vs. limit=9.7
2023-11-18 01:59:11,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=5866.666666666667, ans=0.009594202898550725
2023-11-18 01:59:12,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=5866.666666666667, ans=0.22499999999999998
2023-11-18 01:59:26,566 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 900, loss[loss=0.36, simple_loss=0.2801, pruned_loss=0.2529, audio_tagging_loss=0.01518, over 15443.00 frames. ], tot_loss[loss=0.372, simple_loss=0.2833, pruned_loss=0.2916, audio_tagging_loss=0.01853, over 3020736.52 frames. ], batch size: 55, lr: 4.48e-02, grad_scale: 16.0
2023-11-18 01:59:27,187 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.25 vs. limit=6.5
2023-11-18 01:59:28,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=21.18 vs. limit=9.75
2023-11-18 01:59:28,877 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.579e+01 7.921e+01 9.785e+01 1.252e+02 2.736e+02, threshold=1.957e+02, percent-clipped=4.0
2023-11-18 01:59:50,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=9.8
2023-11-18 02:00:03,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.45 vs. limit=12.15
limit=12.15 2023-11-18 02:00:14,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=6266.666666666667, ans=0.20625 2023-11-18 02:00:15,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=9.85 2023-11-18 02:00:18,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=6266.666666666667, ans=8.916666666666668 2023-11-18 02:00:18,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.67 vs. limit=12.2 2023-11-18 02:00:28,103 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 950, loss[loss=0.346, simple_loss=0.2724, pruned_loss=0.2381, audio_tagging_loss=0.01266, over 15424.00 frames. ], tot_loss[loss=0.3638, simple_loss=0.2782, pruned_loss=0.278, audio_tagging_loss=0.01801, over 3022379.93 frames. ], batch size: 55, lr: 4.48e-02, grad_scale: 8.0 2023-11-18 02:00:38,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=6400.0, ans=0.676 2023-11-18 02:00:45,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=6400.0, ans=0.0 2023-11-18 02:00:52,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=6466.666666666667, ans=0.00946376811594203 2023-11-18 02:01:09,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=12.4 2023-11-18 02:01:27,324 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1000, loss[loss=0.314, simple_loss=0.2539, pruned_loss=0.2049, audio_tagging_loss=0.01084, over 15572.00 frames. ], tot_loss[loss=0.3499, simple_loss=0.2688, pruned_loss=0.2601, audio_tagging_loss=0.01758, over 3028310.69 frames. ], batch size: 60, lr: 4.48e-02, grad_scale: 8.0 2023-11-18 02:01:30,672 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.563e+01 9.019e+01 1.486e+02 2.475e+02 7.919e+02, threshold=2.973e+02, percent-clipped=36.0 2023-11-18 02:01:41,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=6733.333333333333, ans=0.23266666666666666 2023-11-18 02:01:42,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=6733.333333333333, ans=0.184375 2023-11-18 02:01:53,964 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 02:01:58,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=6800.0, ans=0.18125000000000002 2023-11-18 02:02:02,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=6866.666666666667, ans=0.0093768115942029 2023-11-18 02:02:04,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.36 vs. limit=10.075 2023-11-18 02:02:07,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=6866.666666666667, ans=0.0093768115942029 2023-11-18 02:02:26,192 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1050, loss[loss=0.2108, simple_loss=0.1635, pruned_loss=0.1269, audio_tagging_loss=0.01753, over 14004.00 frames. ], tot_loss[loss=0.334, simple_loss=0.258, pruned_loss=0.2417, audio_tagging_loss=0.01704, over 3028856.34 frames. ], batch size: 53, lr: 4.48e-02, grad_scale: 8.0 2023-11-18 02:02:26,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=7000.0, ans=0.171875 2023-11-18 02:02:44,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=7066.666666666667, ans=0.16875 2023-11-18 02:03:21,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=7266.666666666667, ans=0.309 2023-11-18 02:03:23,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=7266.666666666667, ans=0.009289855072463769 2023-11-18 02:03:25,436 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1100, loss[loss=0.3552, simple_loss=0.2882, pruned_loss=0.2219, audio_tagging_loss=0.01428, over 14900.00 frames. ], tot_loss[loss=0.3205, simple_loss=0.2492, pruned_loss=0.2253, audio_tagging_loss=0.01669, over 3023749.57 frames. ], batch size: 56, lr: 4.48e-02, grad_scale: 8.0 2023-11-18 02:03:28,772 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 1.098e+02 1.778e+02 2.963e+02 6.822e+02, threshold=3.557e+02, percent-clipped=25.0 2023-11-18 02:03:28,838 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 02:03:51,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=7466.666666666667, ans=0.009246376811594202 2023-11-18 02:03:59,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=7533.333333333333, ans=0.14687499999999998 2023-11-18 02:04:05,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=7533.333333333333, ans=0.03527777777777778 2023-11-18 02:04:18,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=7600.0, ans=0.14375 2023-11-18 02:04:18,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=13.2 2023-11-18 02:04:22,637 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1150, loss[loss=0.255, simple_loss=0.2083, pruned_loss=0.1487, audio_tagging_loss=0.01478, over 15584.00 frames. ], tot_loss[loss=0.308, simple_loss=0.2413, pruned_loss=0.2104, audio_tagging_loss=0.01627, over 3029263.14 frames. ], batch size: 57, lr: 4.47e-02, grad_scale: 8.0 2023-11-18 02:04:38,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=7733.333333333333, ans=0.1375 2023-11-18 02:04:44,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=20.67 vs. limit=10.4 2023-11-18 02:05:07,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=7933.333333333333, ans=0.0 2023-11-18 02:05:19,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=13.5 2023-11-18 02:05:20,092 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1200, loss[loss=0.1584, simple_loss=0.1235, pruned_loss=0.08606, audio_tagging_loss=0.01684, over 15153.00 frames. ], tot_loss[loss=0.2949, simple_loss=0.2325, pruned_loss=0.1959, audio_tagging_loss=0.0161, over 3031463.63 frames. ], batch size: 59, lr: 4.47e-02, grad_scale: 16.0 2023-11-18 02:05:23,353 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 1.072e+02 1.842e+02 2.807e+02 8.662e+02, threshold=3.683e+02, percent-clipped=14.0 2023-11-18 02:05:31,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=8066.666666666667, ans=0.0 2023-11-18 02:05:45,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.14 vs. limit=7.253333333333334 2023-11-18 02:05:56,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=13.65 2023-11-18 02:06:11,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=8266.666666666666, ans=0.009072463768115942 2023-11-18 02:06:17,038 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1250, loss[loss=0.242, simple_loss=0.1954, pruned_loss=0.1341, audio_tagging_loss=0.0188, over 15163.00 frames. 
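The WARNING entries above mark AudioSet cuts (which carry only the dummy transcript) being dropped: a 100-frame cut shrinks to 23 frames after the encoder's convolutional subsampling, fewer than its 24 BPE tokens, so a transducer loss cannot be computed. A minimal sketch of that filter, with hypothetical helper names, assuming the usual `((T - 7) // 2 + 1) // 2` frontend arithmetic (which reproduces the logged 100 → 23):

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Convolutional frontend reduction assumed here: T -> ((T - 7) // 2 + 1) // 2,
    # which maps the logged 100 input frames to the logged 23 output frames.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # The transducer needs at least one encoder frame per output token, so cuts
    # whose subsampled length is shorter than the token sequence are excluded
    # from training rather than allowed to produce a degenerate loss.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # hence "Exclude cut ... from training."
```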
], tot_loss[loss=0.285, simple_loss=0.2263, pruned_loss=0.1842, audio_tagging_loss=0.01591, over 3032257.66 frames. ], batch size: 57, lr: 4.47e-02, grad_scale: 16.0 2023-11-18 02:06:40,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.79 vs. limit=9.233333333333333 2023-11-18 02:06:49,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8533.333333333334, ans=0.21466666666666667 2023-11-18 02:06:53,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=13.9 2023-11-18 02:07:13,972 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1300, loss[loss=0.2265, simple_loss=0.1801, pruned_loss=0.126, audio_tagging_loss=0.0185, over 15347.00 frames. ], tot_loss[loss=0.2769, simple_loss=0.2216, pruned_loss=0.174, audio_tagging_loss=0.01574, over 3030180.96 frames. ], batch size: 57, lr: 4.47e-02, grad_scale: 16.0 2023-11-18 02:07:17,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.700e+01 1.001e+02 1.539e+02 2.707e+02 8.460e+02, threshold=3.079e+02, percent-clipped=10.0 2023-11-18 02:07:17,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=8666.666666666666, ans=0.03055555555555556 2023-11-18 02:07:19,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=8666.666666666666, ans=0.125 2023-11-18 02:07:19,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=8666.666666666666, ans=0.32999999999999996 2023-11-18 02:07:40,377 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.11 vs. limit=14.1 2023-11-18 02:07:47,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.88 vs. limit=10.825 2023-11-18 02:07:50,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8866.666666666666, ans=0.21133333333333332 2023-11-18 02:07:53,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.82 vs. limit=10.825 2023-11-18 02:07:53,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.31 vs. limit=14.15 2023-11-18 02:08:06,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=8933.333333333334, ans=0.025 2023-11-18 02:08:06,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.31 vs. limit=7.233333333333333 2023-11-18 02:08:08,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=14.2 2023-11-18 02:08:10,221 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1350, loss[loss=0.2555, simple_loss=0.219, pruned_loss=0.1387, audio_tagging_loss=0.01233, over 15776.00 frames. 
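Each per-batch `loss[...]` entry decomposes into `simple_loss`, `pruned_loss` and `audio_tagging_loss`, and the total is a warm-up-weighted sum in which the simple (linear-lattice) loss dominates early while the pruned loss is phased in. The sketch below uses `warm_step = 2000`, a base simple-loss weight of 0.5 and an audio-tagging weight of 1.0; these constants are assumptions, chosen because they reproduce the logged totals exactly:

```python
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float, batch_idx_train: int,
                  warm_step: int = 2000, simple_loss_scale: float = 0.5,
                  audio_tagging_loss_scale: float = 1.0) -> float:
    # Warm-up: ramp the simple-loss weight down (1.0 -> 0.5) and the
    # pruned-loss weight up (0.1 -> 1.0) over the first warm_step batches.
    frac = min(batch_idx_train / warm_step, 1.0)
    s = 1.0 - frac * (1.0 - simple_loss_scale)
    p = 0.1 + 0.9 * frac
    return (s * simple_loss + p * pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch 900 above: 0.775 * 0.2801 + 0.505 * 0.2529 + 1.0 * 0.01518 = 0.3600,
# matching the logged "loss=0.36".
print(round(combined_loss(0.2801, 0.2529, 0.01518, 900), 4))
```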
], tot_loss[loss=0.268, simple_loss=0.216, pruned_loss=0.1643, audio_tagging_loss=0.01556, over 3032501.13 frames. ], batch size: 58, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:08:11,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9000.0, ans=0.21000000000000002 2023-11-18 02:08:15,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.33 vs. limit=7.25 2023-11-18 02:08:26,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.38 vs. limit=10.9 2023-11-18 02:08:32,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=9066.666666666666, ans=0.125 2023-11-18 02:08:35,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=9133.333333333334, ans=0.11551666666666667 2023-11-18 02:08:52,803 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:08:56,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=9266.666666666666, ans=0.125 2023-11-18 02:08:56,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=14.45 2023-11-18 02:09:09,602 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1400, loss[loss=0.2494, simple_loss=0.2074, pruned_loss=0.1338, audio_tagging_loss=0.01692, over 16363.00 frames. ], tot_loss[loss=0.2584, simple_loss=0.2091, pruned_loss=0.1547, audio_tagging_loss=0.0157, over 3032891.57 frames. ], batch size: 60, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:09:12,848 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 1.322e+02 1.809e+02 2.689e+02 4.159e+02, threshold=3.617e+02, percent-clipped=14.0 2023-11-18 02:09:14,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.46 vs. limit=7.333333333333334 2023-11-18 02:09:27,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=9400.0, ans=0.008826086956521739 2023-11-18 02:09:27,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.36 vs. limit=4.41 2023-11-18 02:09:35,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.71 vs. limit=9.733333333333333 2023-11-18 02:09:41,363 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.23 vs. 
limit=7.786666666666667 2023-11-18 02:10:05,814 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1450, loss[loss=0.1517, simple_loss=0.1211, pruned_loss=0.07671, audio_tagging_loss=0.01672, over 14240.00 frames. ], tot_loss[loss=0.2514, simple_loss=0.2048, pruned_loss=0.1471, audio_tagging_loss=0.01566, over 3036430.60 frames. ], batch size: 56, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:10:10,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=11.125 2023-11-18 02:10:15,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=9733.333333333334, ans=0.125 2023-11-18 02:10:25,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.24 vs. limit=14.8 2023-11-18 02:10:28,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=9800.0, ans=0.125 2023-11-18 02:10:29,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9800.0, ans=0.202 2023-11-18 02:10:33,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=9800.0, ans=0.125 2023-11-18 02:10:44,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.88 vs. limit=14.9 2023-11-18 02:10:49,293 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.34 vs. limit=7.466666666666667 2023-11-18 02:10:54,719 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.58 vs. limit=7.483333333333333 2023-11-18 02:11:01,695 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1500, loss[loss=0.1668, simple_loss=0.1352, pruned_loss=0.0838, audio_tagging_loss=0.01738, over 15834.00 frames. ], tot_loss[loss=0.2481, simple_loss=0.2036, pruned_loss=0.1419, audio_tagging_loss=0.01577, over 3038912.70 frames. ], batch size: 63, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:11:04,894 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.893e+01 1.138e+02 1.532e+02 2.102e+02 5.614e+02, threshold=3.064e+02, percent-clipped=6.0 2023-11-18 02:11:24,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=10133.333333333334, ans=0.024444444444444446 2023-11-18 02:11:42,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=4.53 2023-11-18 02:11:44,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10200.0, ans=0.198 2023-11-18 02:11:59,193 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1550, loss[loss=0.2258, simple_loss=0.1917, pruned_loss=0.1182, audio_tagging_loss=0.0141, over 14304.00 frames. ], tot_loss[loss=0.2413, simple_loss=0.1992, pruned_loss=0.1353, audio_tagging_loss=0.01578, over 3042183.14 frames. 
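The recurring `[optim.py:476]` entries summarize the optimizer's gradient-norm statistics: the five numbers are the min / 25% / median / 75% / max of recently observed gradient norms, the clipping threshold is `Clipping_scale` times the median (2.0 × 1.532e+02 = 3.064e+02 in the entry above), and `percent-clipped` is the share of recent steps whose norm exceeded it. A sketch of that bookkeeping:

```python
import torch

def clipping_report(recent_grad_norms: torch.Tensor,
                    clipping_scale: float = 2.0):
    """recent_grad_norms: 1-D float tensor of per-step gradient norms."""
    q = torch.quantile(recent_grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                 # scale x median
    percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped
```

Gradients whose norm exceeds the threshold are scaled down to it before the update, so a rising threshold across these entries indicates the typical gradient norm is growing.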
], batch size: 53, lr: 4.45e-02, grad_scale: 16.0 2023-11-18 02:12:12,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.73 vs. limit=7.6 2023-11-18 02:12:26,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=11.425 2023-11-18 02:12:27,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=10466.666666666666, ans=0.02305555555555556 2023-11-18 02:12:31,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=10533.333333333334, ans=0.022777777777777775 2023-11-18 02:12:32,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=10533.333333333334, ans=0.19466666666666665 2023-11-18 02:12:39,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=10533.333333333334, ans=0.022777777777777775 2023-11-18 02:12:56,141 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1600, loss[loss=0.2326, simple_loss=0.1979, pruned_loss=0.1138, audio_tagging_loss=0.02054, over 15388.00 frames. ], tot_loss[loss=0.2372, simple_loss=0.1966, pruned_loss=0.1307, audio_tagging_loss=0.01584, over 3044431.56 frames. ], batch size: 59, lr: 4.45e-02, grad_scale: 32.0 2023-11-18 02:12:57,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.00 vs. limit=7.666666666666666 2023-11-18 02:12:58,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=10666.666666666666, ans=0.125 2023-11-18 02:12:59,356 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 1.048e+02 1.443e+02 2.212e+02 4.225e+02, threshold=2.886e+02, percent-clipped=6.0 2023-11-18 02:13:36,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10866.666666666666, ans=0.19133333333333336 2023-11-18 02:13:41,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=10933.333333333334, ans=0.14066666666666666 2023-11-18 02:13:45,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.83 vs. limit=15.7 2023-11-18 02:13:51,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=11000.0, ans=0.5150000000000001 2023-11-18 02:13:51,857 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1650, loss[loss=0.2898, simple_loss=0.2476, pruned_loss=0.1515, audio_tagging_loss=0.01667, over 14209.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.1926, pruned_loss=0.1249, audio_tagging_loss=0.01578, over 3044873.07 frames. 
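Every `ScheduledFloat` line prints the current value (`ans`) of a hyper-parameter that varies with `batch_count` by piecewise-linear interpolation between breakpoints. For example, `bypass_mid.scale_min` above reads 0.494 at batch 11600, consistent with a schedule running from 0.9 at batch 0 to 0.2 at batch 20000 (breakpoints inferred from the logged values, not read from the code). A minimal stand-in:

```python
class PiecewiseLinearSchedule:
    """Minimal stand-in for the ScheduledFloat values in the log: a float
    interpolated linearly between (batch_count, value) breakpoints."""

    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.points[0][0]:
            return self.points[0][1]
        if batch_count >= self.points[-1][0]:
            return self.points[-1][1]
        for (a, va), (b, vb) in zip(self.points, self.points[1:]):
            if a <= batch_count <= b:
                return va + (batch_count - a) / (b - a) * (vb - va)

scale_min = PiecewiseLinearSchedule((0.0, 0.9), (20000.0, 0.2))
print(scale_min(11600.0))  # 0.49400000000000005, as in the entry above
```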
], batch size: 53, lr: 4.45e-02, grad_scale: 16.0 2023-11-18 02:13:59,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=11000.0, ans=0.125 2023-11-18 02:14:07,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=11066.666666666666, ans=0.00846376811594203 2023-11-18 02:14:09,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.05 vs. limit=15.8 2023-11-18 02:14:31,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=11200.0, ans=0.020000000000000004 2023-11-18 02:14:48,695 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1700, loss[loss=0.2358, simple_loss=0.2047, pruned_loss=0.12, audio_tagging_loss=0.0143, over 15824.00 frames. ], tot_loss[loss=0.228, simple_loss=0.1918, pruned_loss=0.1215, audio_tagging_loss=0.01559, over 3048912.97 frames. ], batch size: 57, lr: 4.44e-02, grad_scale: 16.0 2023-11-18 02:14:52,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=11333.333333333334, ans=0.125 2023-11-18 02:14:53,006 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.938e+01 1.231e+02 1.950e+02 2.730e+02 7.528e+02, threshold=3.901e+02, percent-clipped=22.0 2023-11-18 02:15:18,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.92 vs. limit=11.8 2023-11-18 02:15:37,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=11600.0, ans=0.49400000000000005 2023-11-18 02:15:44,897 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1750, loss[loss=0.2368, simple_loss=0.2156, pruned_loss=0.1174, audio_tagging_loss=0.01134, over 15442.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.1908, pruned_loss=0.1189, audio_tagging_loss=0.0153, over 3052335.71 frames. ], batch size: 56, lr: 4.44e-02, grad_scale: 16.0 2023-11-18 02:15:53,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=11666.666666666666, ans=0.125 2023-11-18 02:15:56,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=11733.333333333334, ans=0.0 2023-11-18 02:16:01,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.19 vs. limit=11.9 2023-11-18 02:16:17,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=11866.666666666666, ans=0.48466666666666675 2023-11-18 02:16:29,405 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.55 vs. limit=7.983333333333333 2023-11-18 02:16:40,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.78 vs. limit=4.8 2023-11-18 02:16:41,141 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1800, loss[loss=0.2611, simple_loss=0.2326, pruned_loss=0.1325, audio_tagging_loss=0.01263, over 15305.00 frames. 
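`grad_scale` in the loss lines is the fp16 loss-scaling factor: it doubles from 16 to 32 at batch 1600, is back at 16 by batch 1650, and reaches 32 again by batch 2000. That oscillation is the standard dynamic behaviour of mixed-precision training: the scaler halves the scale (and skips the step) when scaled gradients overflow, and doubles it again after a run of clean steps. The generic PyTorch pattern, not this script's exact loop:

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # manages grad_scale dynamically

def training_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()  # scale up so fp16 grads keep precision
    scaler.step(optimizer)         # unscales; skips the step on inf/nan grads
    scaler.update()                # halve on overflow, grow after clean steps
```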
], tot_loss[loss=0.2221, simple_loss=0.1892, pruned_loss=0.1156, audio_tagging_loss=0.01509, over 3051574.77 frames. ], batch size: 56, lr: 4.44e-02, grad_scale: 16.0 2023-11-18 02:16:45,460 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 1.122e+02 1.379e+02 2.095e+02 9.381e+02, threshold=2.759e+02, percent-clipped=5.0 2023-11-18 02:16:56,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=12066.666666666666, ans=12.025 2023-11-18 02:17:06,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=12133.333333333334, ans=0.09899494936611666 2023-11-18 02:17:16,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=12.075 2023-11-18 02:17:21,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=12200.0, ans=0.008217391304347826 2023-11-18 02:17:37,665 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1850, loss[loss=0.178, simple_loss=0.1505, pruned_loss=0.08659, audio_tagging_loss=0.01631, over 14740.00 frames. ], tot_loss[loss=0.2171, simple_loss=0.1858, pruned_loss=0.1116, audio_tagging_loss=0.01505, over 3049071.03 frames. ], batch size: 56, lr: 4.43e-02, grad_scale: 16.0 2023-11-18 02:17:54,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=12.15 2023-11-18 02:18:16,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=12533.333333333334, ans=0.125 2023-11-18 02:18:18,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=12.2 2023-11-18 02:18:25,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=12600.0, ans=0.125 2023-11-18 02:18:33,431 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1900, loss[loss=0.1201, simple_loss=0.09823, pruned_loss=0.05422, audio_tagging_loss=0.01675, over 14004.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.1826, pruned_loss=0.1081, audio_tagging_loss=0.01489, over 3045196.30 frames. ], batch size: 56, lr: 4.43e-02, grad_scale: 16.0 2023-11-18 02:18:33,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=12666.666666666666, ans=0.4566666666666667 2023-11-18 02:18:34,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=12666.666666666666, ans=0.125 2023-11-18 02:18:37,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.647e+01 1.124e+02 1.503e+02 2.193e+02 6.798e+02, threshold=3.006e+02, percent-clipped=14.0 2023-11-18 02:18:49,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12733.333333333334, ans=0.17266666666666666 2023-11-18 02:18:54,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.82 vs. 
limit=4.91 2023-11-18 02:18:55,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.03 vs. limit=8.183333333333334 2023-11-18 02:19:22,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=12933.333333333334, ans=0.125 2023-11-18 02:19:24,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=12933.333333333334, ans=0.17066666666666666 2023-11-18 02:19:29,592 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 1950, loss[loss=0.1543, simple_loss=0.1396, pruned_loss=0.07028, audio_tagging_loss=0.01403, over 14195.00 frames. ], tot_loss[loss=0.2096, simple_loss=0.1814, pruned_loss=0.1056, audio_tagging_loss=0.0148, over 3042835.95 frames. ], batch size: 54, lr: 4.43e-02, grad_scale: 16.0 2023-11-18 02:19:46,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=13066.666666666666, ans=0.012222222222222225 2023-11-18 02:19:47,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=13066.666666666666, ans=0.125 2023-11-18 02:19:52,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=13133.333333333334, ans=0.125 2023-11-18 02:19:53,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.05 vs. limit=17.35 2023-11-18 02:19:58,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.34 vs. limit=17.35 2023-11-18 02:19:58,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=13133.333333333334, ans=0.125 2023-11-18 02:20:06,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=13200.0, ans=0.025 2023-11-18 02:20:09,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=13200.0, ans=0.0 2023-11-18 02:20:26,155 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2000, loss[loss=0.199, simple_loss=0.1811, pruned_loss=0.09702, audio_tagging_loss=0.01137, over 14098.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.1786, pruned_loss=0.1026, audio_tagging_loss=0.01483, over 3040986.59 frames. ], batch size: 53, lr: 4.42e-02, grad_scale: 32.0 2023-11-18 02:20:30,373 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 1.116e+02 1.535e+02 2.034e+02 3.808e+02, threshold=3.071e+02, percent-clipped=5.0 2023-11-18 02:20:30,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=13333.333333333334, ans=0.125 2023-11-18 02:20:32,993 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.82 vs. limit=12.5 2023-11-18 02:20:34,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.12 vs. 
limit=17.5 2023-11-18 02:20:48,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=13466.666666666666, ans=0.010555555555555561 2023-11-18 02:20:51,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.44 vs. limit=11.733333333333333 2023-11-18 02:20:53,918 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.87 vs. limit=11.733333333333333 2023-11-18 02:21:21,362 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2050, loss[loss=0.251, simple_loss=0.2349, pruned_loss=0.1242, audio_tagging_loss=0.009366, over 16767.00 frames. ], tot_loss[loss=0.2047, simple_loss=0.179, pruned_loss=0.1015, audio_tagging_loss=0.01469, over 3041078.79 frames. ], batch size: 61, lr: 4.42e-02, grad_scale: 32.0 2023-11-18 02:21:27,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=13666.666666666666, ans=0.16333333333333333 2023-11-18 02:21:51,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=12.675 2023-11-18 02:22:02,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.27 vs. limit=17.9 2023-11-18 02:22:17,374 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2100, loss[loss=0.2071, simple_loss=0.191, pruned_loss=0.09672, audio_tagging_loss=0.01486, over 15840.00 frames. ], tot_loss[loss=0.2037, simple_loss=0.179, pruned_loss=0.1002, audio_tagging_loss=0.01464, over 3046566.01 frames. ], batch size: 57, lr: 4.42e-02, grad_scale: 32.0 2023-11-18 02:22:21,607 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.391e+01 1.118e+02 1.317e+02 1.653e+02 4.106e+02, threshold=2.634e+02, percent-clipped=4.0 2023-11-18 02:22:45,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=12.8 2023-11-18 02:22:52,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.27 vs. limit=18.15 2023-11-18 02:22:56,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=14200.0, ans=0.0075 2023-11-18 02:22:57,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.78 vs. limit=18.15 2023-11-18 02:23:10,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=14266.666666666666, ans=0.125 2023-11-18 02:23:12,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14333.333333333334, ans=0.15666666666666668 2023-11-18 02:23:13,721 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2150, loss[loss=0.1682, simple_loss=0.1501, pruned_loss=0.07744, audio_tagging_loss=0.0157, over 14272.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.1774, pruned_loss=0.09799, audio_tagging_loss=0.01478, over 3042989.80 frames. 
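Each `Whitening:` entry compares a module's whitening metric against its (scheduled) limit. The metric equals 1.0 when the feature covariance is a multiple of the identity and grows as the features become anisotropic; when it exceeds the limit, the module applies a gradient penalty that pushes the covariance back toward white. A simplified, single-group version of the metric (the logged modules compute it per group of channels):

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns a value >= 1.0 that equals
    1.0 exactly when the (uncentered) covariance is a multiple of I."""
    num_channels = x.shape[-1]
    cov = (x.T @ x) / x.shape[0]
    # Cauchy-Schwarz: num_channels * ||C||_F^2 >= trace(C)^2, with equality
    # iff C is proportional to the identity.
    return num_channels * (cov ** 2).sum() / (cov.trace() ** 2)
```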
], batch size: 54, lr: 4.41e-02, grad_scale: 32.0 2023-11-18 02:23:13,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=14333.333333333334, ans=0.125 2023-11-18 02:23:46,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=14533.333333333334, ans=0.006111111111111109 2023-11-18 02:23:47,738 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:23:51,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=14533.333333333334, ans=0.006111111111111109 2023-11-18 02:23:54,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=12.95 2023-11-18 02:24:07,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.12 vs. limit=8.65 2023-11-18 02:24:10,327 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2200, loss[loss=0.2268, simple_loss=0.1974, pruned_loss=0.1106, audio_tagging_loss=0.01745, over 14998.00 frames. ], tot_loss[loss=0.1996, simple_loss=0.177, pruned_loss=0.09677, audio_tagging_loss=0.01473, over 3041315.35 frames. ], batch size: 53, lr: 4.41e-02, grad_scale: 32.0 2023-11-18 02:24:14,663 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.017e+01 1.117e+02 1.377e+02 2.009e+02 5.109e+02, threshold=2.755e+02, percent-clipped=7.0 2023-11-18 02:24:17,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=14666.666666666666, ans=0.125 2023-11-18 02:24:23,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=14733.333333333334, ans=0.125 2023-11-18 02:24:31,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.99 vs. limit=5.21 2023-11-18 02:24:59,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=14933.333333333334, ans=0.15066666666666667 2023-11-18 02:25:01,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=14933.333333333334, ans=0.125 2023-11-18 02:25:06,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=15000.0, ans=0.15000000000000002 2023-11-18 02:25:07,604 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2250, loss[loss=0.201, simple_loss=0.187, pruned_loss=0.09552, audio_tagging_loss=0.012, over 15520.00 frames. ], tot_loss[loss=0.1984, simple_loss=0.1766, pruned_loss=0.09558, audio_tagging_loss=0.01481, over 3044507.53 frames. 
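The learning rate in these lines decays smoothly (4.49e-02 at batch 800, 4.45e-02 at batch 1600, 4.41e-02 at batch 2200). Within a single epoch this is consistent with the batch-wise part of an Eden-style schedule, lr = lr0 · (1 + (b/B)²)^(-1/4); the constants below are fitted to the logged values rather than read from the run's configuration:

```python
def eden_lr(batch_idx: int, base_lr: float = 0.045,
            lr_batches: float = 7500.0) -> float:
    # Batch-wise Eden factor; the epoch-wise factor is constant inside
    # epoch 1 and is omitted here.
    return base_lr * (1.0 + (batch_idx / lr_batches) ** 2) ** -0.25

for b in (800, 1600, 2200, 2900):
    print(b, f"{eden_lr(b):.2e}")  # 4.49e-02, 4.45e-02, 4.41e-02, 4.35e-02
```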
], batch size: 55, lr: 4.40e-02, grad_scale: 32.0 2023-11-18 02:26:02,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15266.666666666666, ans=0.14733333333333334 2023-11-18 02:26:05,406 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2300, loss[loss=0.2002, simple_loss=0.1908, pruned_loss=0.09358, audio_tagging_loss=0.01125, over 14898.00 frames. ], tot_loss[loss=0.1982, simple_loss=0.1779, pruned_loss=0.09476, audio_tagging_loss=0.01477, over 3046347.98 frames. ], batch size: 56, lr: 4.40e-02, grad_scale: 32.0 2023-11-18 02:26:09,712 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 1.107e+02 1.429e+02 1.999e+02 3.636e+02, threshold=2.858e+02, percent-clipped=5.0 2023-11-18 02:26:15,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=13.275 2023-11-18 02:26:18,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=15400.0, ans=0.361 2023-11-18 02:26:41,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=15533.333333333334, ans=0.007492753623188406 2023-11-18 02:26:53,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.09 vs. limit=5.34 2023-11-18 02:26:56,030 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:27:01,480 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2350, loss[loss=0.1817, simple_loss=0.1606, pruned_loss=0.086, audio_tagging_loss=0.01544, over 14283.00 frames. ], tot_loss[loss=0.1966, simple_loss=0.1771, pruned_loss=0.09349, audio_tagging_loss=0.0147, over 3046613.05 frames. ], batch size: 55, lr: 4.40e-02, grad_scale: 32.0 2023-11-18 02:27:20,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.47 vs. limit=13.4 2023-11-18 02:27:41,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=15866.666666666666, ans=0.0005555555555555522 2023-11-18 02:27:43,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=15866.666666666666, ans=0.05 2023-11-18 02:27:50,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=15933.333333333334, ans=0.125 2023-11-18 02:27:57,898 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2400, loss[loss=0.148, simple_loss=0.1358, pruned_loss=0.0686, audio_tagging_loss=0.01149, over 14538.00 frames. ], tot_loss[loss=0.1936, simple_loss=0.1748, pruned_loss=0.09166, audio_tagging_loss=0.01469, over 3037103.22 frames. 
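The many `balancer*.prob`, `min_positive`, `min_abs` and `max_abs` entries belong to balancer modules that keep per-channel activation statistics inside bounds, e.g. the fraction of positive activations at or above `min_positive` and the mean absolute value at or below `max_abs`; each check fires only on a random subset of steps, with the scheduled probability `prob`. A loss-style simplification (the real module corrects gradients in the backward pass instead of adding a penalty):

```python
import torch

def balancer_penalty(x: torch.Tensor, min_positive: float = 0.05,
                     max_abs: float = 10.0, prob: float = 0.125) -> torch.Tensor:
    """x: (..., num_channels); returns a scalar penalty (zero on most steps)."""
    if torch.rand(()).item() > prob:   # stochastic application, cf. `prob`
        return x.new_zeros(())
    dims = tuple(range(x.dim() - 1))   # reduce over everything but channels
    # Soft count of positive entries keeps the penalty differentiable.
    frac_positive = torch.sigmoid(x / 0.1).mean(dim=dims)
    abs_mean = x.abs().mean(dim=dims)
    return ((min_positive - frac_positive).clamp(min=0.0).sum()
            + (abs_mean - max_abs).clamp(min=0.0).sum())
```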
], batch size: 55, lr: 4.39e-02, grad_scale: 32.0 2023-11-18 02:28:02,145 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.230e+01 1.240e+02 1.395e+02 1.790e+02 3.155e+02, threshold=2.791e+02, percent-clipped=5.0 2023-11-18 02:28:03,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=19.5 2023-11-18 02:28:36,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16200.0, ans=0.138 2023-11-18 02:28:39,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16200.0, ans=0.138 2023-11-18 02:28:54,742 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2450, loss[loss=0.1943, simple_loss=0.1765, pruned_loss=0.08958, audio_tagging_loss=0.01651, over 15056.00 frames. ], tot_loss[loss=0.1896, simple_loss=0.1714, pruned_loss=0.08917, audio_tagging_loss=0.01486, over 3039032.29 frames. ], batch size: 57, lr: 4.39e-02, grad_scale: 32.0 2023-11-18 02:28:58,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=16333.333333333334, ans=0.07 2023-11-18 02:29:08,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.97 vs. limit=13.2 2023-11-18 02:29:10,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=16400.0, ans=0.125 2023-11-18 02:29:22,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=19.85 2023-11-18 02:29:29,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.71 vs. limit=19.9 2023-11-18 02:29:33,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=16533.333333333332, ans=0.125 2023-11-18 02:29:36,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=16533.333333333332, ans=0.32133333333333347 2023-11-18 02:29:48,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=13.725 2023-11-18 02:29:49,866 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2500, loss[loss=0.1921, simple_loss=0.1877, pruned_loss=0.08453, audio_tagging_loss=0.01367, over 15005.00 frames. ], tot_loss[loss=0.1891, simple_loss=0.1718, pruned_loss=0.08861, audio_tagging_loss=0.01474, over 3045003.95 frames. 
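Each training line reports two loss readings: `loss[...]` for the current batch and `tot_loss[...]`, a frame-weighted aggregate over a trailing window of batches, which is why its frame count hovers around 3.0M while individual batches contain roughly 15k frames. A sketch of that aggregation as an exponentially decayed sum; the window length of 200 batches is an assumption that matches the logged frame counts (200 × ~15k ≈ 3.0M):

```python
class RunningLoss:
    """Exponentially decayed, frame-weighted loss aggregate (a sketch of how
    the tot_loss[...] numbers behave; one instance per loss component)."""

    def __init__(self, decay_batches: int = 200):   # assumed window length
        self.keep = 1.0 - 1.0 / decay_batches
        self.loss_sum = 0.0   # accumulates per-frame loss * num_frames
        self.frames = 0.0     # settles near decay_batches * frames-per-batch

    def update(self, per_frame_loss: float, num_frames: int) -> None:
        self.loss_sum = self.loss_sum * self.keep + per_frame_loss * num_frames
        self.frames = self.frames * self.keep + num_frames

    @property
    def value(self) -> float:
        return self.loss_sum / self.frames
```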
], batch size: 54, lr: 4.38e-02, grad_scale: 32.0 2023-11-18 02:29:53,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=16666.666666666668, ans=0.125 2023-11-18 02:29:54,091 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.242e+01 1.096e+02 1.316e+02 1.723e+02 3.236e+02, threshold=2.632e+02, percent-clipped=4.0 2023-11-18 02:29:57,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=16666.666666666668, ans=0.0 2023-11-18 02:30:01,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16733.333333333332, ans=0.13266666666666668 2023-11-18 02:30:19,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=16800.0, ans=0.0 2023-11-18 02:30:45,336 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2550, loss[loss=0.1722, simple_loss=0.1484, pruned_loss=0.08006, audio_tagging_loss=0.01791, over 14552.00 frames. ], tot_loss[loss=0.1879, simple_loss=0.1709, pruned_loss=0.08788, audio_tagging_loss=0.01466, over 3038499.27 frames. ], batch size: 55, lr: 4.38e-02, grad_scale: 32.0 2023-11-18 02:30:47,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=17000.0, ans=0.125 2023-11-18 02:30:56,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=17066.666666666668, ans=0.30266666666666675 2023-11-18 02:31:14,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=13.924999999999999 2023-11-18 02:31:20,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=17200.0, ans=0.0 2023-11-18 02:31:21,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.12 vs. limit=13.6 2023-11-18 02:31:27,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=13.95 2023-11-18 02:31:28,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.26 vs. limit=20.4 2023-11-18 02:31:43,163 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2600, loss[loss=0.1784, simple_loss=0.1669, pruned_loss=0.08108, audio_tagging_loss=0.01385, over 14834.00 frames. ], tot_loss[loss=0.1856, simple_loss=0.1696, pruned_loss=0.08643, audio_tagging_loss=0.01445, over 3047689.30 frames. ], batch size: 54, lr: 4.37e-02, grad_scale: 32.0 2023-11-18 02:31:47,404 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.043e+01 1.250e+02 1.620e+02 2.059e+02 4.953e+02, threshold=3.240e+02, percent-clipped=12.0 2023-11-18 02:31:49,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.35 vs. limit=9.333333333333332 2023-11-18 02:32:04,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.32 vs. 
limit=14.05 2023-11-18 02:32:07,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=17466.666666666668, ans=0.28866666666666674 2023-11-18 02:32:13,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=17466.666666666668, ans=0.12533333333333332 2023-11-18 02:32:17,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=17533.333333333332, ans=0.125 2023-11-18 02:32:18,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=17533.333333333332, ans=0.0 2023-11-18 02:32:33,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=17600.0, ans=0.125 2023-11-18 02:32:39,187 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2650, loss[loss=0.152, simple_loss=0.1379, pruned_loss=0.06516, audio_tagging_loss=0.01792, over 15038.00 frames. ], tot_loss[loss=0.1837, simple_loss=0.1685, pruned_loss=0.08515, audio_tagging_loss=0.01437, over 3037202.13 frames. ], batch size: 57, lr: 4.37e-02, grad_scale: 32.0 2023-11-18 02:32:41,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=17666.666666666668, ans=0.12333333333333332 2023-11-18 02:32:43,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=17666.666666666668, ans=0.125 2023-11-18 02:32:43,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.01 vs. limit=20.75 2023-11-18 02:32:53,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=17733.333333333332, ans=0.07 2023-11-18 02:32:57,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=17733.333333333332, ans=0.27933333333333343 2023-11-18 02:32:59,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=17800.0, ans=0.04949747468305833 2023-11-18 02:33:01,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.75 vs. limit=20.85 2023-11-18 02:33:10,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.03 vs. limit=5.0 2023-11-18 02:33:34,666 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2700, loss[loss=0.2208, simple_loss=0.1983, pruned_loss=0.1088, audio_tagging_loss=0.01279, over 15971.00 frames. ], tot_loss[loss=0.1845, simple_loss=0.1696, pruned_loss=0.08553, audio_tagging_loss=0.01421, over 3048859.43 frames. 
], batch size: 59, lr: 4.36e-02, grad_scale: 32.0 2023-11-18 02:33:36,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=18000.0, ans=10.0 2023-11-18 02:33:37,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=18000.0, ans=0.125 2023-11-18 02:33:38,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.040e+01 1.101e+02 1.289e+02 1.771e+02 2.746e+02, threshold=2.578e+02, percent-clipped=0.0 2023-11-18 02:33:39,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.94 vs. limit=9.5 2023-11-18 02:33:47,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=18066.666666666668, ans=0.125 2023-11-18 02:33:54,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=18066.666666666668, ans=0.125 2023-11-18 02:33:57,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=18133.333333333332, ans=0.0 2023-11-18 02:33:59,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=14.3 2023-11-18 02:34:08,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=18200.0, ans=0.0 2023-11-18 02:34:10,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=18200.0, ans=0.125 2023-11-18 02:34:21,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=21.2 2023-11-18 02:34:31,185 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2750, loss[loss=0.158, simple_loss=0.154, pruned_loss=0.06977, audio_tagging_loss=0.01124, over 15416.00 frames. ], tot_loss[loss=0.1811, simple_loss=0.1672, pruned_loss=0.08335, audio_tagging_loss=0.01418, over 3049413.73 frames. ], batch size: 59, lr: 4.36e-02, grad_scale: 32.0 2023-11-18 02:34:32,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.12 vs. limit=21.25 2023-11-18 02:34:52,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.25 vs. limit=21.35 2023-11-18 02:34:57,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.63 vs. limit=5.77 2023-11-18 02:34:58,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. 
limit=5.77 2023-11-18 02:35:02,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=18466.666666666668, ans=0.2536666666666667 2023-11-18 02:35:11,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=18533.333333333332, ans=0.125 2023-11-18 02:35:20,306 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:35:27,851 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2800, loss[loss=0.1775, simple_loss=0.1764, pruned_loss=0.07644, audio_tagging_loss=0.01287, over 14973.00 frames. ], tot_loss[loss=0.1805, simple_loss=0.1671, pruned_loss=0.08291, audio_tagging_loss=0.0141, over 3040498.74 frames. ], batch size: 53, lr: 4.36e-02, grad_scale: 32.0 2023-11-18 02:35:32,076 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.686e+01 1.129e+02 1.327e+02 1.684e+02 3.032e+02, threshold=2.655e+02, percent-clipped=2.0 2023-11-18 02:36:01,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=18866.666666666668, ans=0.125 2023-11-18 02:36:05,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=14.575 2023-11-18 02:36:09,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=18866.666666666668, ans=0.125 2023-11-18 02:36:16,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18933.333333333332, ans=0.11066666666666669 2023-11-18 02:36:23,509 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2850, loss[loss=0.1627, simple_loss=0.1477, pruned_loss=0.07629, audio_tagging_loss=0.01257, over 14916.00 frames. ], tot_loss[loss=0.1804, simple_loss=0.1677, pruned_loss=0.08251, audio_tagging_loss=0.01402, over 3034746.81 frames. ], batch size: 56, lr: 4.35e-02, grad_scale: 32.0 2023-11-18 02:36:24,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.92 vs. limit=21.75 2023-11-18 02:36:48,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=19133.333333333332, ans=0.0 2023-11-18 02:36:54,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=19133.333333333332, ans=0.125 2023-11-18 02:37:04,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=19200.0, ans=0.22799999999999998 2023-11-18 02:37:05,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=19200.0, ans=0.125 2023-11-18 02:37:21,216 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2900, loss[loss=0.1552, simple_loss=0.1527, pruned_loss=0.06644, audio_tagging_loss=0.01236, over 14696.00 frames. 
], tot_loss[loss=0.1784, simple_loss=0.166, pruned_loss=0.08142, audio_tagging_loss=0.01404, over 3030120.66 frames. ], batch size: 54, lr: 4.35e-02, grad_scale: 32.0 2023-11-18 02:37:25,459 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 1.019e+02 1.241e+02 1.587e+02 2.643e+02, threshold=2.482e+02, percent-clipped=0.0 2023-11-18 02:37:25,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=19333.333333333332, ans=0.125 2023-11-18 02:37:30,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=19333.333333333332, ans=0.10666666666666669 2023-11-18 02:37:36,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=14.775 2023-11-18 02:37:51,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19466.666666666668, ans=0.10533333333333333 2023-11-18 02:37:54,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=19533.333333333332, ans=0.125 2023-11-18 02:38:10,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.68 vs. limit=22.2 2023-11-18 02:38:17,189 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 2950, loss[loss=0.1693, simple_loss=0.1599, pruned_loss=0.07337, audio_tagging_loss=0.01597, over 13988.00 frames. ], tot_loss[loss=0.1789, simple_loss=0.1669, pruned_loss=0.08145, audio_tagging_loss=0.01406, over 3036821.48 frames. ], batch size: 54, lr: 4.34e-02, grad_scale: 32.0 2023-11-18 02:38:25,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=11.866666666666667 2023-11-18 02:38:33,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=19733.333333333332, ans=0.125 2023-11-18 02:38:39,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=19800.0, ans=0.07 2023-11-18 02:39:04,692 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:39:10,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=19933.333333333332, ans=0.20233333333333337 2023-11-18 02:39:11,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=19933.333333333332, ans=0.125 2023-11-18 02:39:13,989 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3000, loss[loss=0.1644, simple_loss=0.1389, pruned_loss=0.07368, audio_tagging_loss=0.0213, over 15992.00 frames. ], tot_loss[loss=0.1784, simple_loss=0.1664, pruned_loss=0.08092, audio_tagging_loss=0.01427, over 3041379.00 frames. 
], batch size: 59, lr: 4.34e-02, grad_scale: 32.0 2023-11-18 02:39:13,990 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 02:39:26,473 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.0835, 3.1491, 3.4481, 4.0668], device='cuda:1') 2023-11-18 02:39:27,151 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.4760, 6.2825, 6.2502, 6.2981], device='cuda:1') 2023-11-18 02:39:47,913 INFO [train_asr.py:1147] (1/4) Epoch 1, validation: loss=0.1123, simple_loss=0.08353, pruned_loss=0.02777, audio_tagging_loss=0.04274, over 4681554.00 frames. 2023-11-18 02:39:47,914 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 02:39:51,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=20000.0, ans=0.1 2023-11-18 02:39:52,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.173e+01 1.112e+02 1.246e+02 1.564e+02 3.954e+02, threshold=2.493e+02, percent-clipped=6.0 2023-11-18 02:39:56,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=20000.0, ans=0.0 2023-11-18 02:39:59,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.80 vs. limit=10.0 2023-11-18 02:40:03,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=20066.666666666668, ans=0.0 2023-11-18 02:40:05,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=20066.666666666668, ans=0.0 2023-11-18 02:40:06,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=20066.666666666668, ans=0.125 2023-11-18 02:40:12,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20133.333333333332, ans=0.1 2023-11-18 02:40:43,608 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3050, loss[loss=0.1507, simple_loss=0.1441, pruned_loss=0.06139, audio_tagging_loss=0.01724, over 15380.00 frames. ], tot_loss[loss=0.1781, simple_loss=0.1668, pruned_loss=0.08045, audio_tagging_loss=0.01421, over 3043985.26 frames. ], batch size: 61, lr: 4.33e-02, grad_scale: 32.0 2023-11-18 02:40:44,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=20333.333333333332, ans=0.1 2023-11-18 02:40:45,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.86 vs. limit=15.0 2023-11-18 02:40:48,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=20333.333333333332, ans=0.125 2023-11-18 02:40:51,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.72 vs. limit=15.0 2023-11-18 02:40:56,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.46 vs. 
limit=22.5 2023-11-18 02:41:07,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=20466.666666666668, ans=0.125 2023-11-18 02:41:17,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=20533.333333333332, ans=0.125 2023-11-18 02:41:18,272 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:41:21,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=20533.333333333332, ans=0.125 2023-11-18 02:41:39,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.42 vs. limit=15.0 2023-11-18 02:41:40,405 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3100, loss[loss=0.1985, simple_loss=0.2024, pruned_loss=0.0861, audio_tagging_loss=0.01118, over 15757.00 frames. ], tot_loss[loss=0.178, simple_loss=0.1672, pruned_loss=0.08015, audio_tagging_loss=0.01422, over 3041185.38 frames. ], batch size: 58, lr: 4.33e-02, grad_scale: 32.0 2023-11-18 02:41:40,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=20666.666666666668, ans=0.04949747468305833 2023-11-18 02:41:44,733 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 1.051e+02 1.308e+02 1.673e+02 2.696e+02, threshold=2.616e+02, percent-clipped=3.0 2023-11-18 02:41:58,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.72 vs. limit=15.0 2023-11-18 02:42:07,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=20800.0, ans=0.05 2023-11-18 02:42:08,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=20800.0, ans=0.0 2023-11-18 02:42:11,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.56 vs. 
limit=15.0 2023-11-18 02:42:16,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=20866.666666666668, ans=0.0 2023-11-18 02:42:21,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=20866.666666666668, ans=10.0 2023-11-18 02:42:23,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20866.666666666668, ans=0.1 2023-11-18 02:42:27,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=20933.333333333332, ans=0.125 2023-11-18 02:42:37,777 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3150, loss[loss=0.1963, simple_loss=0.1815, pruned_loss=0.09249, audio_tagging_loss=0.01309, over 15457.00 frames. ], tot_loss[loss=0.1765, simple_loss=0.1659, pruned_loss=0.07911, audio_tagging_loss=0.01441, over 3038884.40 frames. ], batch size: 57, lr: 4.32e-02, grad_scale: 32.0 2023-11-18 02:42:39,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=21000.0, ans=0.0 2023-11-18 02:42:43,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=21000.0, ans=0.125 2023-11-18 02:42:50,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=12.0 2023-11-18 02:42:52,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=21066.666666666668, ans=0.125 2023-11-18 02:42:59,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=21133.333333333332, ans=0.125 2023-11-18 02:42:59,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=21133.333333333332, ans=0.125 2023-11-18 02:43:02,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=21133.333333333332, ans=0.125 2023-11-18 02:43:07,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.63 vs. limit=15.0 2023-11-18 02:43:10,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=21200.0, ans=0.125 2023-11-18 02:43:11,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=21200.0, ans=0.125 2023-11-18 02:43:15,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=21200.0, ans=0.125 2023-11-18 02:43:15,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.44 vs. limit=15.0 2023-11-18 02:43:28,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=21266.666666666668, ans=0.2 2023-11-18 02:43:34,050 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3200, loss[loss=0.1895, simple_loss=0.1901, pruned_loss=0.08612, audio_tagging_loss=0.008296, over 14280.00 frames. 
], tot_loss[loss=0.1757, simple_loss=0.1655, pruned_loss=0.07855, audio_tagging_loss=0.01443, over 3045529.40 frames. ], batch size: 54, lr: 4.32e-02, grad_scale: 32.0 2023-11-18 02:43:35,720 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=15.0 2023-11-18 02:43:38,318 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 1.064e+02 1.244e+02 1.490e+02 2.410e+02, threshold=2.488e+02, percent-clipped=0.0 2023-11-18 02:43:47,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=21400.0, ans=0.2 2023-11-18 02:43:56,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=21466.666666666668, ans=0.125 2023-11-18 02:44:00,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=21466.666666666668, ans=0.0 2023-11-18 02:44:04,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=21466.666666666668, ans=0.2 2023-11-18 02:44:17,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2023-11-18 02:44:22,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=21600.0, ans=0.0 2023-11-18 02:44:30,114 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3250, loss[loss=0.1492, simple_loss=0.1375, pruned_loss=0.06657, audio_tagging_loss=0.01394, over 15162.00 frames. ], tot_loss[loss=0.1747, simple_loss=0.1645, pruned_loss=0.07797, audio_tagging_loss=0.01445, over 3042576.65 frames. ], batch size: 57, lr: 4.31e-02, grad_scale: 32.0 2023-11-18 02:44:44,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=21733.333333333332, ans=0.0 2023-11-18 02:45:19,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=21933.333333333332, ans=0.2 2023-11-18 02:45:26,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22000.0, ans=0.1 2023-11-18 02:45:27,667 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3300, loss[loss=0.159, simple_loss=0.1512, pruned_loss=0.07272, audio_tagging_loss=0.01067, over 15978.00 frames. ], tot_loss[loss=0.1734, simple_loss=0.1633, pruned_loss=0.07723, audio_tagging_loss=0.01456, over 3047173.24 frames. 
], batch size: 58, lr: 4.31e-02, grad_scale: 32.0 2023-11-18 02:45:31,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=22000.0, ans=0.125 2023-11-18 02:45:32,503 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.259e+01 1.069e+02 1.225e+02 1.477e+02 2.736e+02, threshold=2.451e+02, percent-clipped=1.0 2023-11-18 02:45:43,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=22066.666666666668, ans=0.125 2023-11-18 02:45:45,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=22066.666666666668, ans=0.1 2023-11-18 02:45:52,547 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:45:54,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=22133.333333333332, ans=0.125 2023-11-18 02:45:55,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.39 vs. limit=6.0 2023-11-18 02:46:02,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=22200.0, ans=0.006043478260869565 2023-11-18 02:46:11,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=22200.0, ans=0.0 2023-11-18 02:46:22,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=22266.666666666668, ans=0.006028985507246377 2023-11-18 02:46:24,722 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3350, loss[loss=0.1676, simple_loss=0.1615, pruned_loss=0.07081, audio_tagging_loss=0.01598, over 15879.00 frames. ], tot_loss[loss=0.1725, simple_loss=0.163, pruned_loss=0.07661, audio_tagging_loss=0.01434, over 3051080.67 frames. ], batch size: 57, lr: 4.30e-02, grad_scale: 32.0 2023-11-18 02:46:28,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0 2023-11-18 02:46:37,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22400.0, ans=0.1 2023-11-18 02:46:42,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.04 vs. limit=15.0 2023-11-18 02:47:21,311 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3400, loss[loss=0.1516, simple_loss=0.1481, pruned_loss=0.06355, audio_tagging_loss=0.01404, over 15687.00 frames. ], tot_loss[loss=0.1726, simple_loss=0.1635, pruned_loss=0.07663, audio_tagging_loss=0.01422, over 3054338.86 frames. 
], batch size: 56, lr: 4.29e-02, grad_scale: 32.0 2023-11-18 02:47:21,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=22666.666666666668, ans=0.125 2023-11-18 02:47:25,581 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.820e+01 1.014e+02 1.234e+02 1.515e+02 3.091e+02, threshold=2.469e+02, percent-clipped=0.0 2023-11-18 02:47:27,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=22666.666666666668, ans=0.005942028985507246 2023-11-18 02:47:40,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=22733.333333333332, ans=0.125 2023-11-18 02:48:14,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=22933.333333333332, ans=0.0058840579710144935 2023-11-18 02:48:18,687 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3450, loss[loss=0.1882, simple_loss=0.1934, pruned_loss=0.07942, audio_tagging_loss=0.01209, over 15369.00 frames. ], tot_loss[loss=0.1728, simple_loss=0.164, pruned_loss=0.07668, audio_tagging_loss=0.01407, over 3056100.34 frames. ], batch size: 57, lr: 4.29e-02, grad_scale: 32.0 2023-11-18 02:48:25,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2023-11-18 02:48:27,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=23000.0, ans=0.1 2023-11-18 02:48:27,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2023-11-18 02:48:32,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=23066.666666666668, ans=0.2 2023-11-18 02:48:50,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=23200.0, ans=0.0 2023-11-18 02:49:02,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=23266.666666666668, ans=0.0 2023-11-18 02:49:05,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=23266.666666666668, ans=0.00581159420289855 2023-11-18 02:49:15,058 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3500, loss[loss=0.1749, simple_loss=0.1601, pruned_loss=0.08238, audio_tagging_loss=0.01245, over 14750.00 frames. ], tot_loss[loss=0.1699, simple_loss=0.1612, pruned_loss=0.07527, audio_tagging_loss=0.014, over 3057585.71 frames. ], batch size: 56, lr: 4.28e-02, grad_scale: 32.0 2023-11-18 02:49:19,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.83 vs. limit=15.0 2023-11-18 02:49:19,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.153e+01 1.129e+02 1.309e+02 1.633e+02 2.948e+02, threshold=2.617e+02, percent-clipped=2.0 2023-11-18 02:49:22,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.27 vs. 
limit=22.5 2023-11-18 02:49:38,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=23466.666666666668, ans=0.125 2023-11-18 02:49:39,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=23466.666666666668, ans=0.125 2023-11-18 02:49:44,223 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:50:00,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=23600.0, ans=0.125 2023-11-18 02:50:10,784 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3550, loss[loss=0.149, simple_loss=0.1464, pruned_loss=0.05895, audio_tagging_loss=0.0168, over 16066.00 frames. ], tot_loss[loss=0.169, simple_loss=0.1607, pruned_loss=0.07467, audio_tagging_loss=0.01399, over 3055196.66 frames. ], batch size: 59, lr: 4.28e-02, grad_scale: 32.0 2023-11-18 02:50:38,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.74 vs. limit=6.0 2023-11-18 02:50:53,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=23866.666666666668, ans=0.1 2023-11-18 02:50:53,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=23866.666666666668, ans=0.2 2023-11-18 02:51:08,216 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3600, loss[loss=0.212, simple_loss=0.2082, pruned_loss=0.09649, audio_tagging_loss=0.0114, over 14681.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.1589, pruned_loss=0.07353, audio_tagging_loss=0.01386, over 3048383.43 frames. 
], batch size: 55, lr: 4.27e-02, grad_scale: 32.0 2023-11-18 02:51:10,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=24000.0, ans=0.125 2023-11-18 02:51:13,841 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.609e+01 1.015e+02 1.156e+02 1.393e+02 2.534e+02, threshold=2.312e+02, percent-clipped=0.0 2023-11-18 02:51:17,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=24000.0, ans=0.125 2023-11-18 02:51:20,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=24066.666666666668, ans=0.00563768115942029 2023-11-18 02:51:22,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=24066.666666666668, ans=0.2 2023-11-18 02:51:25,536 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:51:33,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=24133.333333333332, ans=0.005623188405797102 2023-11-18 02:51:49,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=24200.0, ans=0.125 2023-11-18 02:52:05,637 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3650, loss[loss=0.1967, simple_loss=0.1865, pruned_loss=0.08936, audio_tagging_loss=0.01408, over 15363.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.1576, pruned_loss=0.07339, audio_tagging_loss=0.01391, over 3049190.42 frames. ], batch size: 55, lr: 4.27e-02, grad_scale: 64.0 2023-11-18 02:52:07,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=24333.333333333332, ans=0.0 2023-11-18 02:52:08,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=24333.333333333332, ans=0.125 2023-11-18 02:52:53,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2023-11-18 02:52:53,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=24600.0, ans=0.125 2023-11-18 02:53:01,437 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3700, loss[loss=0.2248, simple_loss=0.2217, pruned_loss=0.1045, audio_tagging_loss=0.009425, over 14494.00 frames. ], tot_loss[loss=0.1687, simple_loss=0.1605, pruned_loss=0.07459, audio_tagging_loss=0.01388, over 3050894.23 frames. ], batch size: 55, lr: 4.26e-02, grad_scale: 64.0 2023-11-18 02:53:03,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.95 vs. limit=22.5 2023-11-18 02:53:04,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=24666.666666666668, ans=0.1 2023-11-18 02:53:05,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 1.068e+02 1.322e+02 1.624e+02 2.925e+02, threshold=2.645e+02, percent-clipped=5.0 2023-11-18 02:53:12,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.46 vs. 
limit=15.0 2023-11-18 02:53:13,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=24733.333333333332, ans=0.125 2023-11-18 02:53:15,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24733.333333333332, ans=0.1 2023-11-18 02:53:27,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=24800.0, ans=0.005478260869565218 2023-11-18 02:53:32,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=24800.0, ans=0.125 2023-11-18 02:53:34,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=24800.0, ans=0.125 2023-11-18 02:53:38,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.54 vs. limit=15.0 2023-11-18 02:53:39,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24866.666666666668, ans=0.1 2023-11-18 02:53:58,196 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3750, loss[loss=0.1776, simple_loss=0.1758, pruned_loss=0.07894, audio_tagging_loss=0.0108, over 14912.00 frames. ], tot_loss[loss=0.1691, simple_loss=0.1609, pruned_loss=0.07474, audio_tagging_loss=0.01387, over 3053071.07 frames. ], batch size: 55, lr: 4.26e-02, grad_scale: 64.0 2023-11-18 02:54:13,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.40 vs. limit=10.0 2023-11-18 02:54:27,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.64 vs. limit=22.5 2023-11-18 02:54:37,592 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:54:56,676 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3800, loss[loss=0.1877, simple_loss=0.1779, pruned_loss=0.08193, audio_tagging_loss=0.01682, over 14400.00 frames. ], tot_loss[loss=0.1692, simple_loss=0.1613, pruned_loss=0.07456, audio_tagging_loss=0.01403, over 3050902.36 frames. 
], batch size: 54, lr: 4.25e-02, grad_scale: 64.0 2023-11-18 02:55:01,041 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.087e+01 1.058e+02 1.234e+02 1.426e+02 2.558e+02, threshold=2.469e+02, percent-clipped=0.0 2023-11-18 02:55:04,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=25333.333333333332, ans=0.2 2023-11-18 02:55:08,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=25400.0, ans=0.125 2023-11-18 02:55:10,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=25400.0, ans=0.125 2023-11-18 02:55:15,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=25400.0, ans=0.125 2023-11-18 02:55:15,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.68 vs. limit=15.0 2023-11-18 02:55:27,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=15.0 2023-11-18 02:55:29,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=25533.333333333332, ans=0.2 2023-11-18 02:55:33,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=25533.333333333332, ans=0.125 2023-11-18 02:55:42,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=25600.0, ans=0.0 2023-11-18 02:55:47,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=25600.0, ans=0.005304347826086957 2023-11-18 02:55:49,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=25600.0, ans=0.125 2023-11-18 02:55:53,505 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3850, loss[loss=0.1615, simple_loss=0.1567, pruned_loss=0.07131, audio_tagging_loss=0.01187, over 16551.00 frames. ], tot_loss[loss=0.1696, simple_loss=0.1619, pruned_loss=0.0746, audio_tagging_loss=0.01408, over 3048045.92 frames. ], batch size: 63, lr: 4.24e-02, grad_scale: 64.0 2023-11-18 02:56:07,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=25733.333333333332, ans=0.125 2023-11-18 02:56:13,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=25733.333333333332, ans=0.125 2023-11-18 02:56:16,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=25800.0, ans=0.025 2023-11-18 02:56:21,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.69 vs. 
limit=22.5 2023-11-18 02:56:30,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=25866.666666666668, ans=0.2 2023-11-18 02:56:34,569 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:56:35,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=25866.666666666668, ans=10.0 2023-11-18 02:56:37,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=25933.333333333332, ans=0.0 2023-11-18 02:56:38,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=25933.333333333332, ans=0.125 2023-11-18 02:56:40,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.77 vs. limit=5.0 2023-11-18 02:56:43,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.06 vs. limit=22.5 2023-11-18 02:56:48,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.61 vs. limit=10.0 2023-11-18 02:56:49,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2023-11-18 02:56:49,652 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3900, loss[loss=0.1465, simple_loss=0.1361, pruned_loss=0.0648, audio_tagging_loss=0.01369, over 14666.00 frames. ], tot_loss[loss=0.1674, simple_loss=0.1598, pruned_loss=0.07332, audio_tagging_loss=0.01421, over 3035402.08 frames. ], batch size: 55, lr: 4.24e-02, grad_scale: 64.0 2023-11-18 02:56:55,011 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.882e+01 1.063e+02 1.269e+02 1.447e+02 2.279e+02, threshold=2.539e+02, percent-clipped=0.0 2023-11-18 02:56:55,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=26000.0, ans=0.0 2023-11-18 02:57:01,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=26066.666666666668, ans=0.125 2023-11-18 02:57:04,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=26066.666666666668, ans=0.05 2023-11-18 02:57:05,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=26066.666666666668, ans=15.0 2023-11-18 02:57:11,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=26066.666666666668, ans=0.125 2023-11-18 02:57:11,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2023-11-18 02:57:40,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=26266.666666666668, ans=0.0 2023-11-18 02:57:47,474 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 3950, loss[loss=0.1617, simple_loss=0.1524, pruned_loss=0.06921, audio_tagging_loss=0.01632, over 15545.00 frames. 
], tot_loss[loss=0.1666, simple_loss=0.1588, pruned_loss=0.07281, audio_tagging_loss=0.01435, over 3043291.50 frames. ], batch size: 59, lr: 4.23e-02, grad_scale: 64.0 2023-11-18 02:57:53,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=26333.333333333332, ans=0.2 2023-11-18 02:57:58,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.79 vs. limit=15.0 2023-11-18 02:58:09,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=26466.666666666668, ans=0.125 2023-11-18 02:58:12,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=26466.666666666668, ans=0.005115942028985507 2023-11-18 02:58:23,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=26533.333333333332, ans=0.2 2023-11-18 02:58:31,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=26600.0, ans=0.2 2023-11-18 02:58:34,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=26600.0, ans=0.05 2023-11-18 02:58:35,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=26600.0, ans=0.2 2023-11-18 02:58:46,895 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4000, loss[loss=0.1787, simple_loss=0.1662, pruned_loss=0.08224, audio_tagging_loss=0.01329, over 15199.00 frames. ], tot_loss[loss=0.1679, simple_loss=0.1603, pruned_loss=0.07343, audio_tagging_loss=0.01438, over 3045777.48 frames. ], batch size: 56, lr: 4.23e-02, grad_scale: 64.0 2023-11-18 02:58:48,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=26666.666666666668, ans=0.125 2023-11-18 02:58:51,137 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.575e+01 1.092e+02 1.270e+02 1.504e+02 2.237e+02, threshold=2.540e+02, percent-clipped=0.0 2023-11-18 02:59:17,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=26800.0, ans=0.125 2023-11-18 02:59:26,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=26866.666666666668, ans=6.0 2023-11-18 02:59:32,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.38 vs. limit=22.5 2023-11-18 02:59:43,010 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4050, loss[loss=0.114, simple_loss=0.1052, pruned_loss=0.04613, audio_tagging_loss=0.01527, over 14964.00 frames. ], tot_loss[loss=0.168, simple_loss=0.1605, pruned_loss=0.07328, audio_tagging_loss=0.01446, over 3043108.43 frames. ], batch size: 58, lr: 4.22e-02, grad_scale: 64.0 2023-11-18 02:59:44,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=27000.0, ans=0.5 2023-11-18 02:59:46,338 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:59:46,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=27000.0, ans=0.125 2023-11-18 02:59:49,914 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:59:54,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=27066.666666666668, ans=0.1 2023-11-18 02:59:57,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=27066.666666666668, ans=0.125 2023-11-18 03:00:06,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.85 vs. limit=22.5 2023-11-18 03:00:19,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.41 vs. limit=22.5 2023-11-18 03:00:20,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=27200.0, ans=0.125 2023-11-18 03:00:30,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=27266.666666666668, ans=0.0 2023-11-18 03:00:33,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.19 vs. limit=12.0 2023-11-18 03:00:41,271 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4100, loss[loss=0.1376, simple_loss=0.1345, pruned_loss=0.05586, audio_tagging_loss=0.01454, over 15283.00 frames. ], tot_loss[loss=0.1674, simple_loss=0.16, pruned_loss=0.07299, audio_tagging_loss=0.01442, over 3044413.98 frames. ], batch size: 57, lr: 4.22e-02, grad_scale: 64.0 2023-11-18 03:00:45,567 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 1.139e+02 1.299e+02 1.567e+02 2.247e+02, threshold=2.597e+02, percent-clipped=0.0 2023-11-18 03:00:46,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=27333.333333333332, ans=0.0049275362318840586 2023-11-18 03:00:51,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=27400.0, ans=0.125 2023-11-18 03:01:04,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=27466.666666666668, ans=0.125 2023-11-18 03:01:04,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=27466.666666666668, ans=0.004898550724637681 2023-11-18 03:01:07,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=27466.666666666668, ans=0.04949747468305833 2023-11-18 03:01:26,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.61 vs. 
limit=22.5 2023-11-18 03:01:28,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=27600.0, ans=0.1 2023-11-18 03:01:33,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=27600.0, ans=0.004869565217391305 2023-11-18 03:01:34,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=12.0 2023-11-18 03:01:35,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=27600.0, ans=0.004869565217391305 2023-11-18 03:01:38,128 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4150, loss[loss=0.1325, simple_loss=0.1247, pruned_loss=0.05452, audio_tagging_loss=0.01558, over 13828.00 frames. ], tot_loss[loss=0.167, simple_loss=0.16, pruned_loss=0.07282, audio_tagging_loss=0.01416, over 3038526.75 frames. ], batch size: 55, lr: 4.21e-02, grad_scale: 64.0 2023-11-18 03:01:40,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=27666.666666666668, ans=0.125 2023-11-18 03:01:47,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.12 vs. limit=22.5 2023-11-18 03:01:49,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=27733.333333333332, ans=0.0 2023-11-18 03:01:50,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=27733.333333333332, ans=0.125 2023-11-18 03:01:51,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.45 vs. limit=15.0 2023-11-18 03:01:53,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.30 vs. limit=12.0 2023-11-18 03:02:02,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0 2023-11-18 03:02:04,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=27800.0, ans=0.125 2023-11-18 03:02:19,677 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:02:31,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=27933.333333333332, ans=0.2 2023-11-18 03:02:34,608 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4200, loss[loss=0.1827, simple_loss=0.1713, pruned_loss=0.08257, audio_tagging_loss=0.0145, over 15368.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.1583, pruned_loss=0.0716, audio_tagging_loss=0.01389, over 3037922.27 frames. 
], batch size: 58, lr: 4.20e-02, grad_scale: 64.0 2023-11-18 03:02:38,899 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.536e+01 1.064e+02 1.276e+02 1.442e+02 2.964e+02, threshold=2.551e+02, percent-clipped=1.0 2023-11-18 03:03:13,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=28200.0, ans=0.2 2023-11-18 03:03:14,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=28200.0, ans=0.125 2023-11-18 03:03:19,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=28266.666666666668, ans=0.125 2023-11-18 03:03:32,466 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4250, loss[loss=0.182, simple_loss=0.1795, pruned_loss=0.08233, audio_tagging_loss=0.009954, over 15104.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.1581, pruned_loss=0.0713, audio_tagging_loss=0.01378, over 3031615.21 frames. ], batch size: 55, lr: 4.20e-02, grad_scale: 64.0 2023-11-18 03:03:41,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=28333.333333333332, ans=15.0 2023-11-18 03:03:53,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=28466.666666666668, ans=0.125 2023-11-18 03:03:53,962 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=18.73 vs. limit=15.0 2023-11-18 03:03:56,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=28466.666666666668, ans=0.2 2023-11-18 03:04:00,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=12.0 2023-11-18 03:04:08,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=28533.333333333332, ans=0.125 2023-11-18 03:04:08,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=28533.333333333332, ans=0.0 2023-11-18 03:04:27,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=28666.666666666668, ans=0.00463768115942029 2023-11-18 03:04:28,446 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4300, loss[loss=0.1624, simple_loss=0.1562, pruned_loss=0.07146, audio_tagging_loss=0.01281, over 15018.00 frames. ], tot_loss[loss=0.1652, simple_loss=0.1595, pruned_loss=0.07168, audio_tagging_loss=0.01374, over 3040682.93 frames. 
], batch size: 56, lr: 4.19e-02, grad_scale: 64.0 2023-11-18 03:04:29,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=28666.666666666668, ans=0.2 2023-11-18 03:04:31,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=28666.666666666668, ans=0.125 2023-11-18 03:04:32,714 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 1.089e+02 1.255e+02 1.443e+02 2.387e+02, threshold=2.510e+02, percent-clipped=0.0 2023-11-18 03:04:41,551 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.78 vs. limit=22.5 2023-11-18 03:04:44,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=28733.333333333332, ans=0.125 2023-11-18 03:04:53,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=28800.0, ans=0.2 2023-11-18 03:04:56,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.48 vs. limit=15.0 2023-11-18 03:04:59,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=28800.0, ans=0.125 2023-11-18 03:05:25,325 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4350, loss[loss=0.135, simple_loss=0.1282, pruned_loss=0.05767, audio_tagging_loss=0.01319, over 14616.00 frames. ], tot_loss[loss=0.1655, simple_loss=0.1602, pruned_loss=0.07184, audio_tagging_loss=0.01357, over 3041694.41 frames. ], batch size: 54, lr: 4.19e-02, grad_scale: 64.0 2023-11-18 03:05:37,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0 2023-11-18 03:05:47,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=29133.333333333332, ans=0.07 2023-11-18 03:05:51,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=29133.333333333332, ans=0.1 2023-11-18 03:05:52,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=29133.333333333332, ans=0.2 2023-11-18 03:05:56,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=29133.333333333332, ans=0.2 2023-11-18 03:06:02,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=29200.0, ans=0.125 2023-11-18 03:06:07,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=29200.0, ans=0.125 2023-11-18 03:06:15,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=29266.666666666668, ans=0.004507246376811594 2023-11-18 03:06:22,947 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4400, loss[loss=0.1576, simple_loss=0.151, pruned_loss=0.0698, audio_tagging_loss=0.01233, over 15249.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.159, pruned_loss=0.07125, audio_tagging_loss=0.01369, over 3043815.33 frames. 
], batch size: 54, lr: 4.18e-02, grad_scale: 64.0 2023-11-18 03:06:27,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.425e+01 1.162e+02 1.302e+02 1.640e+02 3.175e+02, threshold=2.603e+02, percent-clipped=6.0 2023-11-18 03:06:50,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.61 vs. limit=22.5 2023-11-18 03:07:19,270 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4450, loss[loss=0.1905, simple_loss=0.1868, pruned_loss=0.08669, audio_tagging_loss=0.01035, over 14996.00 frames. ], tot_loss[loss=0.1643, simple_loss=0.1591, pruned_loss=0.0712, audio_tagging_loss=0.01352, over 3052039.24 frames. ], batch size: 55, lr: 4.17e-02, grad_scale: 64.0 2023-11-18 03:07:49,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=29800.0, ans=0.125 2023-11-18 03:07:52,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=29800.0, ans=0.125 2023-11-18 03:07:56,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=29866.666666666668, ans=0.05 2023-11-18 03:08:02,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=29866.666666666668, ans=0.004376811594202898 2023-11-18 03:08:03,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=12.0 2023-11-18 03:08:06,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=29933.333333333332, ans=0.0 2023-11-18 03:08:15,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.46 vs. limit=22.5 2023-11-18 03:08:15,492 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4500, loss[loss=0.1333, simple_loss=0.1209, pruned_loss=0.05805, audio_tagging_loss=0.01478, over 15654.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.1568, pruned_loss=0.07018, audio_tagging_loss=0.01361, over 3046995.35 frames. 
], batch size: 60, lr: 4.17e-02, grad_scale: 64.0 2023-11-18 03:08:20,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 1.091e+02 1.301e+02 1.544e+02 2.749e+02, threshold=2.602e+02, percent-clipped=1.0 2023-11-18 03:08:31,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=30066.666666666668, ans=0.1 2023-11-18 03:08:31,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=30066.666666666668, ans=0.07 2023-11-18 03:08:33,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=30066.666666666668, ans=0.125 2023-11-18 03:08:35,211 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.457e+01 2023-11-18 03:08:44,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=30133.333333333332, ans=0.1 2023-11-18 03:09:05,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=30266.666666666668, ans=0.1 2023-11-18 03:09:11,559 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2023-11-18 03:09:13,090 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4550, loss[loss=0.2077, simple_loss=0.2005, pruned_loss=0.09618, audio_tagging_loss=0.01129, over 15969.00 frames. ], tot_loss[loss=0.1631, simple_loss=0.1577, pruned_loss=0.07069, audio_tagging_loss=0.01353, over 3046742.55 frames. ], batch size: 56, lr: 4.16e-02, grad_scale: 64.0 2023-11-18 03:09:13,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=30333.333333333332, ans=0.125 2023-11-18 03:09:24,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.55 vs. limit=15.0 2023-11-18 03:09:26,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=30400.0, ans=0.125 2023-11-18 03:09:31,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=30400.0, ans=0.125 2023-11-18 03:09:43,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=30466.666666666668, ans=0.125 2023-11-18 03:09:57,980 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:10:10,254 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4600, loss[loss=0.1341, simple_loss=0.1308, pruned_loss=0.05496, audio_tagging_loss=0.01373, over 16363.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.1563, pruned_loss=0.06989, audio_tagging_loss=0.01376, over 3053466.12 frames. 
], batch size: 61, lr: 4.15e-02, grad_scale: 64.0 2023-11-18 03:10:14,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 1.069e+02 1.267e+02 1.546e+02 2.795e+02, threshold=2.534e+02, percent-clipped=1.0 2023-11-18 03:10:14,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=30666.666666666668, ans=0.004202898550724638 2023-11-18 03:10:28,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=30733.333333333332, ans=0.125 2023-11-18 03:10:34,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.12 vs. limit=22.5 2023-11-18 03:10:55,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=30933.333333333332, ans=0.004144927536231885 2023-11-18 03:11:01,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=30933.333333333332, ans=0.2 2023-11-18 03:11:03,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5 2023-11-18 03:11:05,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=31000.0, ans=0.125 2023-11-18 03:11:06,017 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4650, loss[loss=0.1359, simple_loss=0.1384, pruned_loss=0.05132, audio_tagging_loss=0.01543, over 13787.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.1552, pruned_loss=0.0695, audio_tagging_loss=0.014, over 3044503.23 frames. ], batch size: 53, lr: 4.15e-02, grad_scale: 64.0 2023-11-18 03:11:13,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=31000.0, ans=0.07 2023-11-18 03:11:15,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=31000.0, ans=0.004130434782608696 2023-11-18 03:11:27,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=31066.666666666668, ans=0.004115942028985507 2023-11-18 03:11:44,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=31200.0, ans=0.00408695652173913 2023-11-18 03:12:00,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=31266.666666666668, ans=0.004072463768115942 2023-11-18 03:12:02,408 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4700, loss[loss=0.1471, simple_loss=0.1322, pruned_loss=0.06603, audio_tagging_loss=0.01493, over 14259.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.1556, pruned_loss=0.06948, audio_tagging_loss=0.01422, over 3041912.10 frames. 
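The scaling.py:213 lines track ScheduledFloat parameters: dropout probabilities, skip rates and balancer bounds that are functions of batch_count rather than constants, each line printing the name, the batch_count, and the value (ans) currently in effect. A minimal sketch of such a piecewise-linear schedule (the breakpoints below are made up for illustration; the real schedules are defined per module in zipformer's scaling.py):

from bisect import bisect_right

def scheduled_float(batch_count: float,
                    points: list[tuple[float, float]]) -> float:
    # points: sorted (batch_count, value) pairs; linear interpolation
    # between them, clamped at both ends.
    xs = [p[0] for p in points]
    if batch_count <= xs[0]:
        return points[0][1]
    if batch_count >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, batch_count)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a dropout that decays from 0.3 to 0.1 over the first 20k updates and
# then stays flat, which is why so many lines above print ans=0.1:
print(scheduled_float(30066.67, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1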
], batch size: 55, lr: 4.14e-02, grad_scale: 64.0 2023-11-18 03:12:07,957 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.482e+01 1.109e+02 1.199e+02 1.387e+02 2.796e+02, threshold=2.398e+02, percent-clipped=1.0 2023-11-18 03:12:15,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=31400.0, ans=0.125 2023-11-18 03:12:48,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31600.0, ans=0.1 2023-11-18 03:12:59,579 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4750, loss[loss=0.1246, simple_loss=0.1238, pruned_loss=0.04763, audio_tagging_loss=0.01504, over 15182.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.157, pruned_loss=0.06977, audio_tagging_loss=0.01411, over 3037104.64 frames. ], batch size: 59, lr: 4.14e-02, grad_scale: 64.0 2023-11-18 03:13:07,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=31666.666666666668, ans=0.125 2023-11-18 03:13:09,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.69 vs. limit=6.0 2023-11-18 03:13:15,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31733.333333333332, ans=0.1 2023-11-18 03:13:20,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=31800.0, ans=0.125 2023-11-18 03:13:31,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2023-11-18 03:13:35,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0 2023-11-18 03:13:37,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=31866.666666666668, ans=0.125 2023-11-18 03:13:43,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=31933.333333333332, ans=0.05 2023-11-18 03:13:45,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.42 vs. limit=22.5 2023-11-18 03:13:55,729 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4800, loss[loss=0.1985, simple_loss=0.194, pruned_loss=0.08766, audio_tagging_loss=0.01383, over 14256.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.1564, pruned_loss=0.06953, audio_tagging_loss=0.0142, over 3046451.26 frames. 
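The scaling.py:1022 lines fire when a Whiten module finds its activations drifting away from "white": the metric grows when variance concentrates in a few directions, and a gradient penalty is applied only while metric exceeds limit (hence the "metric=X vs. limit=Y" format; most entries sit below their limit). A rough sketch, assuming the metric compares the spread of covariance eigenvalues against an isotropic one; zipformer's actual implementation differs in detail:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels). For each channel group, compare the
    # mean squared eigenvalue of the covariance to the squared mean
    # eigenvalue: exactly 1.0 for white features, large when a few
    # directions dominate.
    n, c = x.shape
    g = c // num_groups
    metrics = []
    for i in range(num_groups):
        xg = x[:, i * g:(i + 1) * g]
        cov = (xg.T @ xg) / n
        metrics.append((cov @ cov).diagonal().mean()
                       / cov.diagonal().mean() ** 2)
    return torch.stack(metrics).mean().item()

x = torch.randn(1000, 384)   # near-white features
print(whitening_metric(x))   # ~1.4, far below limits like 15.0 or 22.5
x[:, 0] *= 30.0              # one dominant channel
print(whitening_metric(x))   # metric explodes; the penalty would kick in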
], batch size: 55, lr: 4.13e-02, grad_scale: 64.0 2023-11-18 03:13:55,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=32000.0, ans=0.125 2023-11-18 03:13:59,888 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 1.059e+02 1.265e+02 1.558e+02 2.176e+02, threshold=2.529e+02, percent-clipped=0.0 2023-11-18 03:14:29,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=32200.0, ans=0.1 2023-11-18 03:14:39,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=32266.666666666668, ans=0.125 2023-11-18 03:14:50,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=32266.666666666668, ans=15.0 2023-11-18 03:14:51,910 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4850, loss[loss=0.1692, simple_loss=0.1644, pruned_loss=0.07521, audio_tagging_loss=0.01179, over 15394.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.1565, pruned_loss=0.06962, audio_tagging_loss=0.01434, over 3046855.04 frames. ], batch size: 56, lr: 4.12e-02, grad_scale: 64.0 2023-11-18 03:15:13,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=32466.666666666668, ans=0.2 2023-11-18 03:15:20,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=32466.666666666668, ans=0.0 2023-11-18 03:15:22,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=32466.666666666668, ans=12.0 2023-11-18 03:15:29,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=32533.333333333332, ans=0.125 2023-11-18 03:15:34,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=32533.333333333332, ans=0.0 2023-11-18 03:15:40,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=12.0 2023-11-18 03:15:48,646 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4900, loss[loss=0.139, simple_loss=0.1313, pruned_loss=0.05609, audio_tagging_loss=0.01723, over 14882.00 frames. ], tot_loss[loss=0.1619, simple_loss=0.1567, pruned_loss=0.06939, audio_tagging_loss=0.01411, over 3040756.88 frames. 
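The lr printed with each batch decays smoothly (4.18e-02 at batch 4450 down to 4.12e-02 by batch 4900). This is the expected shape of icefall's Eden schedule, which discounts the base learning rate by both a batch-count factor and an epoch factor; the constants below are illustrative defaults, and newer versions also add a warm-up factor:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden rule from icefall's optim.py: each factor is ~1 early on and
    # decays like x**-0.5 once batch >> lr_batches (resp. epoch >> lr_epochs)
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, batch=4450, epoch=1))  # ~4.1e-02, the ballpark seen above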
], batch size: 58, lr: 4.12e-02, grad_scale: 64.0 2023-11-18 03:15:51,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=32666.666666666668, ans=0.125 2023-11-18 03:15:52,884 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.066e+01 1.045e+02 1.197e+02 1.386e+02 2.012e+02, threshold=2.394e+02, percent-clipped=0.0 2023-11-18 03:16:18,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=32800.0, ans=0.0037391304347826086 2023-11-18 03:16:19,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=32800.0, ans=0.0037391304347826086 2023-11-18 03:16:26,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.53 vs. limit=22.5 2023-11-18 03:16:41,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=32933.333333333336, ans=0.0 2023-11-18 03:16:43,796 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 4950, loss[loss=0.1898, simple_loss=0.1905, pruned_loss=0.0804, audio_tagging_loss=0.01413, over 15861.00 frames. ], tot_loss[loss=0.1616, simple_loss=0.1566, pruned_loss=0.06942, audio_tagging_loss=0.01383, over 3046052.17 frames. ], batch size: 60, lr: 4.11e-02, grad_scale: 64.0 2023-11-18 03:16:45,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=33000.0, ans=0.125 2023-11-18 03:17:29,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.88 vs. limit=12.0 2023-11-18 03:17:40,578 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5000, loss[loss=0.1932, simple_loss=0.2026, pruned_loss=0.0838, audio_tagging_loss=0.008072, over 15837.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.1565, pruned_loss=0.06936, audio_tagging_loss=0.01362, over 3042621.81 frames. ], batch size: 57, lr: 4.10e-02, grad_scale: 64.0 2023-11-18 03:17:45,397 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 1.071e+02 1.252e+02 1.412e+02 1.907e+02, threshold=2.505e+02, percent-clipped=0.0 2023-11-18 03:17:46,085 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.60 vs. limit=22.5 2023-11-18 03:18:00,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33400.0, ans=0.1 2023-11-18 03:18:11,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.32 vs. limit=22.5 2023-11-18 03:18:38,050 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5050, loss[loss=0.1515, simple_loss=0.149, pruned_loss=0.06616, audio_tagging_loss=0.01085, over 16628.00 frames. ], tot_loss[loss=0.161, simple_loss=0.1563, pruned_loss=0.06929, audio_tagging_loss=0.01356, over 3047289.13 frames. 
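Each entry carries two loss groups: loss[... over ~15k frames] for the current batch and tot_loss[... over ~3M frames]. The ~3M figure stays put instead of growing with the epoch, which points to tot_loss being a decayed running aggregate with an effective window of about 200 batches (3M / ~15k per batch), not a true epoch average. A sketch of that bookkeeping, assumed to mirror icefall's MetricsTracker-style accumulation:

# tot_loss as an exponentially-decayed aggregate of per-batch statistics.
# decay = 1 - 1/window gives a steady-state frame count of ~window batches,
# i.e. ~200 * 15k ~= 3M frames, matching the tot_loss lines above.

def update_tot(tot: dict, batch: dict, window: int = 200) -> dict:
    keep = 1.0 - 1.0 / window
    out = {k: keep * tot.get(k, 0.0) for k in set(tot) | set(batch)}
    for k, v in batch.items():
        out[k] += v
    return out

tot: dict = {}
for _ in range(2000):
    tot = update_tot(tot, {"frames": 15000.0, "loss": 0.16 * 15000.0})
print(tot["frames"])                # saturates near 3.0e6
print(tot["loss"] / tot["frames"])  # per-frame loss, as printed in the log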
], batch size: 62, lr: 4.10e-02, grad_scale: 64.0 2023-11-18 03:18:39,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=33666.666666666664, ans=0.07 2023-11-18 03:18:47,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=33733.333333333336, ans=0.125 2023-11-18 03:18:56,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=33733.333333333336, ans=0.125 2023-11-18 03:18:56,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=33733.333333333336, ans=0.0 2023-11-18 03:18:58,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=33800.0, ans=0.003521739130434783 2023-11-18 03:19:10,233 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.861e+00 2023-11-18 03:19:17,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=33866.666666666664, ans=0.125 2023-11-18 03:19:33,396 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5100, loss[loss=0.1323, simple_loss=0.1243, pruned_loss=0.05409, audio_tagging_loss=0.01611, over 16454.00 frames. ], tot_loss[loss=0.1594, simple_loss=0.1548, pruned_loss=0.06838, audio_tagging_loss=0.01362, over 3037597.44 frames. ], batch size: 65, lr: 4.09e-02, grad_scale: 64.0 2023-11-18 03:19:37,618 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 1.065e+02 1.271e+02 1.460e+02 2.434e+02, threshold=2.541e+02, percent-clipped=0.0 2023-11-18 03:19:40,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.77 vs. limit=15.0 2023-11-18 03:20:05,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=34133.333333333336, ans=0.02 2023-11-18 03:20:17,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=34266.666666666664, ans=0.125 2023-11-18 03:20:29,385 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5150, loss[loss=0.2475, simple_loss=0.2433, pruned_loss=0.1113, audio_tagging_loss=0.01455, over 15007.00 frames. ], tot_loss[loss=0.1589, simple_loss=0.1544, pruned_loss=0.06803, audio_tagging_loss=0.01363, over 3033408.87 frames. ], batch size: 55, lr: 4.09e-02, grad_scale: 64.0 2023-11-18 03:20:30,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.62 vs. limit=22.5 2023-11-18 03:20:34,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.75 vs. 
limit=22.5 2023-11-18 03:20:42,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=34400.0, ans=0.0 2023-11-18 03:20:55,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=34466.666666666664, ans=0.2 2023-11-18 03:20:55,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=34466.666666666664, ans=0.125 2023-11-18 03:21:26,263 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5200, loss[loss=0.1818, simple_loss=0.1831, pruned_loss=0.08087, audio_tagging_loss=0.009392, over 15485.00 frames. ], tot_loss[loss=0.1578, simple_loss=0.1534, pruned_loss=0.0674, audio_tagging_loss=0.01368, over 3043816.54 frames. ], batch size: 58, lr: 4.08e-02, grad_scale: 64.0 2023-11-18 03:21:30,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.892e+01 1.044e+02 1.171e+02 1.375e+02 2.529e+02, threshold=2.342e+02, percent-clipped=0.0 2023-11-18 03:21:34,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=34666.666666666664, ans=0.015 2023-11-18 03:21:34,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=34666.666666666664, ans=0.125 2023-11-18 03:22:05,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=34866.666666666664, ans=0.0 2023-11-18 03:22:11,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=34933.333333333336, ans=15.0 2023-11-18 03:22:22,079 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5250, loss[loss=0.1274, simple_loss=0.1213, pruned_loss=0.05275, audio_tagging_loss=0.01397, over 15657.00 frames. ], tot_loss[loss=0.1584, simple_loss=0.1543, pruned_loss=0.06765, audio_tagging_loss=0.01362, over 3041525.37 frames. ], batch size: 61, lr: 4.07e-02, grad_scale: 64.0 2023-11-18 03:22:26,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.07 vs. limit=15.0 2023-11-18 03:22:36,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=35066.666666666664, ans=0.125 2023-11-18 03:22:48,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=35133.333333333336, ans=0.125 2023-11-18 03:23:13,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=35266.666666666664, ans=0.1 2023-11-18 03:23:18,038 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5300, loss[loss=0.2002, simple_loss=0.203, pruned_loss=0.08738, audio_tagging_loss=0.01133, over 15581.00 frames. ], tot_loss[loss=0.1589, simple_loss=0.1549, pruned_loss=0.06786, audio_tagging_loss=0.01359, over 3039382.98 frames. 
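Many of the scheduled names above belong to Balancer modules (balancer1.prob, balancer2.min_positive, balancer_na.min_abs, ...). A Balancer leaves the forward pass untouched; with probability prob it checks per-channel statistics in backward and, when a channel's fraction of positive values or its mean absolute value drifts outside the configured bounds, mixes in a small corrective gradient. A toy version of that forward-identity / backward-nudge pattern, much simplified relative to zipformer's scaling.py:

import torch

class ToyBalancer(torch.autograd.Function):
    # Identity in forward; in backward, nudges channels whose mean |x|
    # falls below min_abs. Loosely modeled on zipformer's Balancer.

    @staticmethod
    def forward(ctx, x, min_abs: float, grad_scale: float):
        ctx.save_for_backward(x)
        ctx.min_abs, ctx.grad_scale = min_abs, grad_scale
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        mean_abs = x.abs().mean(dim=0, keepdim=True)   # per-channel stat
        too_small = (mean_abs < ctx.min_abs).float()
        # push under-active channels away from zero, sized to the grad norm
        nudge = (-torch.sign(x) * too_small
                 * ctx.grad_scale * grad_out.abs().mean())
        return grad_out + nudge, None, None

x = (0.01 * torch.randn(16, 8)).requires_grad_()  # every channel "too small"
ToyBalancer.apply(x, 0.02, 0.04).sum().backward()
print(x.grad.min(), x.grad.max())  # ~0.96 / ~1.04: identity grad +- the nudge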
], batch size: 58, lr: 4.07e-02, grad_scale: 64.0 2023-11-18 03:23:22,279 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.314e+01 1.054e+02 1.180e+02 1.432e+02 2.621e+02, threshold=2.360e+02, percent-clipped=2.0 2023-11-18 03:23:38,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=35400.0, ans=0.125 2023-11-18 03:23:53,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=35533.333333333336, ans=0.0 2023-11-18 03:23:57,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2023-11-18 03:24:14,619 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5350, loss[loss=0.1102, simple_loss=0.09812, pruned_loss=0.04407, audio_tagging_loss=0.01703, over 15240.00 frames. ], tot_loss[loss=0.1596, simple_loss=0.1557, pruned_loss=0.06828, audio_tagging_loss=0.01348, over 3036047.37 frames. ], batch size: 58, lr: 4.06e-02, grad_scale: 64.0 2023-11-18 03:24:19,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=35666.666666666664, ans=0.0 2023-11-18 03:24:19,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35666.666666666664, ans=0.1 2023-11-18 03:24:31,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=35733.333333333336, ans=0.2 2023-11-18 03:24:33,782 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2023-11-18 03:24:48,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.53 vs. limit=10.0 2023-11-18 03:24:52,480 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:24:56,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=35866.666666666664, ans=0.0030724637681159433 2023-11-18 03:25:03,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=35933.333333333336, ans=0.003057971014492753 2023-11-18 03:25:07,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=35933.333333333336, ans=0.05 2023-11-18 03:25:10,841 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5400, loss[loss=0.2053, simple_loss=0.2115, pruned_loss=0.08861, audio_tagging_loss=0.01091, over 14686.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.1574, pruned_loss=0.0688, audio_tagging_loss=0.01359, over 3040628.56 frames. ], batch size: 52, lr: 4.05e-02, grad_scale: 64.0 2023-11-18 03:25:11,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. 
limit=15.0 2023-11-18 03:25:15,058 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.377e+01 1.086e+02 1.314e+02 1.571e+02 2.162e+02, threshold=2.627e+02, percent-clipped=0.0 2023-11-18 03:25:21,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=36066.666666666664, ans=0.125 2023-11-18 03:25:52,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.47 vs. limit=15.0 2023-11-18 03:25:55,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=12.0 2023-11-18 03:26:01,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=36266.666666666664, ans=0.1 2023-11-18 03:26:06,207 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5450, loss[loss=0.1285, simple_loss=0.1182, pruned_loss=0.05189, audio_tagging_loss=0.01754, over 15014.00 frames. ], tot_loss[loss=0.1612, simple_loss=0.1573, pruned_loss=0.06885, audio_tagging_loss=0.01376, over 3042034.79 frames. ], batch size: 56, lr: 4.05e-02, grad_scale: 64.0 2023-11-18 03:26:12,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=36333.333333333336, ans=0.1 2023-11-18 03:26:15,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=36333.333333333336, ans=0.125 2023-11-18 03:26:21,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=36400.0, ans=0.002956521739130435 2023-11-18 03:26:28,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=36466.666666666664, ans=0.125 2023-11-18 03:26:30,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=36466.666666666664, ans=0.125 2023-11-18 03:26:36,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=36466.666666666664, ans=0.2 2023-11-18 03:26:41,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.29 vs. limit=6.0 2023-11-18 03:26:44,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=36533.333333333336, ans=0.0 2023-11-18 03:26:54,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=36600.0, ans=15.0 2023-11-18 03:27:03,294 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5500, loss[loss=0.1397, simple_loss=0.1304, pruned_loss=0.05888, audio_tagging_loss=0.01566, over 15488.00 frames. ], tot_loss[loss=0.1601, simple_loss=0.1564, pruned_loss=0.06817, audio_tagging_loss=0.01375, over 3040078.24 frames. 
], batch size: 58, lr: 4.04e-02, grad_scale: 64.0 2023-11-18 03:27:06,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=36666.666666666664, ans=0.2 2023-11-18 03:27:07,489 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.157e+01 1.024e+02 1.184e+02 1.343e+02 1.900e+02, threshold=2.368e+02, percent-clipped=0.0 2023-11-18 03:27:32,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=36800.0, ans=0.1 2023-11-18 03:27:35,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=36866.666666666664, ans=0.1 2023-11-18 03:27:58,587 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5550, loss[loss=0.1561, simple_loss=0.1525, pruned_loss=0.06624, audio_tagging_loss=0.01359, over 15514.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.1577, pruned_loss=0.06859, audio_tagging_loss=0.01363, over 3046442.04 frames. ], batch size: 56, lr: 4.03e-02, grad_scale: 64.0 2023-11-18 03:27:58,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=37000.0, ans=0.0 2023-11-18 03:28:15,385 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=6.0 2023-11-18 03:28:35,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=37200.0, ans=0.0 2023-11-18 03:28:43,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.09 vs. limit=22.5 2023-11-18 03:28:49,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=37266.666666666664, ans=0.0 2023-11-18 03:28:54,751 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5600, loss[loss=0.1362, simple_loss=0.1269, pruned_loss=0.05824, audio_tagging_loss=0.01449, over 15104.00 frames. ], tot_loss[loss=0.161, simple_loss=0.1577, pruned_loss=0.06844, audio_tagging_loss=0.01378, over 3047093.96 frames. ], batch size: 58, lr: 4.03e-02, grad_scale: 64.0 2023-11-18 03:28:58,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=37333.333333333336, ans=0.125 2023-11-18 03:28:59,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.818e+01 1.034e+02 1.195e+02 1.444e+02 2.133e+02, threshold=2.390e+02, percent-clipped=0.0 2023-11-18 03:29:00,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=30.32 vs. limit=22.5 2023-11-18 03:29:12,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.32 vs. limit=6.0 2023-11-18 03:29:23,546 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.69 vs. 
limit=15.0 2023-11-18 03:29:31,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=37533.333333333336, ans=0.125 2023-11-18 03:29:34,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=37533.333333333336, ans=0.0 2023-11-18 03:29:35,254 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:29:36,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.30 vs. limit=10.0 2023-11-18 03:29:51,785 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5650, loss[loss=0.1337, simple_loss=0.1295, pruned_loss=0.05255, audio_tagging_loss=0.01639, over 15260.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.1591, pruned_loss=0.06911, audio_tagging_loss=0.01392, over 3053636.53 frames. ], batch size: 58, lr: 4.02e-02, grad_scale: 128.0 2023-11-18 03:30:04,748 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:30:20,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=37800.0, ans=0.0026521739130434784 2023-11-18 03:30:39,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=37933.333333333336, ans=0.002623188405797101 2023-11-18 03:30:45,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37933.333333333336, ans=0.1 2023-11-18 03:30:47,183 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5700, loss[loss=0.1114, simple_loss=0.1047, pruned_loss=0.04728, audio_tagging_loss=0.0118, over 14937.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.1572, pruned_loss=0.06807, audio_tagging_loss=0.01381, over 3054668.02 frames. ], batch size: 56, lr: 4.02e-02, grad_scale: 64.0 2023-11-18 03:30:50,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=38000.0, ans=0.125 2023-11-18 03:30:52,374 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.432e+01 1.093e+02 1.259e+02 1.491e+02 2.385e+02, threshold=2.519e+02, percent-clipped=0.0 2023-11-18 03:30:54,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=38000.0, ans=0.1 2023-11-18 03:31:00,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.61 vs. limit=22.5 2023-11-18 03:31:11,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.18 vs. 
limit=15.0 2023-11-18 03:31:17,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=38133.333333333336, ans=0.125 2023-11-18 03:31:21,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2023-11-18 03:31:23,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=15.0 2023-11-18 03:31:25,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=38200.0, ans=0.125 2023-11-18 03:31:27,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=38200.0, ans=0.125 2023-11-18 03:31:42,340 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5750, loss[loss=0.1521, simple_loss=0.1489, pruned_loss=0.064, audio_tagging_loss=0.01367, over 15824.00 frames. ], tot_loss[loss=0.16, simple_loss=0.1569, pruned_loss=0.06799, audio_tagging_loss=0.0136, over 3047643.04 frames. ], batch size: 57, lr: 4.01e-02, grad_scale: 32.0 2023-11-18 03:31:45,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2023-11-18 03:31:57,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.34 vs. limit=15.0 2023-11-18 03:32:02,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=38400.0, ans=0.09899494936611666 2023-11-18 03:32:04,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=38400.0, ans=12.0 2023-11-18 03:32:29,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=38600.0, ans=0.125 2023-11-18 03:32:39,948 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5800, loss[loss=0.1163, simple_loss=0.1026, pruned_loss=0.05222, audio_tagging_loss=0.01281, over 14904.00 frames. ], tot_loss[loss=0.1589, simple_loss=0.1558, pruned_loss=0.06748, audio_tagging_loss=0.01354, over 3052633.63 frames. ], batch size: 57, lr: 4.00e-02, grad_scale: 32.0 2023-11-18 03:32:46,931 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 1.065e+02 1.200e+02 1.362e+02 2.023e+02, threshold=2.399e+02, percent-clipped=0.0 2023-11-18 03:33:01,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=38800.0, ans=0.125 2023-11-18 03:33:07,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=38800.0, ans=22.5 2023-11-18 03:33:19,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.80 vs. 
limit=15.0 2023-11-18 03:33:29,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=38933.333333333336, ans=0.2 2023-11-18 03:33:33,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=38933.333333333336, ans=0.2 2023-11-18 03:33:36,005 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5850, loss[loss=0.1619, simple_loss=0.166, pruned_loss=0.06572, audio_tagging_loss=0.01316, over 15292.00 frames. ], tot_loss[loss=0.1577, simple_loss=0.1547, pruned_loss=0.06684, audio_tagging_loss=0.01353, over 3050367.75 frames. ], batch size: 58, lr: 4.00e-02, grad_scale: 32.0 2023-11-18 03:33:37,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=39000.0, ans=0.0 2023-11-18 03:33:37,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=39000.0, ans=0.1 2023-11-18 03:33:51,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=39066.666666666664, ans=0.1 2023-11-18 03:34:14,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=39200.0, ans=0.125 2023-11-18 03:34:18,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.99 vs. limit=22.5 2023-11-18 03:34:28,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0 2023-11-18 03:34:31,913 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5900, loss[loss=0.1789, simple_loss=0.1911, pruned_loss=0.07254, audio_tagging_loss=0.0108, over 15723.00 frames. ], tot_loss[loss=0.1568, simple_loss=0.1539, pruned_loss=0.06639, audio_tagging_loss=0.01349, over 3050638.17 frames. ], batch size: 57, lr: 3.99e-02, grad_scale: 32.0 2023-11-18 03:34:38,797 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.189e+01 1.114e+02 1.332e+02 1.512e+02 2.705e+02, threshold=2.665e+02, percent-clipped=2.0 2023-11-18 03:34:40,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=39333.333333333336, ans=0.125 2023-11-18 03:34:54,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=39466.666666666664, ans=0.0 2023-11-18 03:34:57,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=39466.666666666664, ans=0.1 2023-11-18 03:35:20,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=39600.0, ans=0.1 2023-11-18 03:35:21,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.52 vs. limit=22.5 2023-11-18 03:35:28,897 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 5950, loss[loss=0.1899, simple_loss=0.1893, pruned_loss=0.08442, audio_tagging_loss=0.01084, over 16010.00 frames. ], tot_loss[loss=0.1564, simple_loss=0.1538, pruned_loss=0.066, audio_tagging_loss=0.01348, over 3058566.80 frames. 
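The grad_scale field traces mixed-precision loss scaling: steady at 64.0 for a long stretch, doubled to 128.0 at batch 5650, then halved back to 64.0 by batch 5700 and to 32.0 by batch 5750. That is the signature of a dynamic scaler that grows after a fixed number of clean steps and backs off whenever an inf/nan gradient shows up. The standard PyTorch mechanism, with constructor values chosen here for illustration:

import torch

# Dynamic loss scaling as in torch.cuda.amp: double the scale after
# `growth_interval` overflow-free steps, halve it on any inf/nan gradient.
# Two overflows in quick succession reproduce the 128 -> 64 -> 32 drop above.
scaler = torch.cuda.amp.GradScaler(
    init_scale=64.0,       # illustrative; chosen to match the log's plateau
    growth_factor=2.0,
    backoff_factor=0.5,
    growth_interval=2000,  # clean steps required before doubling
)

# Typical training step (model/optimizer/batch are placeholders):
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)   # skipped internally if grads contain inf/nan
#   scaler.update()          # grows or backs off the scale accordingly
print(scaler.get_scale())    # 64.0 on a CUDA machine (1.0 if CUDA is absent)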
], batch size: 57, lr: 3.98e-02, grad_scale: 32.0 2023-11-18 03:35:29,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=39666.666666666664, ans=0.04949747468305833 2023-11-18 03:35:55,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=39800.0, ans=0.5 2023-11-18 03:35:55,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-11-18 03:35:59,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=39800.0, ans=0.1 2023-11-18 03:36:14,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=39933.333333333336, ans=0.125 2023-11-18 03:36:21,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=39933.333333333336, ans=0.0 2023-11-18 03:36:24,714 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6000, loss[loss=0.1677, simple_loss=0.1735, pruned_loss=0.0701, audio_tagging_loss=0.01091, over 15413.00 frames. ], tot_loss[loss=0.1568, simple_loss=0.1543, pruned_loss=0.06618, audio_tagging_loss=0.01341, over 3056494.72 frames. ], batch size: 57, lr: 3.98e-02, grad_scale: 32.0 2023-11-18 03:36:24,715 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 03:36:58,777 INFO [train_asr.py:1147] (1/4) Epoch 1, validation: loss=0.1009, simple_loss=0.07718, pruned_loss=0.02169, audio_tagging_loss=0.04066, over 4681554.00 frames. 2023-11-18 03:36:58,778 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 03:37:03,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40000.0, ans=0.1 2023-11-18 03:37:05,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 1.087e+02 1.275e+02 1.499e+02 2.354e+02, threshold=2.549e+02, percent-clipped=0.0 2023-11-18 03:37:31,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=40200.0, ans=0.0 2023-11-18 03:37:40,586 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:37:55,888 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6050, loss[loss=0.1779, simple_loss=0.1773, pruned_loss=0.08045, audio_tagging_loss=0.008788, over 14657.00 frames. ], tot_loss[loss=0.1558, simple_loss=0.1534, pruned_loss=0.06571, audio_tagging_loss=0.0134, over 3047874.33 frames. 
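The WARNING lines are self-explanatory once the transducer constraint is spelled out: these AudioSet cuts are 1-second clips carrying a dummy placeholder transcript, and after subsampling they keep only 23 frames while the placeholder tokenizes to 24 tokens. An RNN-T alignment must emit every token from some frame, so the loss is undefined whenever there are fewer frames than tokens, and such cuts have to be dropped. A sketch of the check (the subsampling formula below is one that reproduces 100 -> 23; the exact arithmetic depends on the convolutional frontend):

def frames_after_subsampling(num_frames: int) -> int:
    # ~4x reduction via two stride-2 convolutions; 100 -> 23 as in the warning
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # (pruned) RNN-T needs at least as many output frames as tokens
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> "Exclude cut ..." is logged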
], batch size: 55, lr: 3.97e-02, grad_scale: 32.0 2023-11-18 03:37:56,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=40333.333333333336, ans=0.0 2023-11-18 03:37:58,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=40333.333333333336, ans=0.0 2023-11-18 03:38:07,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=40400.0, ans=0.125 2023-11-18 03:38:51,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.24 vs. limit=6.0 2023-11-18 03:38:52,469 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6100, loss[loss=0.1493, simple_loss=0.1469, pruned_loss=0.06218, audio_tagging_loss=0.01363, over 15219.00 frames. ], tot_loss[loss=0.1563, simple_loss=0.1538, pruned_loss=0.06595, audio_tagging_loss=0.01349, over 3048756.71 frames. ], batch size: 59, lr: 3.96e-02, grad_scale: 32.0 2023-11-18 03:38:58,886 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.303e+01 1.093e+02 1.234e+02 1.511e+02 2.648e+02, threshold=2.468e+02, percent-clipped=3.0 2023-11-18 03:39:22,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=40800.0, ans=0.2 2023-11-18 03:39:28,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=40866.666666666664, ans=0.09899494936611666 2023-11-18 03:39:41,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=40933.333333333336, ans=0.0019710144927536227 2023-11-18 03:39:42,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=40933.333333333336, ans=0.125 2023-11-18 03:39:48,159 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6150, loss[loss=0.1622, simple_loss=0.1653, pruned_loss=0.06785, audio_tagging_loss=0.0117, over 16452.00 frames. ], tot_loss[loss=0.1557, simple_loss=0.1528, pruned_loss=0.06562, audio_tagging_loss=0.01367, over 3046923.57 frames. ], batch size: 58, lr: 3.96e-02, grad_scale: 32.0 2023-11-18 03:40:25,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41200.0, ans=0.1 2023-11-18 03:40:29,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2023-11-18 03:40:30,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.26 vs. limit=22.5 2023-11-18 03:40:37,383 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.362e+00 2023-11-18 03:40:45,696 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6200, loss[loss=0.1936, simple_loss=0.1897, pruned_loss=0.08503, audio_tagging_loss=0.01371, over 16043.00 frames. ], tot_loss[loss=0.1549, simple_loss=0.152, pruned_loss=0.06514, audio_tagging_loss=0.01373, over 3040027.23 frames. 
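At batch 6000 above, training pauses for a validation pass: the dev set is scored under no_grad, the loss is normalized per frame (which is what makes train and valid numbers comparable across batch sizes), and the peak device memory is reported afterwards. A toy sketch of that bookkeeping; the model, loader and loss here are stand-ins, not icefall's actual interfaces:

import torch

def validate(model: torch.nn.Module, dev_batches) -> float:
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for feats, targets, num_frames in dev_batches:
            loss = torch.nn.functional.mse_loss(
                model(feats), targets, reduction="sum")  # stand-in loss
            tot_loss += loss.item()
            tot_frames += num_frames
    model.train()
    return tot_loss / tot_frames  # per-frame, as in "validation: loss=..."

model = torch.nn.Linear(8, 8)
batches = [(torch.randn(4, 8), torch.randn(4, 8), 4.0) for _ in range(3)]
print(f"validation: loss={validate(model, batches):.4g}")
if torch.cuda.is_available():  # peak memory, as logged after each validation
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated() // 2**20}MB")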
], batch size: 63, lr: 3.95e-02, grad_scale: 32.0 2023-11-18 03:40:53,126 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.054e+01 1.077e+02 1.264e+02 1.430e+02 2.412e+02, threshold=2.529e+02, percent-clipped=0.0 2023-11-18 03:41:00,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2023-11-18 03:41:03,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=41400.0, ans=0.0 2023-11-18 03:41:10,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.53 vs. limit=15.0 2023-11-18 03:41:27,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=41533.333333333336, ans=0.1 2023-11-18 03:41:31,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=41600.0, ans=0.125 2023-11-18 03:41:35,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41600.0, ans=0.1 2023-11-18 03:41:43,096 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6250, loss[loss=0.1165, simple_loss=0.1023, pruned_loss=0.05032, audio_tagging_loss=0.01504, over 14169.00 frames. ], tot_loss[loss=0.1548, simple_loss=0.1515, pruned_loss=0.06513, audio_tagging_loss=0.01387, over 3039287.08 frames. ], batch size: 57, lr: 3.94e-02, grad_scale: 32.0 2023-11-18 03:41:55,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=41733.333333333336, ans=0.2 2023-11-18 03:42:06,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=41800.0, ans=0.0017826086956521745 2023-11-18 03:42:13,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=41800.0, ans=0.0 2023-11-18 03:42:22,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=41866.666666666664, ans=0.0 2023-11-18 03:42:28,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=41933.333333333336, ans=0.2 2023-11-18 03:42:39,082 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6300, loss[loss=0.1531, simple_loss=0.1503, pruned_loss=0.06299, audio_tagging_loss=0.01494, over 13884.00 frames. ], tot_loss[loss=0.1549, simple_loss=0.1518, pruned_loss=0.06509, audio_tagging_loss=0.01394, over 3042184.75 frames. 
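A large share of the scheduled names end in skip_rate (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate, ...): stochastic sub-module dropping, where each attention / convolution / feed-forward block is bypassed with the scheduled probability during training, typically decaying toward 0 (many entries above already print ans=0.0). A toy version of the pattern; the real code draws per-sequence masks rather than one coin flip per call:

import torch

def maybe_skip(module: torch.nn.Module, x: torch.Tensor,
               skip_rate: float, training: bool = True) -> torch.Tensor:
    # With probability skip_rate, bypass the sub-module and return the
    # residual input unchanged; at eval time the module always runs.
    if training and torch.rand(()) < skip_rate:
        return x
    return x + module(x)  # residual sub-module, as inside a Zipformer layer

ff = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(),
                         torch.nn.Linear(64, 16))
y = maybe_skip(ff, torch.randn(2, 16), skip_rate=0.07)
print(y.shape)  # ~7% of training calls skip the feed-forward entirely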
], batch size: 53, lr: 3.94e-02, grad_scale: 32.0 2023-11-18 03:42:41,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=42000.0, ans=0.125 2023-11-18 03:42:46,037 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.073e+01 1.061e+02 1.176e+02 1.388e+02 2.867e+02, threshold=2.352e+02, percent-clipped=1.0 2023-11-18 03:42:46,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=42000.0, ans=0.0 2023-11-18 03:43:02,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=42133.333333333336, ans=0.2 2023-11-18 03:43:27,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=42266.666666666664, ans=0.0 2023-11-18 03:43:36,584 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6350, loss[loss=0.1689, simple_loss=0.17, pruned_loss=0.07086, audio_tagging_loss=0.01305, over 15660.00 frames. ], tot_loss[loss=0.1544, simple_loss=0.1514, pruned_loss=0.06474, audio_tagging_loss=0.01393, over 3039550.92 frames. ], batch size: 58, lr: 3.93e-02, grad_scale: 32.0 2023-11-18 03:43:38,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=42333.333333333336, ans=0.2 2023-11-18 03:43:53,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=42400.0, ans=0.125 2023-11-18 03:44:17,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=42533.333333333336, ans=0.125 2023-11-18 03:44:26,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2023-11-18 03:44:34,100 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6400, loss[loss=0.1337, simple_loss=0.1207, pruned_loss=0.05326, audio_tagging_loss=0.02013, over 15084.00 frames. ], tot_loss[loss=0.1529, simple_loss=0.1494, pruned_loss=0.06409, audio_tagging_loss=0.01414, over 3042145.23 frames. ], batch size: 59, lr: 3.92e-02, grad_scale: 32.0 2023-11-18 03:44:40,524 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.367e+01 1.120e+02 1.287e+02 1.674e+02 2.598e+02, threshold=2.575e+02, percent-clipped=2.0 2023-11-18 03:44:45,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0 2023-11-18 03:44:56,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=12.0 2023-11-18 03:45:00,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=42800.0, ans=0.0 2023-11-18 03:45:05,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=42800.0, ans=0.125 2023-11-18 03:45:30,272 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6450, loss[loss=0.1846, simple_loss=0.1828, pruned_loss=0.08005, audio_tagging_loss=0.01309, over 15964.00 frames. ], tot_loss[loss=0.1535, simple_loss=0.1504, pruned_loss=0.06424, audio_tagging_loss=0.0141, over 3043892.88 frames. 
], batch size: 57, lr: 3.92e-02, grad_scale: 32.0 2023-11-18 03:45:40,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=43066.666666666664, ans=0.125 2023-11-18 03:45:57,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-11-18 03:46:20,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=43266.666666666664, ans=0.0014637681159420293 2023-11-18 03:46:23,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=43266.666666666664, ans=0.125 2023-11-18 03:46:27,418 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6500, loss[loss=0.1258, simple_loss=0.1196, pruned_loss=0.05261, audio_tagging_loss=0.01336, over 16426.00 frames. ], tot_loss[loss=0.1527, simple_loss=0.1495, pruned_loss=0.06392, audio_tagging_loss=0.01406, over 3048952.27 frames. ], batch size: 62, lr: 3.91e-02, grad_scale: 32.0 2023-11-18 03:46:27,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=43333.333333333336, ans=0.125 2023-11-18 03:46:34,393 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 9.242e+01 1.070e+02 1.244e+02 1.503e+02 2.306e+02, threshold=2.488e+02, percent-clipped=0.0 2023-11-18 03:46:53,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2023-11-18 03:47:23,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=43666.666666666664, ans=0.0 2023-11-18 03:47:24,041 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6550, loss[loss=0.1219, simple_loss=0.125, pruned_loss=0.04646, audio_tagging_loss=0.01293, over 15683.00 frames. ], tot_loss[loss=0.1536, simple_loss=0.1507, pruned_loss=0.06447, audio_tagging_loss=0.01374, over 3043649.13 frames. ], batch size: 61, lr: 3.91e-02, grad_scale: 32.0 2023-11-18 03:47:27,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=43666.666666666664, ans=0.1 2023-11-18 03:47:45,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.54 vs. limit=15.0 2023-11-18 03:47:59,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.99 vs. limit=6.0 2023-11-18 03:48:06,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.19 vs. limit=10.0 2023-11-18 03:48:20,978 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6600, loss[loss=0.1277, simple_loss=0.1317, pruned_loss=0.04678, audio_tagging_loss=0.01506, over 14280.00 frames. ], tot_loss[loss=0.153, simple_loss=0.1508, pruned_loss=0.06401, audio_tagging_loss=0.01356, over 3043643.74 frames. 
], batch size: 55, lr: 3.90e-02, grad_scale: 32.0 2023-11-18 03:48:27,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=44000.0, ans=0.125 2023-11-18 03:48:28,032 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.463e+01 1.068e+02 1.215e+02 1.424e+02 2.055e+02, threshold=2.430e+02, percent-clipped=0.0 2023-11-18 03:48:33,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=44066.666666666664, ans=0.125 2023-11-18 03:48:34,679 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.655e+00 2023-11-18 03:48:46,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=44133.333333333336, ans=0.0012753623188405793 2023-11-18 03:49:00,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=44200.0, ans=0.5 2023-11-18 03:49:02,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=44200.0, ans=0.07 2023-11-18 03:49:03,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=44200.0, ans=0.2 2023-11-18 03:49:04,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=44200.0, ans=0.125 2023-11-18 03:49:17,877 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6650, loss[loss=0.1702, simple_loss=0.1772, pruned_loss=0.06943, audio_tagging_loss=0.01213, over 15491.00 frames. ], tot_loss[loss=0.1537, simple_loss=0.1517, pruned_loss=0.06448, audio_tagging_loss=0.0134, over 3053698.54 frames. ], batch size: 55, lr: 3.89e-02, grad_scale: 32.0 2023-11-18 03:49:27,991 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:49:30,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=44400.0, ans=0.0 2023-11-18 03:50:15,178 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6700, loss[loss=0.1539, simple_loss=0.1656, pruned_loss=0.0587, audio_tagging_loss=0.01243, over 15189.00 frames. ], tot_loss[loss=0.1549, simple_loss=0.1535, pruned_loss=0.06497, audio_tagging_loss=0.01322, over 3054293.68 frames. ], batch size: 55, lr: 3.89e-02, grad_scale: 32.0 2023-11-18 03:50:21,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.129e+01 1.020e+02 1.157e+02 1.284e+02 2.181e+02, threshold=2.314e+02, percent-clipped=0.0 2023-11-18 03:50:28,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=44733.333333333336, ans=0.1 2023-11-18 03:50:52,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.84 vs. limit=15.0 2023-11-18 03:51:11,202 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6750, loss[loss=0.1438, simple_loss=0.143, pruned_loss=0.05845, audio_tagging_loss=0.01387, over 15048.00 frames. ], tot_loss[loss=0.1543, simple_loss=0.1527, pruned_loss=0.06468, audio_tagging_loss=0.01325, over 3048468.67 frames. 
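The scaling.py:1118 lines report WithLoss hooks attached to attention-weight tensors, printing the accumulated auxiliary penalty (loss-sum, often 0.000e+00 when the weights are well behaved). One way such a hook can work, shown as a generic sketch rather than zipformer's exact code, is an autograd function that leaves the forward value untouched while injecting the penalty's gradient during backward:

import torch

class WithAuxLoss(torch.autograd.Function):
    # Forward identity; backward adds d(aux)/dx on top of the incoming
    # gradient, so the penalty shapes training without changing activations.

    @staticmethod
    def forward(ctx, x, name: str):
        ctx.name = name
        ctx.save_for_backward(x)
        return x.clone()

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        with torch.enable_grad():
            xd = x.detach().requires_grad_()
            aux = torch.relu(xd - 0.9).sum()  # example: penalize weights > 0.9
            (aux_grad,) = torch.autograd.grad(aux, xd)
        print(f"WithLoss: name={ctx.name}, loss-sum={aux.item():.3e}")
        return grad_out + aux_grad, None

attn = torch.softmax(4.0 * torch.randn(4, 4), dim=-1).requires_grad_()
WithAuxLoss.apply(attn, "toy.self_attn_weights").sum().backward()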
], batch size: 59, lr: 3.88e-02, grad_scale: 32.0 2023-11-18 03:51:12,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. limit=15.0 2023-11-18 03:51:29,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=45066.666666666664, ans=0.0010724637681159433 2023-11-18 03:51:30,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=45066.666666666664, ans=0.0 2023-11-18 03:51:32,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=45066.666666666664, ans=0.125 2023-11-18 03:51:33,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45133.333333333336, ans=0.1 2023-11-18 03:51:36,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=45133.333333333336, ans=0.125 2023-11-18 03:52:08,435 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6800, loss[loss=0.1845, simple_loss=0.1785, pruned_loss=0.08298, audio_tagging_loss=0.01226, over 14583.00 frames. ], tot_loss[loss=0.1543, simple_loss=0.1528, pruned_loss=0.06461, audio_tagging_loss=0.0133, over 3046990.90 frames. ], batch size: 54, lr: 3.87e-02, grad_scale: 32.0 2023-11-18 03:52:13,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.11 vs. limit=15.0 2023-11-18 03:52:15,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.237e+01 1.104e+02 1.256e+02 1.386e+02 2.512e+02, threshold=2.511e+02, percent-clipped=1.0 2023-11-18 03:52:16,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=45333.333333333336, ans=15.0 2023-11-18 03:52:27,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=45400.0, ans=0.125 2023-11-18 03:52:36,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=45466.666666666664, ans=0.125 2023-11-18 03:52:41,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=45533.333333333336, ans=0.2 2023-11-18 03:53:03,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=45600.0, ans=0.0 2023-11-18 03:53:05,767 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6850, loss[loss=0.1426, simple_loss=0.1494, pruned_loss=0.05628, audio_tagging_loss=0.01161, over 14793.00 frames. ], tot_loss[loss=0.1527, simple_loss=0.1513, pruned_loss=0.06382, audio_tagging_loss=0.0132, over 3044397.72 frames. ], batch size: 55, lr: 3.87e-02, grad_scale: 32.0 2023-11-18 03:53:07,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.40 vs. 
limit=22.5 2023-11-18 03:53:37,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=45800.0, ans=0.0 2023-11-18 03:53:41,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=45866.666666666664, ans=0.1 2023-11-18 03:54:01,996 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6900, loss[loss=0.1197, simple_loss=0.1101, pruned_loss=0.04765, audio_tagging_loss=0.01699, over 14309.00 frames. ], tot_loss[loss=0.1516, simple_loss=0.1503, pruned_loss=0.06326, audio_tagging_loss=0.01318, over 3046852.97 frames. ], batch size: 55, lr: 3.86e-02, grad_scale: 32.0 2023-11-18 03:54:02,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=46000.0, ans=0.2 2023-11-18 03:54:08,337 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.826e+01 1.075e+02 1.233e+02 1.509e+02 2.353e+02, threshold=2.467e+02, percent-clipped=0.0 2023-11-18 03:54:14,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2023-11-18 03:54:18,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.58 vs. limit=22.5 2023-11-18 03:54:22,881 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=22.5 2023-11-18 03:54:41,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=46200.0, ans=0.1 2023-11-18 03:54:45,382 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:54:52,064 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:54:58,467 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 6950, loss[loss=0.1594, simple_loss=0.1653, pruned_loss=0.06451, audio_tagging_loss=0.01222, over 15962.00 frames. ], tot_loss[loss=0.1516, simple_loss=0.1507, pruned_loss=0.06296, audio_tagging_loss=0.01324, over 3047186.01 frames. ], batch size: 60, lr: 3.85e-02, grad_scale: 32.0 2023-11-18 03:55:28,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=46466.666666666664, ans=0.035 2023-11-18 03:55:36,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=46533.333333333336, ans=0.125 2023-11-18 03:55:48,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=46600.0, ans=0.125 2023-11-18 03:55:55,785 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7000, loss[loss=0.09356, simple_loss=0.08255, pruned_loss=0.03496, audio_tagging_loss=0.01732, over 15835.00 frames. 
], tot_loss[loss=0.1521, simple_loss=0.1511, pruned_loss=0.0633, audio_tagging_loss=0.01327, over 3044240.44 frames. ], batch size: 63, lr: 3.85e-02, grad_scale: 32.0 2023-11-18 03:55:57,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=46666.666666666664, ans=0.0 2023-11-18 03:56:02,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.771e+01 1.138e+02 1.312e+02 1.485e+02 2.708e+02, threshold=2.623e+02, percent-clipped=2.0 2023-11-18 03:56:02,500 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:56:06,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=46733.333333333336, ans=0.05 2023-11-18 03:56:14,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=46733.333333333336, ans=0.125 2023-11-18 03:56:47,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=46933.333333333336, ans=0.125 2023-11-18 03:56:51,722 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7050, loss[loss=0.1708, simple_loss=0.1682, pruned_loss=0.07301, audio_tagging_loss=0.01372, over 14642.00 frames. ], tot_loss[loss=0.1529, simple_loss=0.1515, pruned_loss=0.06371, audio_tagging_loss=0.01344, over 3042282.58 frames. ], batch size: 54, lr: 3.84e-02, grad_scale: 32.0 2023-11-18 03:56:56,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47000.0, ans=0.1 2023-11-18 03:57:05,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=47066.666666666664, ans=0.0006376811594202905 2023-11-18 03:57:18,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=47133.333333333336, ans=0.125 2023-11-18 03:57:28,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=47200.0, ans=0.125 2023-11-18 03:57:29,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=47200.0, ans=0.0 2023-11-18 03:57:47,901 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7100, loss[loss=0.1363, simple_loss=0.1265, pruned_loss=0.05684, audio_tagging_loss=0.01618, over 14183.00 frames. ], tot_loss[loss=0.1522, simple_loss=0.1508, pruned_loss=0.0633, audio_tagging_loss=0.01348, over 3039476.26 frames. ], batch size: 53, lr: 3.83e-02, grad_scale: 32.0 2023-11-18 03:57:55,365 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.676e+01 1.058e+02 1.182e+02 1.391e+02 1.929e+02, threshold=2.364e+02, percent-clipped=0.0 2023-11-18 03:58:00,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0 2023-11-18 03:58:02,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.09 vs. limit=15.0 2023-11-18 03:58:06,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.63 vs. 
limit=15.0 2023-11-18 03:58:17,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=47466.666666666664, ans=0.125 2023-11-18 03:58:19,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=47466.666666666664, ans=0.125 2023-11-18 03:58:27,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=47533.333333333336, ans=0.0005362318840579708 2023-11-18 03:58:32,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.87 vs. limit=22.5 2023-11-18 03:58:40,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=47600.0, ans=0.125 2023-11-18 03:58:40,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47600.0, ans=0.1 2023-11-18 03:58:43,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=47600.0, ans=0.025 2023-11-18 03:58:45,124 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7150, loss[loss=0.1784, simple_loss=0.179, pruned_loss=0.07649, audio_tagging_loss=0.01244, over 15310.00 frames. ], tot_loss[loss=0.1528, simple_loss=0.1516, pruned_loss=0.06366, audio_tagging_loss=0.01339, over 3045219.93 frames. ], batch size: 57, lr: 3.83e-02, grad_scale: 32.0 2023-11-18 03:59:06,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=47800.0, ans=15.0 2023-11-18 03:59:21,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=47866.666666666664, ans=0.0 2023-11-18 03:59:30,328 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.388e+00 2023-11-18 03:59:32,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=47933.333333333336, ans=0.125 2023-11-18 03:59:40,938 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7200, loss[loss=0.1515, simple_loss=0.1504, pruned_loss=0.06095, audio_tagging_loss=0.01535, over 15263.00 frames. ], tot_loss[loss=0.1514, simple_loss=0.1499, pruned_loss=0.06289, audio_tagging_loss=0.01356, over 3051157.16 frames. ], batch size: 57, lr: 3.82e-02, grad_scale: 32.0 2023-11-18 03:59:46,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=48000.0, ans=0.0004347826086956528 2023-11-18 03:59:47,296 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 1.038e+02 1.215e+02 1.416e+02 1.908e+02, threshold=2.429e+02, percent-clipped=0.0 2023-11-18 03:59:57,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.87 vs. limit=10.0
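(The [scaling.py:213] "ScheduledFloat" records above show that many regularization constants, dropout probabilities, skip rates, balancer limits, are not fixed but are functions of batch_count; the ff3_skip_rate values, for instance, have decayed to around 5e-4 by batch_count of roughly 47533. A minimal piecewise-linear schedule in the spirit of those records is sketched below; the breakpoint values are invented for illustration.)

```python
def scheduled_float(batch_count: float, points) -> float:
    """Piecewise-linear schedule over batch_count. 'points' is a sorted list
    of (batch_count, value) breakpoints, e.g. a skip rate annealed toward 0."""
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:  # linear interpolation inside [x0, x1]
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return points[-1][1]

# e.g. scheduled_float(47533.3, [(0.0, 0.1), (20000.0, 0.01), (50000.0, 0.0)])
# -> ~0.0008, the same order of magnitude as the tiny ff3_skip_rate values above
```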
2023-11-18 03:59:57,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=48066.666666666664, ans=0.000420289855072465 2023-11-18 04:00:02,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=48133.333333333336, ans=0.2 2023-11-18 04:00:08,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=48133.333333333336, ans=0.00040579710144927547 2023-11-18 04:00:09,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2023-11-18 04:00:26,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=48266.666666666664, ans=0.125 2023-11-18 04:00:27,468 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:00:28,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=48266.666666666664, ans=0.125 2023-11-18 04:00:36,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=48333.333333333336, ans=0.0 2023-11-18 04:00:37,476 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7250, loss[loss=0.1455, simple_loss=0.1403, pruned_loss=0.05835, audio_tagging_loss=0.01703, over 14286.00 frames. ], tot_loss[loss=0.1516, simple_loss=0.15, pruned_loss=0.06295, audio_tagging_loss=0.01369, over 3061753.80 frames. ], batch size: 55, lr: 3.82e-02, grad_scale: 32.0 2023-11-18 04:00:46,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=48333.333333333336, ans=0.0 2023-11-18 04:01:34,451 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7300, loss[loss=0.1391, simple_loss=0.1265, pruned_loss=0.06279, audio_tagging_loss=0.01302, over 15201.00 frames. ], tot_loss[loss=0.1514, simple_loss=0.1498, pruned_loss=0.06285, audio_tagging_loss=0.01358, over 3060211.45 frames. ], batch size: 56, lr: 3.81e-02, grad_scale: 32.0 2023-11-18 04:01:36,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=48666.666666666664, ans=0.1 2023-11-18 04:01:39,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=48666.666666666664, ans=0.125 2023-11-18 04:01:40,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.111e+01 1.121e+02 1.282e+02 1.467e+02 2.763e+02, threshold=2.564e+02, percent-clipped=2.0 2023-11-18 04:01:45,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=48733.333333333336, ans=0.125 2023-11-18 04:01:46,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=48733.333333333336, ans=0.125 2023-11-18 04:01:52,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.59 vs. limit=15.0
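(The [scaling.py:1022] "Whitening" records compare a whiteness statistic of a module's output against a scheduled limit; when the metric exceeds the limit, as in the feed_forward1.out_whiten record just above (15.59 vs. 15.0), the module nudges the activations back toward a flatter covariance spectrum. The sketch below computes one plausible such statistic, normalized so that a perfectly white, identity-proportional channel covariance scores 1.0; treat the exact formula as an assumption about what is being measured, not a copy of scaling.py.)

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Hypothetical whiteness statistic for activations x of shape (N, C):
    cpg * trace(C @ C) / trace(C)**2 per channel group, which equals 1.0 when
    the group covariance C is proportional to the identity and grows with the
    eigenvalue spread (i.e. with how non-white the features are)."""
    n, c = x.shape
    cpg = c // num_groups                               # channels per group
    xg = x.reshape(n, num_groups, cpg).transpose(0, 1)  # (groups, N, cpg)
    cov = xg.transpose(1, 2) @ xg / n                   # (groups, cpg, cpg)
    trace = cov.diagonal(dim1=1, dim2=2).sum(-1)        # sum of eigenvalues
    trace_sq = (cov * cov).sum(dim=(1, 2))              # sum of squared eigenvalues
    return (cpg * trace_sq / (trace * trace + 1e-20)).mean()

# White noise scores ~1.0; rank-collapsed features score far higher:
# whitening_metric(torch.randn(4096, 512))                    -> ~1.1
# whitening_metric(torch.randn(4096, 1) * torch.ones(1, 512)) -> ~512
```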
2023-11-18 04:02:01,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=48800.0, ans=0.125 2023-11-18 04:02:08,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=48866.666666666664, ans=0.125 2023-11-18 04:02:09,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=19.48 vs. limit=15.0 2023-11-18 04:02:30,019 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7350, loss[loss=0.1779, simple_loss=0.1804, pruned_loss=0.07638, audio_tagging_loss=0.0113, over 15344.00 frames. ], tot_loss[loss=0.151, simple_loss=0.1499, pruned_loss=0.06268, audio_tagging_loss=0.01336, over 3056224.51 frames. ], batch size: 56, lr: 3.80e-02, grad_scale: 32.0 2023-11-18 04:02:34,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=49000.0, ans=0.0 2023-11-18 04:02:35,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=49000.0, ans=0.0 2023-11-18 04:02:38,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.90 vs. limit=10.0 2023-11-18 04:02:38,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=49000.0, ans=0.0 2023-11-18 04:03:00,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49133.333333333336, ans=0.1 2023-11-18 04:03:12,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=49200.0, ans=0.09899494936611666 2023-11-18 04:03:26,849 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7400, loss[loss=0.1443, simple_loss=0.1444, pruned_loss=0.0578, audio_tagging_loss=0.01432, over 14923.00 frames. ], tot_loss[loss=0.1493, simple_loss=0.1479, pruned_loss=0.06195, audio_tagging_loss=0.01338, over 3048561.02 frames. ], batch size: 57, lr: 3.80e-02, grad_scale: 32.0 2023-11-18 04:03:33,831 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.201e+01 1.102e+02 1.229e+02 1.424e+02 2.293e+02, threshold=2.457e+02, percent-clipped=0.0 2023-11-18 04:03:38,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=49400.0, ans=0.1 2023-11-18 04:03:56,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=49466.666666666664, ans=0.125 2023-11-18 04:04:01,717 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:04:15,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=49600.0, ans=0.125 2023-11-18 04:04:15,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=49600.0, ans=0.0 2023-11-18 04:04:23,581 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7450, loss[loss=0.1327, simple_loss=0.1311, pruned_loss=0.05188, audio_tagging_loss=0.0153, over 16602.00 frames.
], tot_loss[loss=0.1482, simple_loss=0.1465, pruned_loss=0.06152, audio_tagging_loss=0.01347, over 3049027.01 frames. ], batch size: 61, lr: 3.79e-02, grad_scale: 32.0 2023-11-18 04:04:33,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.20 vs. limit=15.0 2023-11-18 04:04:37,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=49733.333333333336, ans=0.125 2023-11-18 04:04:38,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=49733.333333333336, ans=0.2 2023-11-18 04:04:46,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=49800.0, ans=4.3478260869565105e-05 2023-11-18 04:05:06,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=49866.666666666664, ans=0.125 2023-11-18 04:05:09,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=49933.333333333336, ans=0.125 2023-11-18 04:05:12,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0 2023-11-18 04:05:20,162 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7500, loss[loss=0.1409, simple_loss=0.1429, pruned_loss=0.05473, audio_tagging_loss=0.0147, over 16178.00 frames. ], tot_loss[loss=0.1474, simple_loss=0.1457, pruned_loss=0.0611, audio_tagging_loss=0.01346, over 3050120.51 frames. ], batch size: 61, lr: 3.78e-02, grad_scale: 32.0 2023-11-18 04:05:26,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 1.063e+02 1.222e+02 1.436e+02 2.018e+02, threshold=2.444e+02, percent-clipped=0.0 2023-11-18 04:05:31,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=50066.666666666664, ans=0.07 2023-11-18 04:05:49,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0 2023-11-18 04:06:09,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=50266.666666666664, ans=0.95 2023-11-18 04:06:15,869 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7550, loss[loss=0.1353, simple_loss=0.1297, pruned_loss=0.0576, audio_tagging_loss=0.01287, over 15364.00 frames. ], tot_loss[loss=0.1481, simple_loss=0.1465, pruned_loss=0.06152, audio_tagging_loss=0.01332, over 3049350.95 frames. ], batch size: 59, lr: 3.78e-02, grad_scale: 32.0 2023-11-18 04:06:18,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=50333.333333333336, ans=0.0 2023-11-18 04:06:20,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=50333.333333333336, ans=0.0 2023-11-18 04:06:51,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. 
limit=15.0 2023-11-18 04:07:01,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=50600.0, ans=10.0 2023-11-18 04:07:12,594 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7600, loss[loss=0.129, simple_loss=0.1342, pruned_loss=0.05243, audio_tagging_loss=0.009492, over 15338.00 frames. ], tot_loss[loss=0.1482, simple_loss=0.1467, pruned_loss=0.06166, audio_tagging_loss=0.01322, over 3047880.55 frames. ], batch size: 57, lr: 3.77e-02, grad_scale: 32.0 2023-11-18 04:07:19,600 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.452e+01 1.053e+02 1.216e+02 1.364e+02 2.093e+02, threshold=2.431e+02, percent-clipped=0.0 2023-11-18 04:07:44,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=50800.0, ans=0.0 2023-11-18 04:07:45,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.56 vs. limit=15.0 2023-11-18 04:07:48,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=50866.666666666664, ans=0.1 2023-11-18 04:07:51,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=50866.666666666664, ans=0.125 2023-11-18 04:07:52,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.56 vs. limit=15.0 2023-11-18 04:07:54,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=50866.666666666664, ans=0.07 2023-11-18 04:08:06,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=50933.333333333336, ans=0.125 2023-11-18 04:08:06,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=50933.333333333336, ans=0.125 2023-11-18 04:08:09,174 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7650, loss[loss=0.1732, simple_loss=0.1712, pruned_loss=0.07412, audio_tagging_loss=0.0135, over 15429.00 frames. ], tot_loss[loss=0.1492, simple_loss=0.1477, pruned_loss=0.06208, audio_tagging_loss=0.01329, over 3048336.50 frames. 
], batch size: 57, lr: 3.77e-02, grad_scale: 32.0 2023-11-18 04:08:10,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=51000.0, ans=15.0 2023-11-18 04:08:18,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=51000.0, ans=0.125 2023-11-18 04:08:30,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=51133.333333333336, ans=0.2 2023-11-18 04:08:41,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=51133.333333333336, ans=0.0 2023-11-18 04:08:46,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=51200.0, ans=0.0 2023-11-18 04:08:46,700 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:08:58,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=51266.666666666664, ans=0.125 2023-11-18 04:09:05,121 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7700, loss[loss=0.1041, simple_loss=0.1033, pruned_loss=0.03801, audio_tagging_loss=0.01441, over 15913.00 frames. ], tot_loss[loss=0.1482, simple_loss=0.1467, pruned_loss=0.06143, audio_tagging_loss=0.01343, over 3044027.26 frames. ], batch size: 62, lr: 3.76e-02, grad_scale: 32.0 2023-11-18 04:09:09,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=51333.333333333336, ans=0.2 2023-11-18 04:09:09,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=51333.333333333336, ans=0.2 2023-11-18 04:09:09,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.40 vs. limit=15.0 2023-11-18 04:09:12,119 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 1.072e+02 1.285e+02 1.536e+02 2.038e+02, threshold=2.570e+02, percent-clipped=0.0 2023-11-18 04:09:19,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=51400.0, ans=0.0 2023-11-18 04:09:38,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=51533.333333333336, ans=0.0 2023-11-18 04:09:41,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=51533.333333333336, ans=0.1 2023-11-18 04:09:56,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=51600.0, ans=0.125 2023-11-18 04:10:01,774 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7750, loss[loss=0.1459, simple_loss=0.1394, pruned_loss=0.05674, audio_tagging_loss=0.01948, over 15239.00 frames. ], tot_loss[loss=0.1485, simple_loss=0.1472, pruned_loss=0.0614, audio_tagging_loss=0.01347, over 3038695.22 frames. 
], batch size: 54, lr: 3.75e-02, grad_scale: 64.0 2023-11-18 04:10:04,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=51666.666666666664, ans=0.0 2023-11-18 04:10:04,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-11-18 04:10:19,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.10 vs. limit=10.0 2023-11-18 04:10:30,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=51800.0, ans=0.125 2023-11-18 04:10:38,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=51866.666666666664, ans=0.0 2023-11-18 04:10:45,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.13 vs. limit=15.0 2023-11-18 04:10:58,483 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7800, loss[loss=0.1504, simple_loss=0.1495, pruned_loss=0.06299, audio_tagging_loss=0.0127, over 15082.00 frames. ], tot_loss[loss=0.1483, simple_loss=0.147, pruned_loss=0.06132, audio_tagging_loss=0.01351, over 3042187.99 frames. ], batch size: 54, lr: 3.75e-02, grad_scale: 64.0 2023-11-18 04:11:04,839 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.657e+01 1.122e+02 1.272e+02 1.519e+02 2.538e+02, threshold=2.545e+02, percent-clipped=0.0 2023-11-18 04:11:13,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=52066.666666666664, ans=0.125 2023-11-18 04:11:24,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=52133.333333333336, ans=0.2 2023-11-18 04:11:28,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.29 vs. limit=15.0 2023-11-18 04:11:47,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=52266.666666666664, ans=0.125 2023-11-18 04:11:54,967 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7850, loss[loss=0.1266, simple_loss=0.1172, pruned_loss=0.05089, audio_tagging_loss=0.0171, over 14479.00 frames. ], tot_loss[loss=0.1473, simple_loss=0.1455, pruned_loss=0.06082, audio_tagging_loss=0.01369, over 3037506.88 frames. ], batch size: 55, lr: 3.74e-02, grad_scale: 64.0 2023-11-18 04:12:01,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.24 vs. limit=22.5
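(Note how the logged grad_scale doubles from 32.0 at batch 7700 to 64.0 from batch 7750 onward: with fp16 training the loss is multiplied by a dynamic scale before backward, and the scaler grows that scale after a run of overflow-free steps while halving it whenever inf/nan gradients appear. A standard torch.cuda.amp sketch of the mechanism follows; the growth_interval value is an assumption chosen only to make the doubling visible, not the setting used here.)

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=1000)

def training_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # mixed-precision forward pass
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()             # backward on the scaled loss
    scaler.step(optimizer)                    # unscales grads; skips step on inf/nan
    scaler.update()                           # x2 after growth_interval clean steps,
                                              # x0.5 whenever an overflow was seen
    return loss.detach(), scaler.get_scale()  # get_scale() is the logged grad_scale
```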
2023-11-18 04:12:12,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=52400.0, ans=0.125 2023-11-18 04:12:14,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=52400.0, ans=0.025 2023-11-18 04:12:35,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=52533.333333333336, ans=0.0 2023-11-18 04:12:35,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.97 vs. limit=15.0 2023-11-18 04:12:51,438 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7900, loss[loss=0.1258, simple_loss=0.1204, pruned_loss=0.04687, audio_tagging_loss=0.0187, over 15334.00 frames. ], tot_loss[loss=0.1463, simple_loss=0.1443, pruned_loss=0.06014, audio_tagging_loss=0.01396, over 3034835.05 frames. ], batch size: 59, lr: 3.73e-02, grad_scale: 64.0 2023-11-18 04:12:58,421 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.654e+01 1.083e+02 1.346e+02 1.574e+02 2.605e+02, threshold=2.691e+02, percent-clipped=2.0 2023-11-18 04:13:22,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2023-11-18 04:13:33,368 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.287e+00 2023-11-18 04:13:36,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=52933.333333333336, ans=0.1 2023-11-18 04:13:38,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=52933.333333333336, ans=0.035 2023-11-18 04:13:41,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0 2023-11-18 04:13:42,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=52933.333333333336, ans=0.125 2023-11-18 04:13:47,566 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 7950, loss[loss=0.135, simple_loss=0.1244, pruned_loss=0.05477, audio_tagging_loss=0.01799, over 14231.00 frames. ], tot_loss[loss=0.1475, simple_loss=0.1453, pruned_loss=0.0608, audio_tagging_loss=0.01405, over 3035663.43 frames. ], batch size: 53, lr: 3.73e-02, grad_scale: 64.0 2023-11-18 04:13:52,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=53000.0, ans=15.0 2023-11-18 04:14:01,644 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
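(The WARNING that closes the line above shows the length filter at work: a transducer loss is only defined when the encoder emits at least as many frames as there are output tokens, and this one-second AudioSet clip yields 23 frames after subsampling of its 100 feature frames, one fewer than its 24 dummy-text tokens, so the cut is dropped. A sketch of such a predicate, usable with lhotse's CutSet.filter, is below; the subsampling arithmetic and helper names are assumptions for illustration, not the exact code at train_asr.py:1319.)

```python
def keep_cut(cut, sp, subsampling_factor: int = 4) -> bool:
    """Hypothetical length filter in the spirit of the WARNING above: drop cuts
    whose BPE token count exceeds the encoder frame count after subsampling."""
    num_feature_frames = cut.num_frames                 # 100 for this 1 s clip
    # assumed approximation of the encoder's frame-rate reduction (100 -> 23)
    frames_after = (num_feature_frames - 8) // subsampling_factor
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)  # 24 dummy tokens
    # the transducer loss needs at least one encoder frame per emitted token
    return len(tokens) <= frames_after

# train_cuts = train_cuts.filter(lambda c: keep_cut(c, sp))  # sp: SentencePieceProcessor
```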
2023-11-18 04:14:11,863 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:14:26,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0 2023-11-18 04:14:40,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=53266.666666666664, ans=0.1 2023-11-18 04:14:45,302 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8000, loss[loss=0.1345, simple_loss=0.1326, pruned_loss=0.05507, audio_tagging_loss=0.01312, over 15110.00 frames. ], tot_loss[loss=0.1473, simple_loss=0.1452, pruned_loss=0.06065, audio_tagging_loss=0.01408, over 3037970.49 frames. ], batch size: 58, lr: 3.72e-02, grad_scale: 64.0 2023-11-18 04:14:52,306 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.958e+01 1.064e+02 1.192e+02 1.330e+02 2.518e+02, threshold=2.384e+02, percent-clipped=0.0 2023-11-18 04:15:14,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=53466.666666666664, ans=0.0 2023-11-18 04:15:18,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=53533.333333333336, ans=0.1 2023-11-18 04:15:41,085 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8050, loss[loss=0.1117, simple_loss=0.1199, pruned_loss=0.03825, audio_tagging_loss=0.01347, over 14414.00 frames. ], tot_loss[loss=0.1494, simple_loss=0.1477, pruned_loss=0.06156, audio_tagging_loss=0.01396, over 3038254.57 frames. ], batch size: 54, lr: 3.72e-02, grad_scale: 64.0 2023-11-18 04:15:42,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=53666.666666666664, ans=0.125 2023-11-18 04:15:49,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=53666.666666666664, ans=0.2 2023-11-18 04:15:52,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=53733.333333333336, ans=0.0 2023-11-18 04:16:16,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=53866.666666666664, ans=0.125 2023-11-18 04:16:20,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=53866.666666666664, ans=0.125 2023-11-18 04:16:22,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=53866.666666666664, ans=0.0 2023-11-18 04:16:30,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=53933.333333333336, ans=0.0 2023-11-18 04:16:37,486 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8100, loss[loss=0.1234, simple_loss=0.124, pruned_loss=0.04792, audio_tagging_loss=0.01347, over 15486.00 frames. ], tot_loss[loss=0.149, simple_loss=0.1475, pruned_loss=0.06144, audio_tagging_loss=0.01379, over 3041130.91 frames.
], batch size: 58, lr: 3.71e-02, grad_scale: 64.0 2023-11-18 04:16:43,799 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 1.055e+02 1.175e+02 1.442e+02 1.996e+02, threshold=2.349e+02, percent-clipped=0.0 2023-11-18 04:16:59,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=54133.333333333336, ans=0.125 2023-11-18 04:17:32,888 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8150, loss[loss=0.1375, simple_loss=0.1436, pruned_loss=0.05354, audio_tagging_loss=0.01212, over 14100.00 frames. ], tot_loss[loss=0.1493, simple_loss=0.1481, pruned_loss=0.06173, audio_tagging_loss=0.01353, over 3037329.46 frames. ], batch size: 54, lr: 3.70e-02, grad_scale: 64.0 2023-11-18 04:17:52,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.94 vs. limit=15.0 2023-11-18 04:18:25,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=54600.0, ans=0.125 2023-11-18 04:18:29,071 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8200, loss[loss=0.1443, simple_loss=0.14, pruned_loss=0.05798, audio_tagging_loss=0.01636, over 15067.00 frames. ], tot_loss[loss=0.1486, simple_loss=0.148, pruned_loss=0.06128, audio_tagging_loss=0.01333, over 3051271.27 frames. ], batch size: 56, lr: 3.70e-02, grad_scale: 32.0 2023-11-18 04:18:29,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=54666.666666666664, ans=0.5 2023-11-18 04:18:30,773 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 04:18:37,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.282e+01 1.076e+02 1.233e+02 1.443e+02 5.591e+02, threshold=2.467e+02, percent-clipped=1.0 2023-11-18 04:18:48,522 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:18:52,932 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:18:57,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=54800.0, ans=0.0 2023-11-18 04:19:03,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=54866.666666666664, ans=0.09899494936611666 2023-11-18 04:19:11,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=54866.666666666664, ans=0.125 2023-11-18 04:19:18,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=54933.333333333336, ans=0.125 2023-11-18 04:19:24,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=55000.0, ans=0.07 2023-11-18 04:19:25,766 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8250, loss[loss=0.1417, simple_loss=0.1549, pruned_loss=0.05407, audio_tagging_loss=0.01017, over 15638.00 frames. ], tot_loss[loss=0.1493, simple_loss=0.149, pruned_loss=0.06154, audio_tagging_loss=0.01322, over 3051440.52 frames. ], batch size: 55, lr: 3.69e-02, grad_scale: 32.0 2023-11-18 04:19:39,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=55066.666666666664, ans=0.125 2023-11-18 04:19:55,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=55133.333333333336, ans=0.0 2023-11-18 04:20:00,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=55200.0, ans=0.0 2023-11-18 04:20:01,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.78 vs. limit=15.0 2023-11-18 04:20:14,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=55266.666666666664, ans=0.0 2023-11-18 04:20:19,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=55266.666666666664, ans=0.125 2023-11-18 04:20:21,385 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8300, loss[loss=0.1229, simple_loss=0.1153, pruned_loss=0.051, audio_tagging_loss=0.01427, over 13633.00 frames. ], tot_loss[loss=0.1484, simple_loss=0.1482, pruned_loss=0.06108, audio_tagging_loss=0.01328, over 3046508.28 frames. ], batch size: 55, lr: 3.68e-02, grad_scale: 32.0 2023-11-18 04:20:28,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.605e+01 1.079e+02 1.222e+02 1.465e+02 2.413e+02, threshold=2.444e+02, percent-clipped=0.0 2023-11-18 04:20:34,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.26 vs. 
limit=15.0 2023-11-18 04:20:34,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=55400.0, ans=0.0 2023-11-18 04:20:40,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=55400.0, ans=0.1 2023-11-18 04:20:41,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=55400.0, ans=0.125 2023-11-18 04:20:52,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.71 vs. limit=10.0 2023-11-18 04:20:59,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=55533.333333333336, ans=0.0 2023-11-18 04:21:01,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=55533.333333333336, ans=0.1 2023-11-18 04:21:13,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=55600.0, ans=0.125 2023-11-18 04:21:17,278 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8350, loss[loss=0.1831, simple_loss=0.1792, pruned_loss=0.07882, audio_tagging_loss=0.0147, over 15089.00 frames. ], tot_loss[loss=0.1488, simple_loss=0.1487, pruned_loss=0.06136, audio_tagging_loss=0.01312, over 3044082.37 frames. ], batch size: 56, lr: 3.68e-02, grad_scale: 32.0 2023-11-18 04:21:38,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=55733.333333333336, ans=0.125 2023-11-18 04:21:45,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=55800.0, ans=0.1 2023-11-18 04:22:04,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2023-11-18 04:22:14,449 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8400, loss[loss=0.1233, simple_loss=0.1175, pruned_loss=0.05235, audio_tagging_loss=0.01226, over 15849.00 frames. ], tot_loss[loss=0.1478, simple_loss=0.1478, pruned_loss=0.0609, audio_tagging_loss=0.01302, over 3042959.26 frames. ], batch size: 62, lr: 3.67e-02, grad_scale: 32.0 2023-11-18 04:22:15,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=56000.0, ans=0.125 2023-11-18 04:22:16,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=56000.0, ans=0.0 2023-11-18 04:22:21,885 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.450e+01 1.072e+02 1.183e+02 1.364e+02 2.045e+02, threshold=2.367e+02, percent-clipped=0.0 2023-11-18 04:23:09,888 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8450, loss[loss=0.1619, simple_loss=0.1559, pruned_loss=0.07217, audio_tagging_loss=0.01178, over 15584.00 frames. ], tot_loss[loss=0.147, simple_loss=0.1469, pruned_loss=0.06039, audio_tagging_loss=0.01319, over 3041983.47 frames. 
], batch size: 58, lr: 3.67e-02, grad_scale: 32.0 2023-11-18 04:23:22,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=56400.0, ans=0.2 2023-11-18 04:23:44,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=56533.333333333336, ans=0.0 2023-11-18 04:24:03,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=56600.0, ans=0.125 2023-11-18 04:24:05,589 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8500, loss[loss=0.1483, simple_loss=0.1451, pruned_loss=0.0644, audio_tagging_loss=0.0114, over 15219.00 frames. ], tot_loss[loss=0.1489, simple_loss=0.149, pruned_loss=0.06134, audio_tagging_loss=0.01302, over 3039866.99 frames. ], batch size: 56, lr: 3.66e-02, grad_scale: 32.0 2023-11-18 04:24:13,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.79 vs. limit=6.0 2023-11-18 04:24:13,490 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 1.081e+02 1.253e+02 1.521e+02 2.592e+02, threshold=2.506e+02, percent-clipped=2.0 2023-11-18 04:24:13,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=56666.666666666664, ans=0.125 2023-11-18 04:24:17,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=56733.333333333336, ans=0.125 2023-11-18 04:24:26,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=56733.333333333336, ans=0.2 2023-11-18 04:24:30,882 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:25:02,318 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8550, loss[loss=0.1706, simple_loss=0.1839, pruned_loss=0.0691, audio_tagging_loss=0.009544, over 14860.00 frames. ], tot_loss[loss=0.1493, simple_loss=0.1493, pruned_loss=0.06148, audio_tagging_loss=0.01318, over 3041691.97 frames. ], batch size: 53, lr: 3.65e-02, grad_scale: 32.0 2023-11-18 04:25:34,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=57200.0, ans=0.125 2023-11-18 04:25:52,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=57266.666666666664, ans=0.125 2023-11-18 04:25:58,793 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8600, loss[loss=0.1522, simple_loss=0.1576, pruned_loss=0.05729, audio_tagging_loss=0.01617, over 15060.00 frames. ], tot_loss[loss=0.1484, simple_loss=0.1482, pruned_loss=0.06086, audio_tagging_loss=0.01342, over 3039604.75 frames. ], batch size: 57, lr: 3.65e-02, grad_scale: 32.0 2023-11-18 04:26:06,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.067e+01 1.036e+02 1.166e+02 1.373e+02 2.331e+02, threshold=2.332e+02, percent-clipped=0.0 2023-11-18 04:26:22,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. 
limit=15.0 2023-11-18 04:26:30,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=57466.666666666664, ans=0.2 2023-11-18 04:26:31,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=57533.333333333336, ans=0.125 2023-11-18 04:26:39,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=57533.333333333336, ans=0.125 2023-11-18 04:26:41,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=57533.333333333336, ans=0.125 2023-11-18 04:26:48,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.72 vs. limit=12.0 2023-11-18 04:26:55,056 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8650, loss[loss=0.1388, simple_loss=0.1333, pruned_loss=0.0592, audio_tagging_loss=0.01297, over 15022.00 frames. ], tot_loss[loss=0.1491, simple_loss=0.149, pruned_loss=0.06112, audio_tagging_loss=0.01347, over 3038554.23 frames. ], batch size: 57, lr: 3.64e-02, grad_scale: 32.0 2023-11-18 04:27:02,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=57666.666666666664, ans=0.1 2023-11-18 04:27:08,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=57733.333333333336, ans=0.0 2023-11-18 04:27:09,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=57733.333333333336, ans=0.125 2023-11-18 04:27:13,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=57733.333333333336, ans=0.125 2023-11-18 04:27:45,601 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.29 vs. limit=15.0 2023-11-18 04:27:50,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=58000.0, ans=0.0 2023-11-18 04:27:51,161 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8700, loss[loss=0.1488, simple_loss=0.1406, pruned_loss=0.0621, audio_tagging_loss=0.01638, over 14103.00 frames. ], tot_loss[loss=0.1492, simple_loss=0.1491, pruned_loss=0.06112, audio_tagging_loss=0.01353, over 3041732.66 frames. ], batch size: 54, lr: 3.64e-02, grad_scale: 32.0 2023-11-18 04:27:55,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2023-11-18 04:27:59,160 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 1.148e+02 1.309e+02 1.555e+02 2.620e+02, threshold=2.618e+02, percent-clipped=1.0 2023-11-18 04:28:04,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.03 vs. 
limit=22.5 2023-11-18 04:28:16,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=58133.333333333336, ans=0.0 2023-11-18 04:28:28,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=58200.0, ans=0.125 2023-11-18 04:28:41,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.31 vs. limit=22.5 2023-11-18 04:28:47,665 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8750, loss[loss=0.1373, simple_loss=0.1332, pruned_loss=0.05804, audio_tagging_loss=0.01268, over 14860.00 frames. ], tot_loss[loss=0.1488, simple_loss=0.1486, pruned_loss=0.06098, audio_tagging_loss=0.01354, over 3040194.44 frames. ], batch size: 56, lr: 3.63e-02, grad_scale: 32.0 2023-11-18 04:28:54,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=58333.333333333336, ans=0.125 2023-11-18 04:28:59,506 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:29:00,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=58400.0, ans=0.125 2023-11-18 04:29:04,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=58400.0, ans=0.0 2023-11-18 04:29:25,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=58533.333333333336, ans=0.0 2023-11-18 04:29:40,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=58600.0, ans=0.2 2023-11-18 04:29:42,984 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8800, loss[loss=0.1301, simple_loss=0.1333, pruned_loss=0.04903, audio_tagging_loss=0.01444, over 15226.00 frames. ], tot_loss[loss=0.1489, simple_loss=0.1489, pruned_loss=0.06086, audio_tagging_loss=0.01358, over 3041103.20 frames. ], batch size: 56, lr: 3.62e-02, grad_scale: 32.0 2023-11-18 04:29:44,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=58666.666666666664, ans=0.125 2023-11-18 04:29:50,757 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 9.076e+01 1.175e+02 1.354e+02 1.562e+02 2.721e+02, threshold=2.708e+02, percent-clipped=1.0 2023-11-18 04:30:24,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=58866.666666666664, ans=15.0 2023-11-18 04:30:37,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2023-11-18 04:30:38,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=59000.0, ans=0.125 2023-11-18 04:30:39,313 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8850, loss[loss=0.123, simple_loss=0.1209, pruned_loss=0.04859, audio_tagging_loss=0.01394, over 15386.00 frames. ], tot_loss[loss=0.1499, simple_loss=0.1503, pruned_loss=0.06124, audio_tagging_loss=0.0135, over 3044287.32 frames. 
], batch size: 57, lr: 3.62e-02, grad_scale: 32.0 2023-11-18 04:30:45,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=59000.0, ans=0.1 2023-11-18 04:30:45,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.55 vs. limit=22.5 2023-11-18 04:30:51,671 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:31:14,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=59200.0, ans=0.125 2023-11-18 04:31:23,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=59266.666666666664, ans=0.025 2023-11-18 04:31:34,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=59333.333333333336, ans=0.125 2023-11-18 04:31:35,259 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8900, loss[loss=0.1374, simple_loss=0.1434, pruned_loss=0.05388, audio_tagging_loss=0.01181, over 15060.00 frames. ], tot_loss[loss=0.1494, simple_loss=0.15, pruned_loss=0.06112, audio_tagging_loss=0.01329, over 3045079.02 frames. ], batch size: 56, lr: 3.61e-02, grad_scale: 32.0 2023-11-18 04:31:43,298 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.181e+01 1.040e+02 1.138e+02 1.318e+02 1.926e+02, threshold=2.277e+02, percent-clipped=0.0 2023-11-18 04:31:43,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=59333.333333333336, ans=0.0 2023-11-18 04:32:24,145 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:32:30,877 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 8950, loss[loss=0.1174, simple_loss=0.1123, pruned_loss=0.04873, audio_tagging_loss=0.01251, over 14688.00 frames. ], tot_loss[loss=0.1491, simple_loss=0.1501, pruned_loss=0.06093, audio_tagging_loss=0.01315, over 3043782.43 frames. ], batch size: 56, lr: 3.61e-02, grad_scale: 32.0 2023-11-18 04:33:13,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=59866.666666666664, ans=0.125 2023-11-18 04:33:19,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=59933.333333333336, ans=0.125 2023-11-18 04:33:22,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=59933.333333333336, ans=0.0 2023-11-18 04:33:27,435 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9000, loss[loss=0.1555, simple_loss=0.1596, pruned_loss=0.06329, audio_tagging_loss=0.01243, over 15594.00 frames. ], tot_loss[loss=0.1489, simple_loss=0.1498, pruned_loss=0.06092, audio_tagging_loss=0.01302, over 3040690.49 frames. 
2023-11-18 04:33:27,436 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 04:34:01,190 INFO [train_asr.py:1147] (1/4) Epoch 1, validation: loss=0.0967, simple_loss=0.07481, pruned_loss=0.01931, audio_tagging_loss=0.03999, over 4681554.00 frames. 2023-11-18 04:34:01,191 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 04:34:09,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 1.047e+02 1.193e+02 1.407e+02 2.407e+02, threshold=2.385e+02, percent-clipped=1.0 2023-11-18 04:34:14,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=60066.666666666664, ans=0.2 2023-11-18 04:34:17,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=60066.666666666664, ans=0.1 2023-11-18 04:34:23,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=60133.333333333336, ans=0.07 2023-11-18 04:34:25,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.45 vs. limit=22.5 2023-11-18 04:34:32,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=60133.333333333336, ans=0.125 2023-11-18 04:34:32,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=60133.333333333336, ans=0.125 2023-11-18 04:34:57,626 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9050, loss[loss=0.1225, simple_loss=0.112, pruned_loss=0.05022, audio_tagging_loss=0.01626, over 16047.00 frames. ], tot_loss[loss=0.1478, simple_loss=0.1486, pruned_loss=0.06051, audio_tagging_loss=0.01296, over 3052859.96 frames. ], batch size: 66, lr: 3.59e-02, grad_scale: 32.0 2023-11-18 04:35:01,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=60333.333333333336, ans=0.125 2023-11-18 04:35:22,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=60466.666666666664, ans=0.95 2023-11-18 04:35:22,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60466.666666666664, ans=0.1 2023-11-18 04:35:30,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=60533.333333333336, ans=0.0 2023-11-18 04:35:30,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=60533.333333333336, ans=0.125 2023-11-18 04:35:52,811 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9100, loss[loss=0.1711, simple_loss=0.1781, pruned_loss=0.07174, audio_tagging_loss=0.0103, over 16045.00 frames. ], tot_loss[loss=0.1482, simple_loss=0.1493, pruned_loss=0.06071, audio_tagging_loss=0.01286, over 3055757.57 frames. ], batch size: 58, lr: 3.59e-02, grad_scale: 32.0
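The optim.py Clipping_scale lines report the quartiles (min/25%/median/75%/max) of recent gradient norms, and the clipping threshold tracks clipping_scale times the median: in the entry above, threshold 2.385e+02 is roughly 2.0 x the logged median 1.193e+02, and percent-clipped says how often recent batches exceeded it. A sketch of that rule, assuming a simple sliding window of norms (ScaledAdam's actual bookkeeping in icefall is more involved):

```python
import torch

# Hedged sketch: scale a step down when its gradient norm exceeds
# clipping_scale * median(recent gradient norms).
def clipping_factor(grad_norm: float, recent_norms: torch.Tensor,
                    clipping_scale: float = 2.0) -> float:
    threshold = clipping_scale * recent_norms.median().item()
    return min(1.0, threshold / (grad_norm + 1e-20))

recent = torch.tensor([76.90, 104.7, 119.3, 140.7, 240.7])  # quartiles as logged
print(clipping_factor(300.0, recent))  # < 1.0, so this step would be clipped
```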
2023-11-18 04:35:57,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=60666.666666666664, ans=0.05 2023-11-18 04:36:00,841 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.380e+01 1.098e+02 1.291e+02 1.456e+02 2.208e+02, threshold=2.583e+02, percent-clipped=0.0 2023-11-18 04:36:04,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=60733.333333333336, ans=0.0 2023-11-18 04:36:12,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=60733.333333333336, ans=0.0 2023-11-18 04:36:14,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=60800.0, ans=0.125 2023-11-18 04:36:32,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=60866.666666666664, ans=0.0 2023-11-18 04:36:44,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=60933.333333333336, ans=0.125 2023-11-18 04:36:48,972 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9150, loss[loss=0.1557, simple_loss=0.1628, pruned_loss=0.0614, audio_tagging_loss=0.01293, over 15613.00 frames. ], tot_loss[loss=0.1471, simple_loss=0.1482, pruned_loss=0.06012, audio_tagging_loss=0.0129, over 3050698.84 frames. ], batch size: 56, lr: 3.58e-02, grad_scale: 32.0 2023-11-18 04:36:51,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.82 vs. limit=22.5 2023-11-18 04:37:33,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=61266.666666666664, ans=0.07 2023-11-18 04:37:36,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=61266.666666666664, ans=0.0 2023-11-18 04:37:44,924 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9200, loss[loss=0.1662, simple_loss=0.1725, pruned_loss=0.06896, audio_tagging_loss=0.01099, over 15053.00 frames. ], tot_loss[loss=0.1466, simple_loss=0.1474, pruned_loss=0.05986, audio_tagging_loss=0.01301, over 3060380.74 frames. ], batch size: 55, lr: 3.58e-02, grad_scale: 32.0
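The scaling.py:213 ScheduledFloat lines record regularization knobs (skip rates, balancer probabilities, dropout) that are piecewise-linear functions of the global batch count rather than constants; the log prints the value in effect ("ans") when a module is evaluated. A sketch of the interpolation, with made-up breakpoints (the real per-module schedules live in the Zipformer code):

```python
# Hedged sketch of a ScheduledFloat-style value: linear interpolation
# between (batch_count, value) breakpoints, clamped at both ends.
def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return points[-1][1]

# e.g. a conv_skip_rate that decays 0.5 -> 0.0 over the first 20k batches
# would already read ans=0.0 at batch_count ~60000, as in the lines above:
print(scheduled_float(60733.3, [(0.0, 0.5), (20000.0, 0.0)]))  # 0.0
```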
2023-11-18 04:37:49,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=61333.333333333336, ans=0.125 2023-11-18 04:37:50,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=61333.333333333336, ans=0.125 2023-11-18 04:37:52,955 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.523e+01 1.127e+02 1.318e+02 1.536e+02 2.303e+02, threshold=2.636e+02, percent-clipped=0.0 2023-11-18 04:38:04,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=61400.0, ans=10.0 2023-11-18 04:38:10,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=61466.666666666664, ans=0.025 2023-11-18 04:38:41,845 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9250, loss[loss=0.1201, simple_loss=0.1162, pruned_loss=0.04799, audio_tagging_loss=0.01401, over 14322.00 frames. ], tot_loss[loss=0.1457, simple_loss=0.1468, pruned_loss=0.05938, audio_tagging_loss=0.01295, over 3057317.12 frames. ], batch size: 57, lr: 3.57e-02, grad_scale: 32.0 2023-11-18 04:38:49,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=61666.666666666664, ans=0.09899494936611666 2023-11-18 04:38:59,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.44 vs. limit=22.5 2023-11-18 04:39:11,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=61800.0, ans=0.125 2023-11-18 04:39:13,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=61800.0, ans=0.125 2023-11-18 04:39:20,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.07 vs. limit=15.0 2023-11-18 04:39:21,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=61866.666666666664, ans=0.0 2023-11-18 04:39:31,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.35 vs. limit=15.0 2023-11-18 04:39:31,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=61933.333333333336, ans=0.125 2023-11-18 04:39:33,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=61933.333333333336, ans=0.95 2023-11-18 04:39:37,696 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9300, loss[loss=0.1504, simple_loss=0.1535, pruned_loss=0.0601, audio_tagging_loss=0.0135, over 14594.00 frames. ], tot_loss[loss=0.1464, simple_loss=0.1473, pruned_loss=0.0598, audio_tagging_loss=0.01296, over 3049360.34 frames. ], batch size: 55, lr: 3.57e-02, grad_scale: 32.0
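The scaling.py:1022 Whitening lines are diagnostics from the Whiten modules, which push a layer's output covariance toward white; an entry is printed when the measured metric crosses its limit (e.g. metric=22.44 vs. limit=22.5 above is right at the boundary, metric=16.07 vs. limit=15.0 is over it). A plausible reconstruction of the metric, the normalized eigenvalue spread of the feature covariance, where perfectly white features give 1.0; icefall's exact implementation may differ in detail:

```python
import torch

# Hedged sketch: whitening metric = E[eig^2] / E[eig]^2 over the covariance
# eigenvalues; 1.0 means white, larger means a few directions dominate.
def whitening_metric(x: torch.Tensor) -> float:
    x = x - x.mean(dim=0)                      # x: (num_frames, num_channels)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov).clamp(min=0.0)
    return ((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)).item()

# Near 1.0 for white noise (plus a finite-sample term of roughly C/N):
print(whitening_metric(torch.randn(2000, 512)))
```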
2023-11-18 04:39:45,029 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.224e+01 1.082e+02 1.160e+02 1.352e+02 1.912e+02, threshold=2.319e+02, percent-clipped=0.0 2023-11-18 04:39:47,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=62066.666666666664, ans=0.05 2023-11-18 04:39:50,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=62066.666666666664, ans=0.125 2023-11-18 04:40:04,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=62133.333333333336, ans=0.09899494936611666 2023-11-18 04:40:23,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=62266.666666666664, ans=0.1 2023-11-18 04:40:32,214 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:40:32,976 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9350, loss[loss=0.1663, simple_loss=0.1649, pruned_loss=0.06721, audio_tagging_loss=0.01668, over 15815.00 frames. ], tot_loss[loss=0.147, simple_loss=0.148, pruned_loss=0.05996, audio_tagging_loss=0.01307, over 3050858.17 frames. ], batch size: 58, lr: 3.56e-02, grad_scale: 32.0 2023-11-18 04:40:41,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=62333.333333333336, ans=0.0 2023-11-18 04:40:47,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=62400.0, ans=0.2 2023-11-18 04:40:54,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=62466.666666666664, ans=0.2 2023-11-18 04:40:58,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=62466.666666666664, ans=0.2 2023-11-18 04:41:01,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=62466.666666666664, ans=0.1 2023-11-18 04:41:05,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=62533.333333333336, ans=0.0 2023-11-18 04:41:25,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2023-11-18 04:41:29,988 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9400, loss[loss=0.1658, simple_loss=0.1687, pruned_loss=0.06814, audio_tagging_loss=0.01334, over 14976.00 frames. ], tot_loss[loss=0.1486, simple_loss=0.1491, pruned_loss=0.06075, audio_tagging_loss=0.01335, over 3047020.47 frames. ], batch size: 57, lr: 3.55e-02, grad_scale: 32.0
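The grad_scale: 32.0 in these entries is the AMP loss scale from fp16 training (use_fp16: True in the config); later in this log, around batch 10200, it doubles to 64.0 after a sustained run of overflow-free steps. This is standard dynamic loss scaling; a sketch with PyTorch's GradScaler using its default growth/backoff behavior (the growth interval and factors here are PyTorch defaults, not values read from this run):

```python
import torch

# Hedged sketch of dynamic loss scaling: the scale doubles after
# `growth_interval` consecutive non-overflowing optimizer steps and is
# halved whenever an inf/nan gradient is detected.
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000
)
print(scaler.get_scale())  # 32.0 now; becomes 64.0 after enough clean steps
```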
2023-11-18 04:41:37,999 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.990e+01 1.022e+02 1.168e+02 1.353e+02 2.252e+02, threshold=2.336e+02, percent-clipped=0.0 2023-11-18 04:41:51,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=62800.0, ans=0.125 2023-11-18 04:41:56,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=62800.0, ans=0.0 2023-11-18 04:42:10,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=62866.666666666664, ans=0.125 2023-11-18 04:42:21,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=62933.333333333336, ans=0.125 2023-11-18 04:42:25,915 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9450, loss[loss=0.206, simple_loss=0.2004, pruned_loss=0.09559, audio_tagging_loss=0.01022, over 15559.00 frames. ], tot_loss[loss=0.147, simple_loss=0.1471, pruned_loss=0.06, audio_tagging_loss=0.01348, over 3043302.31 frames. ], batch size: 56, lr: 3.55e-02, grad_scale: 32.0 2023-11-18 04:42:25,928 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:42:34,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=63000.0, ans=0.0 2023-11-18 04:42:39,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=63066.666666666664, ans=0.0 2023-11-18 04:42:56,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=63133.333333333336, ans=0.125 2023-11-18 04:43:09,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=63266.666666666664, ans=0.125 2023-11-18 04:43:21,285 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9500, loss[loss=0.1476, simple_loss=0.1427, pruned_loss=0.0616, audio_tagging_loss=0.01464, over 14854.00 frames. ], tot_loss[loss=0.1468, simple_loss=0.1468, pruned_loss=0.05985, audio_tagging_loss=0.01358, over 3047031.73 frames.
], batch size: 56, lr: 3.54e-02, grad_scale: 32.0 2023-11-18 04:43:29,296 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.749e+01 1.117e+02 1.291e+02 1.412e+02 2.358e+02, threshold=2.583e+02, percent-clipped=1.0 2023-11-18 04:43:36,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=63400.0, ans=0.125 2023-11-18 04:43:41,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63400.0, ans=0.1 2023-11-18 04:43:43,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=63466.666666666664, ans=0.125 2023-11-18 04:43:46,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2023-11-18 04:44:10,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=63600.0, ans=0.2 2023-11-18 04:44:17,717 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9550, loss[loss=0.1285, simple_loss=0.1352, pruned_loss=0.04617, audio_tagging_loss=0.01476, over 16344.00 frames. ], tot_loss[loss=0.1469, simple_loss=0.147, pruned_loss=0.05976, audio_tagging_loss=0.01359, over 3045135.76 frames. ], batch size: 60, lr: 3.54e-02, grad_scale: 32.0 2023-11-18 04:44:27,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=63666.666666666664, ans=0.125 2023-11-18 04:44:32,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=63733.333333333336, ans=0.0 2023-11-18 04:44:52,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=63866.666666666664, ans=0.0 2023-11-18 04:45:14,619 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9600, loss[loss=0.1427, simple_loss=0.1492, pruned_loss=0.05695, audio_tagging_loss=0.0112, over 15011.00 frames. ], tot_loss[loss=0.1478, simple_loss=0.1482, pruned_loss=0.06015, audio_tagging_loss=0.01357, over 3047172.93 frames. ], batch size: 56, lr: 3.53e-02, grad_scale: 32.0 2023-11-18 04:45:14,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=64000.0, ans=0.2 2023-11-18 04:45:16,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=64000.0, ans=0.0 2023-11-18 04:45:20,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=64000.0, ans=0.125 2023-11-18 04:45:22,002 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 1.076e+02 1.212e+02 1.383e+02 1.987e+02, threshold=2.424e+02, percent-clipped=0.0 2023-11-18 04:45:29,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.68 vs. limit=22.5 2023-11-18 04:45:53,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=64200.0, ans=0.125 2023-11-18 04:46:00,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.67 vs. 
limit=22.5 2023-11-18 04:46:09,774 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9650, loss[loss=0.1236, simple_loss=0.1218, pruned_loss=0.04898, audio_tagging_loss=0.01376, over 14732.00 frames. ], tot_loss[loss=0.147, simple_loss=0.1474, pruned_loss=0.05981, audio_tagging_loss=0.01352, over 3046660.05 frames. ], batch size: 57, lr: 3.53e-02, grad_scale: 32.0 2023-11-18 04:46:18,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2023-11-18 04:46:22,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=64400.0, ans=0.1 2023-11-18 04:46:28,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=12.0 2023-11-18 04:46:33,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=64466.666666666664, ans=0.0 2023-11-18 04:46:35,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=64466.666666666664, ans=0.125 2023-11-18 04:46:42,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=64533.333333333336, ans=0.05 2023-11-18 04:46:50,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=64533.333333333336, ans=0.125 2023-11-18 04:47:06,182 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9700, loss[loss=0.1641, simple_loss=0.1734, pruned_loss=0.06794, audio_tagging_loss=0.0094, over 14637.00 frames. ], tot_loss[loss=0.1462, simple_loss=0.1467, pruned_loss=0.05943, audio_tagging_loss=0.01349, over 3041136.63 frames. ], batch size: 52, lr: 3.52e-02, grad_scale: 32.0 2023-11-18 04:47:13,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 1.077e+02 1.252e+02 1.398e+02 2.198e+02, threshold=2.504e+02, percent-clipped=0.0 2023-11-18 04:47:17,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=64733.333333333336, ans=0.05 2023-11-18 04:47:27,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=64800.0, ans=0.125 2023-11-18 04:47:38,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=64866.666666666664, ans=0.0 2023-11-18 04:47:39,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=64866.666666666664, ans=0.1 2023-11-18 04:47:47,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=64866.666666666664, ans=0.125 2023-11-18 04:47:48,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.42 vs. limit=6.0 2023-11-18 04:48:01,906 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9750, loss[loss=0.1266, simple_loss=0.1298, pruned_loss=0.05137, audio_tagging_loss=0.01031, over 16661.00 frames. ], tot_loss[loss=0.1463, simple_loss=0.1472, pruned_loss=0.05947, audio_tagging_loss=0.01327, over 3041640.04 frames. 
], batch size: 61, lr: 3.51e-02, grad_scale: 32.0 2023-11-18 04:48:06,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=65000.0, ans=0.125 2023-11-18 04:48:14,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65066.666666666664, ans=0.1 2023-11-18 04:48:17,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=65066.666666666664, ans=0.0 2023-11-18 04:48:26,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=65133.333333333336, ans=0.125 2023-11-18 04:48:32,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=65133.333333333336, ans=0.1 2023-11-18 04:48:40,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=65200.0, ans=0.2 2023-11-18 04:48:50,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65266.666666666664, ans=0.1 2023-11-18 04:48:58,258 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9800, loss[loss=0.1872, simple_loss=0.1825, pruned_loss=0.08036, audio_tagging_loss=0.01562, over 14142.00 frames. ], tot_loss[loss=0.1457, simple_loss=0.1466, pruned_loss=0.05916, audio_tagging_loss=0.01327, over 3030909.55 frames. ], batch size: 53, lr: 3.51e-02, grad_scale: 32.0 2023-11-18 04:49:02,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=65333.333333333336, ans=0.2 2023-11-18 04:49:06,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.876e+01 1.068e+02 1.214e+02 1.427e+02 2.483e+02, threshold=2.428e+02, percent-clipped=0.0 2023-11-18 04:49:08,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=65400.0, ans=0.0 2023-11-18 04:49:08,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65400.0, ans=0.1 2023-11-18 04:49:09,074 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0 2023-11-18 04:49:15,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=65400.0, ans=0.125 2023-11-18 04:49:22,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5 2023-11-18 04:49:26,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.31 vs. limit=22.5 2023-11-18 04:49:43,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=65600.0, ans=0.07 2023-11-18 04:49:48,830 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:49:54,145 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9850, loss[loss=0.1953, simple_loss=0.193, pruned_loss=0.08787, audio_tagging_loss=0.01092, over 15488.00 frames. ], tot_loss[loss=0.1465, simple_loss=0.1474, pruned_loss=0.0598, audio_tagging_loss=0.01304, over 3034783.25 frames. ], batch size: 59, lr: 3.50e-02, grad_scale: 32.0 2023-11-18 04:49:56,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=65666.66666666667, ans=0.125 2023-11-18 04:50:07,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=65733.33333333333, ans=0.125 2023-11-18 04:50:10,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=65733.33333333333, ans=0.0 2023-11-18 04:50:26,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=65866.66666666667, ans=0.125 2023-11-18 04:50:32,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65866.66666666667, ans=0.1 2023-11-18 04:50:50,832 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9900, loss[loss=0.1431, simple_loss=0.1489, pruned_loss=0.05689, audio_tagging_loss=0.01179, over 14815.00 frames. ], tot_loss[loss=0.1452, simple_loss=0.1463, pruned_loss=0.059, audio_tagging_loss=0.01306, over 3037802.28 frames. ], batch size: 55, lr: 3.50e-02, grad_scale: 32.0 2023-11-18 04:50:52,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=66000.0, ans=0.125 2023-11-18 04:50:58,265 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.989e+01 1.084e+02 1.192e+02 1.374e+02 2.032e+02, threshold=2.383e+02, percent-clipped=0.0 2023-11-18 04:51:05,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=66066.66666666667, ans=0.0 2023-11-18 04:51:08,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=66066.66666666667, ans=0.125 2023-11-18 04:51:10,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=66066.66666666667, ans=10.0 2023-11-18 04:51:21,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=66133.33333333333, ans=0.0 2023-11-18 04:51:24,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=66200.0, ans=0.125 2023-11-18 04:51:46,592 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 9950, loss[loss=0.1421, simple_loss=0.1439, pruned_loss=0.05904, audio_tagging_loss=0.01111, over 15885.00 frames. ], tot_loss[loss=0.1457, simple_loss=0.1467, pruned_loss=0.05928, audio_tagging_loss=0.01304, over 3040627.82 frames. 
], batch size: 57, lr: 3.49e-02, grad_scale: 32.0 2023-11-18 04:51:56,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=66400.0, ans=0.125 2023-11-18 04:51:59,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=66400.0, ans=0.1 2023-11-18 04:52:00,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=66400.0, ans=0.125 2023-11-18 04:52:02,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.95 vs. limit=6.0 2023-11-18 04:52:09,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=66466.66666666667, ans=0.0 2023-11-18 04:52:43,104 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10000, loss[loss=0.1784, simple_loss=0.177, pruned_loss=0.07773, audio_tagging_loss=0.01219, over 15762.00 frames. ], tot_loss[loss=0.1453, simple_loss=0.1466, pruned_loss=0.05894, audio_tagging_loss=0.01306, over 3045583.78 frames. ], batch size: 59, lr: 3.49e-02, grad_scale: 32.0 2023-11-18 04:52:50,995 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.963e+01 1.074e+02 1.249e+02 1.429e+02 2.064e+02, threshold=2.499e+02, percent-clipped=0.0 2023-11-18 04:53:01,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=66733.33333333333, ans=0.125 2023-11-18 04:53:03,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2023-11-18 04:53:28,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=12.0 2023-11-18 04:53:30,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=66933.33333333333, ans=0.0 2023-11-18 04:53:36,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=66933.33333333333, ans=0.0 2023-11-18 04:53:39,075 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10050, loss[loss=0.08701, simple_loss=0.08548, pruned_loss=0.02692, audio_tagging_loss=0.01735, over 15030.00 frames. ], tot_loss[loss=0.1447, simple_loss=0.1459, pruned_loss=0.05859, audio_tagging_loss=0.01314, over 3047981.30 frames. ], batch size: 56, lr: 3.48e-02, grad_scale: 32.0 2023-11-18 04:53:53,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.60 vs. limit=10.0 2023-11-18 04:54:01,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67133.33333333333, ans=0.1 2023-11-18 04:54:05,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=67133.33333333333, ans=0.07 2023-11-18 04:54:11,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.05 vs. 
limit=12.0 2023-11-18 04:54:21,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=67200.0, ans=0.0 2023-11-18 04:54:25,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=67266.66666666667, ans=0.125 2023-11-18 04:54:33,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=67266.66666666667, ans=0.125 2023-11-18 04:54:34,775 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10100, loss[loss=0.17, simple_loss=0.1808, pruned_loss=0.07082, audio_tagging_loss=0.008764, over 15281.00 frames. ], tot_loss[loss=0.1436, simple_loss=0.1447, pruned_loss=0.05805, audio_tagging_loss=0.01325, over 3051580.93 frames. ], batch size: 56, lr: 3.47e-02, grad_scale: 32.0 2023-11-18 04:54:39,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=67333.33333333333, ans=0.125 2023-11-18 04:54:39,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=67333.33333333333, ans=0.125 2023-11-18 04:54:41,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.54 vs. limit=22.5 2023-11-18 04:54:42,865 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.958e+01 1.070e+02 1.242e+02 1.409e+02 2.518e+02, threshold=2.485e+02, percent-clipped=1.0 2023-11-18 04:54:55,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67400.0, ans=0.1 2023-11-18 04:55:19,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.38 vs. limit=10.0 2023-11-18 04:55:19,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=67600.0, ans=0.125 2023-11-18 04:55:20,742 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:55:22,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=67600.0, ans=0.035 2023-11-18 04:55:31,486 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10150, loss[loss=0.1236, simple_loss=0.1283, pruned_loss=0.04553, audio_tagging_loss=0.01387, over 15264.00 frames. ], tot_loss[loss=0.1445, simple_loss=0.146, pruned_loss=0.05838, audio_tagging_loss=0.01316, over 3052150.67 frames. ], batch size: 57, lr: 3.47e-02, grad_scale: 32.0 2023-11-18 04:55:33,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.72 vs. limit=15.0 2023-11-18 04:55:45,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.78 vs. 
limit=12.0 2023-11-18 04:55:56,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67800.0, ans=0.1 2023-11-18 04:55:58,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=67800.0, ans=0.0 2023-11-18 04:55:59,250 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:56:02,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.51 vs. limit=15.0 2023-11-18 04:56:10,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.74 vs. limit=10.0 2023-11-18 04:56:13,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=67866.66666666667, ans=0.0 2023-11-18 04:56:27,842 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10200, loss[loss=0.1121, simple_loss=0.1084, pruned_loss=0.04377, audio_tagging_loss=0.01418, over 15738.00 frames. ], tot_loss[loss=0.1448, simple_loss=0.1463, pruned_loss=0.05845, audio_tagging_loss=0.01324, over 3050151.15 frames. ], batch size: 60, lr: 3.46e-02, grad_scale: 64.0 2023-11-18 04:56:35,839 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.974e+01 1.095e+02 1.241e+02 1.478e+02 2.822e+02, threshold=2.482e+02, percent-clipped=1.0 2023-11-18 04:56:37,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=68000.0, ans=0.125 2023-11-18 04:56:42,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=68066.66666666667, ans=0.0 2023-11-18 04:56:47,802 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:56:49,764 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:57:13,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.52 vs. limit=10.0 2023-11-18 04:57:21,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=68266.66666666667, ans=0.125 2023-11-18 04:57:23,698 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10250, loss[loss=0.1824, simple_loss=0.1803, pruned_loss=0.08028, audio_tagging_loss=0.01198, over 14999.00 frames. 
], tot_loss[loss=0.1439, simple_loss=0.1452, pruned_loss=0.05805, audio_tagging_loss=0.01323, over 3046060.28 frames. ], batch size: 54, lr: 3.46e-02, grad_scale: 64.0 2023-11-18 04:57:33,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=68400.0, ans=0.125 2023-11-18 04:57:38,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.79 vs. limit=15.0 2023-11-18 04:57:50,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=68466.66666666667, ans=0.0 2023-11-18 04:58:04,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=68533.33333333333, ans=0.2 2023-11-18 04:58:11,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=68600.0, ans=0.125 2023-11-18 04:58:19,289 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10300, loss[loss=0.1237, simple_loss=0.1202, pruned_loss=0.04766, audio_tagging_loss=0.01599, over 15995.00 frames. ], tot_loss[loss=0.1439, simple_loss=0.1452, pruned_loss=0.05802, audio_tagging_loss=0.0133, over 3048104.63 frames. ], batch size: 60, lr: 3.45e-02, grad_scale: 64.0 2023-11-18 04:58:22,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2023-11-18 04:58:27,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.697e+01 1.063e+02 1.210e+02 1.437e+02 2.016e+02, threshold=2.421e+02, percent-clipped=0.0 2023-11-18 04:58:38,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=68733.33333333333, ans=0.125 2023-11-18 04:58:48,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0 2023-11-18 04:59:15,525 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10350, loss[loss=0.1438, simple_loss=0.1426, pruned_loss=0.05823, audio_tagging_loss=0.01425, over 14663.00 frames. ], tot_loss[loss=0.1449, simple_loss=0.1462, pruned_loss=0.05844, audio_tagging_loss=0.01336, over 3046768.24 frames. ], batch size: 55, lr: 3.45e-02, grad_scale: 64.0 2023-11-18 04:59:16,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=69000.0, ans=0.5 2023-11-18 04:59:34,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=69066.66666666667, ans=0.125 2023-11-18 04:59:40,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=69133.33333333333, ans=0.125 2023-11-18 04:59:45,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=69133.33333333333, ans=0.025 2023-11-18 04:59:49,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=69200.0, ans=0.125 2023-11-18 04:59:50,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.39 vs. 
limit=10.0 2023-11-18 05:00:08,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=69266.66666666667, ans=0.0 2023-11-18 05:00:11,315 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10400, loss[loss=0.1547, simple_loss=0.1706, pruned_loss=0.05952, audio_tagging_loss=0.009947, over 15663.00 frames. ], tot_loss[loss=0.1446, simple_loss=0.1457, pruned_loss=0.05832, audio_tagging_loss=0.01339, over 3052215.79 frames. ], batch size: 56, lr: 3.44e-02, grad_scale: 64.0 2023-11-18 05:00:16,928 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:00:18,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.147e+01 1.054e+02 1.220e+02 1.352e+02 2.408e+02, threshold=2.441e+02, percent-clipped=0.0 2023-11-18 05:00:51,354 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=15.0 2023-11-18 05:01:00,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=69600.0, ans=0.125 2023-11-18 05:01:06,931 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10450, loss[loss=0.1546, simple_loss=0.1549, pruned_loss=0.06955, audio_tagging_loss=0.007579, over 15192.00 frames. ], tot_loss[loss=0.1442, simple_loss=0.1457, pruned_loss=0.05806, audio_tagging_loss=0.01327, over 3045667.62 frames. ], batch size: 56, lr: 3.44e-02, grad_scale: 64.0 2023-11-18 05:01:13,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=69666.66666666667, ans=0.125 2023-11-18 05:01:16,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=69666.66666666667, ans=0.2 2023-11-18 05:01:38,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=69800.0, ans=0.125 2023-11-18 05:01:54,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.32 vs. limit=15.0 2023-11-18 05:01:57,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.22 vs. limit=15.0 2023-11-18 05:02:03,062 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10500, loss[loss=0.08893, simple_loss=0.0861, pruned_loss=0.0302, audio_tagging_loss=0.01568, over 14609.00 frames. ], tot_loss[loss=0.1447, simple_loss=0.1462, pruned_loss=0.05853, audio_tagging_loss=0.01311, over 3047898.38 frames. 
], batch size: 56, lr: 3.43e-02, grad_scale: 64.0 2023-11-18 05:02:10,948 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.463e+01 1.096e+02 1.231e+02 1.432e+02 2.125e+02, threshold=2.461e+02, percent-clipped=0.0 2023-11-18 05:02:16,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=70066.66666666667, ans=0.125 2023-11-18 05:02:16,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=70066.66666666667, ans=0.1 2023-11-18 05:02:21,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=70066.66666666667, ans=0.125 2023-11-18 05:02:42,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=70200.0, ans=0.0 2023-11-18 05:02:43,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=70200.0, ans=0.0 2023-11-18 05:02:54,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. limit=15.0 2023-11-18 05:02:58,457 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10550, loss[loss=0.1168, simple_loss=0.1195, pruned_loss=0.04181, audio_tagging_loss=0.01522, over 15123.00 frames. ], tot_loss[loss=0.1458, simple_loss=0.1476, pruned_loss=0.05905, audio_tagging_loss=0.01297, over 3047582.54 frames. ], batch size: 57, lr: 3.43e-02, grad_scale: 64.0 2023-11-18 05:03:09,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=70400.0, ans=0.0 2023-11-18 05:03:13,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.33 vs. limit=6.0 2023-11-18 05:03:21,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=70466.66666666667, ans=0.0 2023-11-18 05:03:24,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=70466.66666666667, ans=0.125 2023-11-18 05:03:34,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=70533.33333333333, ans=0.1 2023-11-18 05:03:37,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=70533.33333333333, ans=0.1 2023-11-18 05:03:39,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=70533.33333333333, ans=10.0 2023-11-18 05:03:39,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=70533.33333333333, ans=0.0 2023-11-18 05:03:53,326 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10600, loss[loss=0.1299, simple_loss=0.1294, pruned_loss=0.05153, audio_tagging_loss=0.01371, over 14482.00 frames. ], tot_loss[loss=0.1448, simple_loss=0.1469, pruned_loss=0.0585, audio_tagging_loss=0.01287, over 3043822.00 frames. 
], batch size: 55, lr: 3.42e-02, grad_scale: 64.0 2023-11-18 05:03:56,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=70666.66666666667, ans=0.09899494936611666 2023-11-18 05:04:01,190 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.606e+01 1.084e+02 1.194e+02 1.358e+02 2.173e+02, threshold=2.389e+02, percent-clipped=0.0 2023-11-18 05:04:30,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=15.0 2023-11-18 05:04:49,521 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10650, loss[loss=0.1427, simple_loss=0.1512, pruned_loss=0.05626, audio_tagging_loss=0.01089, over 14534.00 frames. ], tot_loss[loss=0.1451, simple_loss=0.1473, pruned_loss=0.05861, audio_tagging_loss=0.01281, over 3043811.32 frames. ], batch size: 55, lr: 3.41e-02, grad_scale: 64.0 2023-11-18 05:04:57,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0 2023-11-18 05:04:58,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=71000.0, ans=0.2 2023-11-18 05:05:00,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=71066.66666666667, ans=0.95 2023-11-18 05:05:03,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=71066.66666666667, ans=0.125 2023-11-18 05:05:06,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=71066.66666666667, ans=0.025 2023-11-18 05:05:35,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=71266.66666666667, ans=0.0 2023-11-18 05:05:41,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=71266.66666666667, ans=0.1 2023-11-18 05:05:45,719 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10700, loss[loss=0.1262, simple_loss=0.1307, pruned_loss=0.05186, audio_tagging_loss=0.009003, over 15678.00 frames. ], tot_loss[loss=0.1437, simple_loss=0.1458, pruned_loss=0.05791, audio_tagging_loss=0.01294, over 3046604.98 frames. ], batch size: 60, lr: 3.41e-02, grad_scale: 64.0 2023-11-18 05:05:45,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=71333.33333333333, ans=0.0 2023-11-18 05:05:52,955 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.108e+01 1.048e+02 1.189e+02 1.344e+02 2.146e+02, threshold=2.378e+02, percent-clipped=0.0 2023-11-18 05:06:28,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=71600.0, ans=0.125 2023-11-18 05:06:39,914 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10750, loss[loss=0.1459, simple_loss=0.1396, pruned_loss=0.0636, audio_tagging_loss=0.0125, over 15397.00 frames. ], tot_loss[loss=0.144, simple_loss=0.1462, pruned_loss=0.05804, audio_tagging_loss=0.01284, over 3043781.01 frames. 
], batch size: 56, lr: 3.40e-02, grad_scale: 64.0 2023-11-18 05:07:08,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=71800.0, ans=0.0 2023-11-18 05:07:09,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=12.0 2023-11-18 05:07:21,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=71866.66666666667, ans=0.0 2023-11-18 05:07:27,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=71933.33333333333, ans=0.125 2023-11-18 05:07:29,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=71933.33333333333, ans=0.2 2023-11-18 05:07:35,443 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10800, loss[loss=0.1138, simple_loss=0.1122, pruned_loss=0.04018, audio_tagging_loss=0.01749, over 15768.00 frames. ], tot_loss[loss=0.1425, simple_loss=0.1448, pruned_loss=0.05724, audio_tagging_loss=0.0129, over 3046453.70 frames. ], batch size: 58, lr: 3.40e-02, grad_scale: 64.0 2023-11-18 05:07:43,299 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.252e+01 1.082e+02 1.179e+02 1.367e+02 2.142e+02, threshold=2.358e+02, percent-clipped=0.0 2023-11-18 05:07:43,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=72000.0, ans=0.125 2023-11-18 05:08:09,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.44 vs. limit=22.5 2023-11-18 05:08:31,987 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10850, loss[loss=0.07784, simple_loss=0.06801, pruned_loss=0.02678, audio_tagging_loss=0.01706, over 15132.00 frames. ], tot_loss[loss=0.1415, simple_loss=0.1434, pruned_loss=0.05687, audio_tagging_loss=0.01295, over 3048617.35 frames. ], batch size: 60, lr: 3.39e-02, grad_scale: 64.0 2023-11-18 05:08:53,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=72466.66666666667, ans=0.0 2023-11-18 05:08:58,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=72466.66666666667, ans=0.1 2023-11-18 05:09:05,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.89 vs. limit=22.5 2023-11-18 05:09:11,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=72533.33333333333, ans=0.0 2023-11-18 05:09:11,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=72533.33333333333, ans=0.2 2023-11-18 05:09:19,746 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:09:24,812 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 05:09:26,862 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10900, loss[loss=0.1266, simple_loss=0.1283, pruned_loss=0.04812, audio_tagging_loss=0.01435, over 14988.00 frames. ], tot_loss[loss=0.1418, simple_loss=0.1433, pruned_loss=0.05699, audio_tagging_loss=0.0131, over 3051112.75 frames. ], batch size: 56, lr: 3.39e-02, grad_scale: 64.0 2023-11-18 05:09:33,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.27 vs. limit=15.0 2023-11-18 05:09:34,701 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.986e+01 1.097e+02 1.219e+02 1.380e+02 2.178e+02, threshold=2.437e+02, percent-clipped=0.0 2023-11-18 05:09:45,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=72733.33333333333, ans=0.125 2023-11-18 05:10:11,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=72933.33333333333, ans=0.125 2023-11-18 05:10:22,393 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 10950, loss[loss=0.1553, simple_loss=0.1669, pruned_loss=0.05936, audio_tagging_loss=0.0125, over 15159.00 frames. ], tot_loss[loss=0.1427, simple_loss=0.1445, pruned_loss=0.0573, audio_tagging_loss=0.01317, over 3050489.51 frames. ], batch size: 55, lr: 3.38e-02, grad_scale: 64.0 2023-11-18 05:10:26,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=73000.0, ans=0.125 2023-11-18 05:10:30,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=73000.0, ans=10.0 2023-11-18 05:10:58,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=73200.0, ans=0.125 2023-11-18 05:11:17,652 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.461e+00 2023-11-18 05:11:18,768 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11000, loss[loss=0.1705, simple_loss=0.1872, pruned_loss=0.06923, audio_tagging_loss=0.007648, over 15646.00 frames. ], tot_loss[loss=0.1433, simple_loss=0.1452, pruned_loss=0.05749, audio_tagging_loss=0.01321, over 3050860.60 frames. ], batch size: 56, lr: 3.38e-02, grad_scale: 64.0 2023-11-18 05:11:25,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=73333.33333333333, ans=0.2 2023-11-18 05:11:26,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.286e+01 1.063e+02 1.239e+02 1.487e+02 2.361e+02, threshold=2.479e+02, percent-clipped=0.0 2023-11-18 05:11:28,865 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 05:11:37,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=73400.0, ans=0.125 2023-11-18 05:11:48,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=73466.66666666667, ans=0.07 2023-11-18 05:11:51,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=73533.33333333333, ans=0.125 2023-11-18 05:12:05,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=73600.0, ans=0.0 2023-11-18 05:12:06,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=73600.0, ans=0.0 2023-11-18 05:12:13,950 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11050, loss[loss=0.1077, simple_loss=0.1003, pruned_loss=0.03773, audio_tagging_loss=0.01978, over 15063.00 frames. ], tot_loss[loss=0.1437, simple_loss=0.1455, pruned_loss=0.0577, audio_tagging_loss=0.01323, over 3057140.88 frames. ], batch size: 59, lr: 3.37e-02, grad_scale: 64.0 2023-11-18 05:12:15,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=73666.66666666667, ans=0.125 2023-11-18 05:12:16,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=73666.66666666667, ans=0.125 2023-11-18 05:12:29,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=15.0 2023-11-18 05:12:38,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=73800.0, ans=0.0 2023-11-18 05:12:41,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=73800.0, ans=0.0 2023-11-18 05:12:51,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.02 vs. limit=10.0 2023-11-18 05:13:09,734 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11100, loss[loss=0.1955, simple_loss=0.2193, pruned_loss=0.07814, audio_tagging_loss=0.0077, over 14891.00 frames. ], tot_loss[loss=0.1441, simple_loss=0.1459, pruned_loss=0.05777, audio_tagging_loss=0.01337, over 3059083.94 frames. ], batch size: 54, lr: 3.37e-02, grad_scale: 64.0 2023-11-18 05:13:17,721 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.064e+01 1.115e+02 1.316e+02 1.523e+02 2.373e+02, threshold=2.632e+02, percent-clipped=0.0 2023-11-18 05:13:38,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=74133.33333333333, ans=0.125 2023-11-18 05:13:41,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=74133.33333333333, ans=0.0 2023-11-18 05:13:45,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.66 vs. 
limit=6.0 2023-11-18 05:13:55,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=74266.66666666667, ans=0.2 2023-11-18 05:13:58,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=74266.66666666667, ans=0.0 2023-11-18 05:14:06,315 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11150, loss[loss=0.1062, simple_loss=0.106, pruned_loss=0.03946, audio_tagging_loss=0.01378, over 15571.00 frames. ], tot_loss[loss=0.1429, simple_loss=0.1446, pruned_loss=0.05711, audio_tagging_loss=0.01352, over 3059958.11 frames. ], batch size: 60, lr: 3.36e-02, grad_scale: 64.0 2023-11-18 05:14:09,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.68 vs. limit=10.0 2023-11-18 05:14:11,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=74333.33333333333, ans=0.0 2023-11-18 05:14:25,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=74400.0, ans=0.1 2023-11-18 05:14:27,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.29 vs. limit=6.0 2023-11-18 05:14:32,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=74466.66666666667, ans=0.125 2023-11-18 05:14:33,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=74466.66666666667, ans=0.5 2023-11-18 05:14:36,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=74466.66666666667, ans=0.05 2023-11-18 05:14:38,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=74533.33333333333, ans=0.0 2023-11-18 05:14:42,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=74533.33333333333, ans=0.0 2023-11-18 05:14:51,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=15.0 2023-11-18 05:14:54,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=74600.0, ans=0.0 2023-11-18 05:15:01,599 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11200, loss[loss=0.1455, simple_loss=0.1435, pruned_loss=0.06137, audio_tagging_loss=0.01236, over 15524.00 frames. ], tot_loss[loss=0.1431, simple_loss=0.1448, pruned_loss=0.05719, audio_tagging_loss=0.01353, over 3057499.60 frames. ], batch size: 59, lr: 3.36e-02, grad_scale: 64.0 2023-11-18 05:15:09,622 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.922e+01 1.084e+02 1.213e+02 1.367e+02 1.851e+02, threshold=2.426e+02, percent-clipped=0.0 2023-11-18 05:15:14,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=74733.33333333333, ans=0.2 2023-11-18 05:15:21,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.31 vs. 
limit=6.0 2023-11-18 05:15:26,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=74800.0, ans=0.1 2023-11-18 05:15:34,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=74866.66666666667, ans=0.125 2023-11-18 05:15:37,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=74866.66666666667, ans=0.0 2023-11-18 05:15:57,801 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11250, loss[loss=0.135, simple_loss=0.1299, pruned_loss=0.0545, audio_tagging_loss=0.01553, over 15818.00 frames. ], tot_loss[loss=0.1417, simple_loss=0.1429, pruned_loss=0.05664, audio_tagging_loss=0.01366, over 3057427.11 frames. ], batch size: 58, lr: 3.35e-02, grad_scale: 64.0 2023-11-18 05:16:05,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=75000.0, ans=0.125 2023-11-18 05:16:17,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.06 vs. limit=22.5 2023-11-18 05:16:46,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=75266.66666666667, ans=0.09899494936611666 2023-11-18 05:16:49,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=75266.66666666667, ans=0.04949747468305833 2023-11-18 05:16:53,040 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11300, loss[loss=0.1296, simple_loss=0.1235, pruned_loss=0.0518, audio_tagging_loss=0.01609, over 13770.00 frames. ], tot_loss[loss=0.1425, simple_loss=0.1442, pruned_loss=0.05712, audio_tagging_loss=0.0133, over 3055028.80 frames. ], batch size: 52, lr: 3.35e-02, grad_scale: 64.0 2023-11-18 05:17:00,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.188e+01 1.067e+02 1.239e+02 1.530e+02 2.211e+02, threshold=2.479e+02, percent-clipped=0.0 2023-11-18 05:17:04,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.75 vs. limit=6.0 2023-11-18 05:17:13,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=75400.0, ans=0.025 2023-11-18 05:17:17,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=75466.66666666667, ans=0.125 2023-11-18 05:17:23,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=75466.66666666667, ans=0.125 2023-11-18 05:17:27,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=75533.33333333333, ans=0.0 2023-11-18 05:17:48,717 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11350, loss[loss=0.1155, simple_loss=0.114, pruned_loss=0.04198, audio_tagging_loss=0.0165, over 15149.00 frames. ], tot_loss[loss=0.1412, simple_loss=0.143, pruned_loss=0.05662, audio_tagging_loss=0.01314, over 3051707.39 frames. ], batch size: 58, lr: 3.34e-02, grad_scale: 64.0 2023-11-18 05:17:49,125 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.24 vs. 
limit=15.0 2023-11-18 05:17:55,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=75666.66666666667, ans=0.1 2023-11-18 05:18:00,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=75733.33333333333, ans=0.125 2023-11-18 05:18:09,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0 2023-11-18 05:18:18,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=75800.0, ans=0.1 2023-11-18 05:18:26,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=75866.66666666667, ans=0.0 2023-11-18 05:18:45,302 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11400, loss[loss=0.2054, simple_loss=0.2166, pruned_loss=0.08534, audio_tagging_loss=0.01179, over 15203.00 frames. ], tot_loss[loss=0.1438, simple_loss=0.1458, pruned_loss=0.05793, audio_tagging_loss=0.01295, over 3051822.85 frames. ], batch size: 54, lr: 3.34e-02, grad_scale: 64.0 2023-11-18 05:18:52,632 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.156e+01 1.039e+02 1.156e+02 1.287e+02 1.628e+02, threshold=2.311e+02, percent-clipped=0.0 2023-11-18 05:18:53,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.76 vs. limit=22.5 2023-11-18 05:18:57,809 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:19:27,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2023-11-18 05:19:40,945 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11450, loss[loss=0.1312, simple_loss=0.134, pruned_loss=0.04802, audio_tagging_loss=0.0162, over 15175.00 frames. ], tot_loss[loss=0.144, simple_loss=0.1464, pruned_loss=0.058, audio_tagging_loss=0.01284, over 3053566.10 frames. ], batch size: 57, lr: 3.33e-02, grad_scale: 64.0 2023-11-18 05:20:23,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=76533.33333333333, ans=0.125 2023-11-18 05:20:36,080 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11500, loss[loss=0.143, simple_loss=0.1542, pruned_loss=0.05603, audio_tagging_loss=0.009905, over 15568.00 frames. ], tot_loss[loss=0.1434, simple_loss=0.1457, pruned_loss=0.05764, audio_tagging_loss=0.01294, over 3056157.81 frames. ], batch size: 58, lr: 3.33e-02, grad_scale: 64.0 2023-11-18 05:20:36,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=76666.66666666667, ans=0.025 2023-11-18 05:20:43,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.208e+01 1.030e+02 1.194e+02 1.379e+02 2.068e+02, threshold=2.389e+02, percent-clipped=0.0 2023-11-18 05:21:29,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.25 vs. 
limit=22.5 2023-11-18 05:21:31,742 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11550, loss[loss=0.1829, simple_loss=0.1856, pruned_loss=0.07811, audio_tagging_loss=0.01203, over 15522.00 frames. ], tot_loss[loss=0.1429, simple_loss=0.1457, pruned_loss=0.05716, audio_tagging_loss=0.01289, over 3055826.61 frames. ], batch size: 58, lr: 3.32e-02, grad_scale: 64.0 2023-11-18 05:21:55,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2023-11-18 05:21:58,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=77133.33333333333, ans=0.125 2023-11-18 05:22:06,039 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 05:22:20,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=15.0 2023-11-18 05:22:28,059 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11600, loss[loss=0.1538, simple_loss=0.1528, pruned_loss=0.06666, audio_tagging_loss=0.01073, over 14954.00 frames. ], tot_loss[loss=0.1423, simple_loss=0.1456, pruned_loss=0.05678, audio_tagging_loss=0.01276, over 3053321.44 frames. ], batch size: 57, lr: 3.32e-02, grad_scale: 64.0 2023-11-18 05:22:30,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.35 vs. limit=15.0 2023-11-18 05:22:35,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.96 vs. limit=10.0 2023-11-18 05:22:35,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 1.030e+02 1.201e+02 1.372e+02 2.300e+02, threshold=2.402e+02, percent-clipped=0.0 2023-11-18 05:22:38,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.22 vs. limit=22.5 2023-11-18 05:22:40,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=77400.0, ans=0.2 2023-11-18 05:22:48,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.91 vs. 
limit=6.0 2023-11-18 05:22:48,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=77466.66666666667, ans=0.125 2023-11-18 05:23:05,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=77533.33333333333, ans=0.125 2023-11-18 05:23:12,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=77600.0, ans=6.0 2023-11-18 05:23:14,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=77600.0, ans=0.0 2023-11-18 05:23:23,709 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11650, loss[loss=0.1156, simple_loss=0.1151, pruned_loss=0.04357, audio_tagging_loss=0.01445, over 14659.00 frames. ], tot_loss[loss=0.1416, simple_loss=0.1449, pruned_loss=0.05641, audio_tagging_loss=0.01277, over 3049344.42 frames. ], batch size: 55, lr: 3.31e-02, grad_scale: 64.0 2023-11-18 05:23:37,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=77733.33333333333, ans=0.0 2023-11-18 05:24:08,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=77933.33333333333, ans=0.2 2023-11-18 05:24:18,906 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11700, loss[loss=0.1512, simple_loss=0.1472, pruned_loss=0.06086, audio_tagging_loss=0.01672, over 15162.00 frames. ], tot_loss[loss=0.1424, simple_loss=0.1454, pruned_loss=0.05676, audio_tagging_loss=0.01292, over 3047322.74 frames. ], batch size: 55, lr: 3.31e-02, grad_scale: 64.0 2023-11-18 05:24:20,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=78000.0, ans=0.2 2023-11-18 05:24:26,789 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.375e+01 1.130e+02 1.304e+02 1.460e+02 2.076e+02, threshold=2.607e+02, percent-clipped=0.0 2023-11-18 05:24:34,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=78066.66666666667, ans=0.0 2023-11-18 05:24:38,945 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.27 vs. limit=22.5 2023-11-18 05:24:54,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=78200.0, ans=0.0 2023-11-18 05:25:00,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=78200.0, ans=0.2 2023-11-18 05:25:13,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.55 vs. limit=15.0 2023-11-18 05:25:14,873 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11750, loss[loss=0.147, simple_loss=0.142, pruned_loss=0.0587, audio_tagging_loss=0.01732, over 15343.00 frames. ], tot_loss[loss=0.1419, simple_loss=0.1447, pruned_loss=0.05656, audio_tagging_loss=0.01297, over 3050244.99 frames. 
], batch size: 57, lr: 3.30e-02, grad_scale: 64.0 2023-11-18 05:25:25,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=78400.0, ans=0.2 2023-11-18 05:25:26,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=78400.0, ans=0.125 2023-11-18 05:25:43,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=78466.66666666667, ans=0.2 2023-11-18 05:25:43,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.81 vs. limit=22.5 2023-11-18 05:25:45,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=78466.66666666667, ans=0.0 2023-11-18 05:26:11,078 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11800, loss[loss=0.1512, simple_loss=0.158, pruned_loss=0.06313, audio_tagging_loss=0.009125, over 15109.00 frames. ], tot_loss[loss=0.142, simple_loss=0.1446, pruned_loss=0.05668, audio_tagging_loss=0.01299, over 3052084.03 frames. ], batch size: 58, lr: 3.30e-02, grad_scale: 32.0 2023-11-18 05:26:15,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=78666.66666666667, ans=0.09899494936611666 2023-11-18 05:26:17,614 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:26:19,541 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 1.101e+02 1.270e+02 1.502e+02 2.355e+02, threshold=2.541e+02, percent-clipped=0.0 2023-11-18 05:26:23,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=78733.33333333333, ans=0.125 2023-11-18 05:26:39,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=78800.0, ans=0.125 2023-11-18 05:27:06,415 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11850, loss[loss=0.1648, simple_loss=0.1638, pruned_loss=0.06826, audio_tagging_loss=0.01463, over 15209.00 frames. ], tot_loss[loss=0.1425, simple_loss=0.1453, pruned_loss=0.05671, audio_tagging_loss=0.01312, over 3043256.01 frames. ], batch size: 56, lr: 3.29e-02, grad_scale: 32.0 2023-11-18 05:27:16,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=79066.66666666667, ans=15.0 2023-11-18 05:27:21,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2023-11-18 05:27:31,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=15.0 2023-11-18 05:27:35,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=79133.33333333333, ans=0.125 2023-11-18 05:27:49,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.05 vs. 
limit=15.0 2023-11-18 05:27:54,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=79266.66666666667, ans=0.0 2023-11-18 05:27:59,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=79266.66666666667, ans=0.0 2023-11-18 05:28:02,172 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11900, loss[loss=0.1263, simple_loss=0.1204, pruned_loss=0.04936, audio_tagging_loss=0.01675, over 16024.00 frames. ], tot_loss[loss=0.1418, simple_loss=0.1444, pruned_loss=0.05631, audio_tagging_loss=0.01331, over 3050194.87 frames. ], batch size: 60, lr: 3.29e-02, grad_scale: 32.0 2023-11-18 05:28:06,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.95 vs. limit=22.5 2023-11-18 05:28:07,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=79333.33333333333, ans=0.125 2023-11-18 05:28:11,746 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.709e+01 1.049e+02 1.249e+02 1.472e+02 4.248e+02, threshold=2.498e+02, percent-clipped=1.0 2023-11-18 05:28:26,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=79466.66666666667, ans=0.1 2023-11-18 05:28:30,518 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:28:56,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=79600.0, ans=0.125 2023-11-18 05:28:58,964 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 11950, loss[loss=0.1695, simple_loss=0.1665, pruned_loss=0.07526, audio_tagging_loss=0.01099, over 14211.00 frames. ], tot_loss[loss=0.1391, simple_loss=0.1413, pruned_loss=0.05493, audio_tagging_loss=0.0135, over 3050564.39 frames. ], batch size: 56, lr: 3.28e-02, grad_scale: 32.0 2023-11-18 05:29:13,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=79733.33333333333, ans=0.07 2023-11-18 05:29:17,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=79733.33333333333, ans=0.0 2023-11-18 05:29:18,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=79733.33333333333, ans=0.0 2023-11-18 05:29:21,237 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.21 vs. limit=12.0 2023-11-18 05:29:27,539 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=15.0 2023-11-18 05:29:29,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.72 vs. 
limit=10.0 2023-11-18 05:29:42,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=79933.33333333333, ans=0.2 2023-11-18 05:29:43,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=79933.33333333333, ans=0.125 2023-11-18 05:29:45,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=79933.33333333333, ans=0.125 2023-11-18 05:29:55,520 INFO [train_asr.py:1115] (1/4) Epoch 1, batch 12000, loss[loss=0.1822, simple_loss=0.1862, pruned_loss=0.07628, audio_tagging_loss=0.01281, over 15559.00 frames. ], tot_loss[loss=0.1405, simple_loss=0.1427, pruned_loss=0.05569, audio_tagging_loss=0.01351, over 3045484.30 frames. ], batch size: 56, lr: 3.28e-02, grad_scale: 16.0 2023-11-18 05:29:55,520 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 05:30:31,617 INFO [train_asr.py:1147] (1/4) Epoch 1, validation: loss=0.09272, simple_loss=0.07249, pruned_loss=0.01766, audio_tagging_loss=0.03882, over 4681554.00 frames. 2023-11-18 05:30:31,618 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 05:30:31,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=80000.0, ans=0.125 2023-11-18 05:30:40,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=80000.0, ans=10.0 2023-11-18 05:30:42,376 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.048e+01 1.066e+02 1.219e+02 1.451e+02 6.762e+02, threshold=2.438e+02, percent-clipped=1.0 2023-11-18 05:31:38,185 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 0, loss[loss=0.1238, simple_loss=0.108, pruned_loss=0.03702, audio_tagging_loss=0.03274, over 15669.00 frames. ], tot_loss[loss=0.1238, simple_loss=0.108, pruned_loss=0.03702, audio_tagging_loss=0.03274, over 15669.00 frames. ], batch size: 60, lr: 3.21e-02, grad_scale: 32.0 2023-11-18 05:31:38,185 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 05:32:10,419 INFO [train_asr.py:1147] (1/4) Epoch 2, validation: loss=0.09083, simple_loss=0.07252, pruned_loss=0.0178, audio_tagging_loss=0.03677, over 4681554.00 frames. 2023-11-18 05:32:10,420 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 05:32:31,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=80293.33333333333, ans=0.1 2023-11-18 05:32:35,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=80293.33333333333, ans=0.0 2023-11-18 05:33:00,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.22 vs. limit=15.0 2023-11-18 05:33:05,837 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 50, loss[loss=0.1263, simple_loss=0.1188, pruned_loss=0.04054, audio_tagging_loss=0.02638, over 14386.00 frames. ], tot_loss[loss=0.1533, simple_loss=0.1438, pruned_loss=0.0562, audio_tagging_loss=0.0252, over 690309.34 frames. 
], batch size: 55, lr: 3.21e-02, grad_scale: 32.0 2023-11-18 05:33:11,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=80493.33333333333, ans=0.1 2023-11-18 05:33:23,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=80560.0, ans=0.0 2023-11-18 05:33:37,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=80626.66666666667, ans=0.125 2023-11-18 05:33:40,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. limit=6.0 2023-11-18 05:33:46,328 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 9.563e+01 1.150e+02 1.281e+02 1.485e+02 2.294e+02, threshold=2.563e+02, percent-clipped=0.0 2023-11-18 05:33:49,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=80760.0, ans=0.125 2023-11-18 05:33:51,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=80760.0, ans=0.0 2023-11-18 05:33:58,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=80760.0, ans=0.125 2023-11-18 05:34:00,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=80826.66666666667, ans=0.125 2023-11-18 05:34:01,931 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 100, loss[loss=0.1399, simple_loss=0.1326, pruned_loss=0.05077, audio_tagging_loss=0.02284, over 14227.00 frames. ], tot_loss[loss=0.1493, simple_loss=0.1416, pruned_loss=0.05411, audio_tagging_loss=0.02434, over 1215734.68 frames. ], batch size: 54, lr: 3.20e-02, grad_scale: 32.0 2023-11-18 05:34:27,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=80960.0, ans=0.1 2023-11-18 05:34:30,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=80960.0, ans=0.0 2023-11-18 05:34:47,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=81093.33333333333, ans=15.0 2023-11-18 05:34:51,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=81093.33333333333, ans=0.125 2023-11-18 05:34:55,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=81093.33333333333, ans=0.1 2023-11-18 05:34:56,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=12.0 2023-11-18 05:34:57,985 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 150, loss[loss=0.17, simple_loss=0.1694, pruned_loss=0.07054, audio_tagging_loss=0.01473, over 16142.00 frames. ], tot_loss[loss=0.1459, simple_loss=0.1405, pruned_loss=0.05408, audio_tagging_loss=0.02158, over 1615939.95 frames. 
], batch size: 60, lr: 3.20e-02, grad_scale: 32.0 2023-11-18 05:35:16,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=81226.66666666667, ans=0.125 2023-11-18 05:35:20,647 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.135e+00 2023-11-18 05:35:24,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=81293.33333333333, ans=0.125 2023-11-18 05:35:38,723 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.932e+01 1.103e+02 1.211e+02 1.385e+02 1.770e+02, threshold=2.422e+02, percent-clipped=0.0 2023-11-18 05:35:40,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=81360.0, ans=10.0 2023-11-18 05:35:47,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.89 vs. limit=22.5 2023-11-18 05:35:54,754 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 200, loss[loss=0.1478, simple_loss=0.1491, pruned_loss=0.06255, audio_tagging_loss=0.01069, over 16661.00 frames. ], tot_loss[loss=0.1456, simple_loss=0.1433, pruned_loss=0.05503, audio_tagging_loss=0.01892, over 1936823.69 frames. ], batch size: 62, lr: 3.19e-02, grad_scale: 32.0 2023-11-18 05:36:42,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=81760.0, ans=0.125 2023-11-18 05:36:51,495 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 250, loss[loss=0.1438, simple_loss=0.1457, pruned_loss=0.05461, audio_tagging_loss=0.0163, over 14500.00 frames. ], tot_loss[loss=0.1456, simple_loss=0.1447, pruned_loss=0.05613, audio_tagging_loss=0.01713, over 2176948.71 frames. ], batch size: 55, lr: 3.19e-02, grad_scale: 32.0 2023-11-18 05:37:02,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=81893.33333333333, ans=0.125 2023-11-18 05:37:03,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=81893.33333333333, ans=0.125 2023-11-18 05:37:13,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=81960.0, ans=0.125 2023-11-18 05:37:24,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=82026.66666666667, ans=0.125 2023-11-18 05:37:27,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=82026.66666666667, ans=0.09899494936611666 2023-11-18 05:37:31,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 1.100e+02 1.268e+02 1.445e+02 2.035e+02, threshold=2.536e+02, percent-clipped=0.0 2023-11-18 05:37:34,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=82026.66666666667, ans=0.0 2023-11-18 05:37:44,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2023-11-18 05:37:47,877 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 300, loss[loss=0.1414, simple_loss=0.1423, pruned_loss=0.05616, audio_tagging_loss=0.01415, over 15192.00 frames. 
], tot_loss[loss=0.1435, simple_loss=0.1435, pruned_loss=0.05575, audio_tagging_loss=0.01597, over 2366709.46 frames. ], batch size: 56, lr: 3.18e-02, grad_scale: 32.0 2023-11-18 05:37:50,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2023-11-18 05:38:19,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=82293.33333333333, ans=0.2 2023-11-18 05:38:43,908 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 350, loss[loss=0.1136, simple_loss=0.126, pruned_loss=0.03991, audio_tagging_loss=0.01075, over 14408.00 frames. ], tot_loss[loss=0.1412, simple_loss=0.1426, pruned_loss=0.05498, audio_tagging_loss=0.01491, over 2524120.99 frames. ], batch size: 57, lr: 3.18e-02, grad_scale: 32.0 2023-11-18 05:38:47,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=82493.33333333333, ans=0.125 2023-11-18 05:39:08,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.54 vs. limit=12.0 2023-11-18 05:39:17,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=82693.33333333333, ans=0.125 2023-11-18 05:39:24,851 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 1.093e+02 1.219e+02 1.382e+02 1.971e+02, threshold=2.439e+02, percent-clipped=0.0 2023-11-18 05:39:40,366 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 400, loss[loss=0.1321, simple_loss=0.1304, pruned_loss=0.05482, audio_tagging_loss=0.01211, over 14858.00 frames. ], tot_loss[loss=0.1393, simple_loss=0.1411, pruned_loss=0.05425, audio_tagging_loss=0.01445, over 2644659.69 frames. ], batch size: 55, lr: 3.17e-02, grad_scale: 32.0 2023-11-18 05:39:40,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=82826.66666666667, ans=0.0 2023-11-18 05:39:53,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=82893.33333333333, ans=0.09899494936611666 2023-11-18 05:40:03,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=82960.0, ans=0.125 2023-11-18 05:40:20,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=83026.66666666667, ans=0.1 2023-11-18 05:40:33,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=83093.33333333333, ans=0.0 2023-11-18 05:40:34,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=83093.33333333333, ans=0.0 2023-11-18 05:40:36,427 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 450, loss[loss=0.1427, simple_loss=0.1593, pruned_loss=0.05238, audio_tagging_loss=0.01072, over 15786.00 frames. ], tot_loss[loss=0.1394, simple_loss=0.1419, pruned_loss=0.05451, audio_tagging_loss=0.01392, over 2737168.88 frames. 
], batch size: 58, lr: 3.17e-02, grad_scale: 32.0 2023-11-18 05:40:43,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=83160.0, ans=0.125 2023-11-18 05:40:43,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=83160.0, ans=0.125 2023-11-18 05:40:51,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=83226.66666666667, ans=0.125 2023-11-18 05:40:51,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=83226.66666666667, ans=0.125 2023-11-18 05:41:04,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=83293.33333333333, ans=0.1 2023-11-18 05:41:08,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=83360.0, ans=0.0 2023-11-18 05:41:16,692 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.351e+01 1.050e+02 1.181e+02 1.351e+02 2.147e+02, threshold=2.363e+02, percent-clipped=0.0 2023-11-18 05:41:23,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=83426.66666666667, ans=0.125 2023-11-18 05:41:32,201 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 500, loss[loss=0.1297, simple_loss=0.137, pruned_loss=0.05019, audio_tagging_loss=0.01096, over 14590.00 frames. ], tot_loss[loss=0.1385, simple_loss=0.141, pruned_loss=0.05419, audio_tagging_loss=0.01381, over 2804026.93 frames. ], batch size: 57, lr: 3.16e-02, grad_scale: 32.0 2023-11-18 05:41:33,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=83493.33333333333, ans=0.125 2023-11-18 05:41:34,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.90 vs. limit=15.0 2023-11-18 05:42:04,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=83693.33333333333, ans=0.125 2023-11-18 05:42:13,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=83693.33333333333, ans=0.125 2023-11-18 05:42:21,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=83760.0, ans=0.125 2023-11-18 05:42:27,860 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 550, loss[loss=0.1727, simple_loss=0.1739, pruned_loss=0.07189, audio_tagging_loss=0.01387, over 14592.00 frames. ], tot_loss[loss=0.1388, simple_loss=0.1417, pruned_loss=0.05445, audio_tagging_loss=0.01354, over 2852279.98 frames. 
], batch size: 54, lr: 3.16e-02, grad_scale: 32.0 2023-11-18 05:42:48,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=83893.33333333333, ans=0.0 2023-11-18 05:42:58,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=83960.0, ans=0.125 2023-11-18 05:43:08,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 9.035e+01 1.150e+02 1.343e+02 1.676e+02 2.273e+02, threshold=2.687e+02, percent-clipped=0.0 2023-11-18 05:43:15,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=84093.33333333333, ans=0.0 2023-11-18 05:43:24,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=84160.0, ans=0.0 2023-11-18 05:43:25,082 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 600, loss[loss=0.179, simple_loss=0.1879, pruned_loss=0.07478, audio_tagging_loss=0.01033, over 15092.00 frames. ], tot_loss[loss=0.1391, simple_loss=0.1422, pruned_loss=0.05456, audio_tagging_loss=0.01341, over 2890615.56 frames. ], batch size: 58, lr: 3.15e-02, grad_scale: 32.0 2023-11-18 05:43:25,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=84160.0, ans=0.125 2023-11-18 05:43:25,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=84160.0, ans=0.0 2023-11-18 05:43:36,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=84226.66666666667, ans=0.05 2023-11-18 05:43:49,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=84293.33333333333, ans=0.1 2023-11-18 05:44:01,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2023-11-18 05:44:02,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=84360.0, ans=0.09899494936611666 2023-11-18 05:44:07,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.49 vs. limit=15.0 2023-11-18 05:44:08,708 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.296e+00 2023-11-18 05:44:14,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=84426.66666666667, ans=22.5 2023-11-18 05:44:21,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=84493.33333333333, ans=0.5 2023-11-18 05:44:21,969 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 650, loss[loss=0.1206, simple_loss=0.1134, pruned_loss=0.04959, audio_tagging_loss=0.01431, over 14488.00 frames. ], tot_loss[loss=0.1386, simple_loss=0.1418, pruned_loss=0.05441, audio_tagging_loss=0.01333, over 2921719.17 frames. 
], batch size: 54, lr: 3.15e-02, grad_scale: 32.0 2023-11-18 05:44:53,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=84626.66666666667, ans=0.125 2023-11-18 05:44:59,297 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2023-11-18 05:45:01,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=84693.33333333333, ans=0.0 2023-11-18 05:45:02,892 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.174e+01 1.067e+02 1.188e+02 1.445e+02 2.872e+02, threshold=2.375e+02, percent-clipped=1.0 2023-11-18 05:45:11,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=84760.0, ans=0.0 2023-11-18 05:45:13,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=84760.0, ans=0.1 2023-11-18 05:45:17,840 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 700, loss[loss=0.158, simple_loss=0.169, pruned_loss=0.06054, audio_tagging_loss=0.01295, over 15999.00 frames. ], tot_loss[loss=0.1398, simple_loss=0.1431, pruned_loss=0.05491, audio_tagging_loss=0.01332, over 2949395.85 frames. ], batch size: 57, lr: 3.14e-02, grad_scale: 32.0 2023-11-18 05:45:21,543 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0 2023-11-18 05:45:24,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=84826.66666666667, ans=0.0 2023-11-18 05:45:51,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.61 vs. limit=22.5 2023-11-18 05:46:04,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=85093.33333333333, ans=0.1 2023-11-18 05:46:10,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.46 vs. limit=15.0 2023-11-18 05:46:15,219 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 750, loss[loss=0.1224, simple_loss=0.1296, pruned_loss=0.04487, audio_tagging_loss=0.01277, over 15536.00 frames. ], tot_loss[loss=0.14, simple_loss=0.1435, pruned_loss=0.05503, audio_tagging_loss=0.01322, over 2974005.09 frames. ], batch size: 57, lr: 3.14e-02, grad_scale: 32.0 2023-11-18 05:46:28,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=85226.66666666667, ans=0.125 2023-11-18 05:46:36,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2023-11-18 05:46:42,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=85293.33333333333, ans=0.125 2023-11-18 05:46:48,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.17 vs. 
limit=15.0 2023-11-18 05:46:56,125 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.449e+01 1.066e+02 1.181e+02 1.360e+02 2.052e+02, threshold=2.361e+02, percent-clipped=0.0 2023-11-18 05:47:09,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=85426.66666666667, ans=0.125 2023-11-18 05:47:11,527 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 800, loss[loss=0.1323, simple_loss=0.136, pruned_loss=0.05114, audio_tagging_loss=0.01321, over 14975.00 frames. ], tot_loss[loss=0.1402, simple_loss=0.1438, pruned_loss=0.05511, audio_tagging_loss=0.01323, over 2993446.56 frames. ], batch size: 56, lr: 3.14e-02, grad_scale: 32.0 2023-11-18 05:47:14,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=85493.33333333333, ans=0.2 2023-11-18 05:47:18,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=85493.33333333333, ans=0.125 2023-11-18 05:47:22,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=85560.0, ans=0.125 2023-11-18 05:47:54,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=85693.33333333333, ans=0.125 2023-11-18 05:48:07,702 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 850, loss[loss=0.1308, simple_loss=0.1395, pruned_loss=0.04759, audio_tagging_loss=0.01344, over 15634.00 frames. ], tot_loss[loss=0.1394, simple_loss=0.1429, pruned_loss=0.05467, audio_tagging_loss=0.01328, over 3004708.17 frames. ], batch size: 56, lr: 3.13e-02, grad_scale: 32.0 2023-11-18 05:48:08,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=85826.66666666667, ans=0.125 2023-11-18 05:48:11,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=85826.66666666667, ans=0.0 2023-11-18 05:48:17,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=85826.66666666667, ans=0.2 2023-11-18 05:48:23,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.83 vs. limit=10.0 2023-11-18 05:48:30,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.88 vs. limit=22.5 2023-11-18 05:48:32,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=85960.0, ans=0.0 2023-11-18 05:48:48,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.063e+01 1.087e+02 1.227e+02 1.407e+02 2.790e+02, threshold=2.454e+02, percent-clipped=1.0 2023-11-18 05:48:56,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=86093.33333333333, ans=0.0 2023-11-18 05:49:05,236 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 900, loss[loss=0.1332, simple_loss=0.1486, pruned_loss=0.04874, audio_tagging_loss=0.0101, over 15190.00 frames. ], tot_loss[loss=0.139, simple_loss=0.1423, pruned_loss=0.05454, audio_tagging_loss=0.01331, over 3016126.63 frames. 
], batch size: 54, lr: 3.13e-02, grad_scale: 32.0 2023-11-18 05:49:09,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2023-11-18 05:49:18,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2023-11-18 05:49:33,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=86293.33333333333, ans=0.0 2023-11-18 05:49:46,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=86360.0, ans=0.1 2023-11-18 05:50:01,369 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 950, loss[loss=0.1296, simple_loss=0.1543, pruned_loss=0.04254, audio_tagging_loss=0.009952, over 16193.00 frames. ], tot_loss[loss=0.1393, simple_loss=0.1432, pruned_loss=0.05471, audio_tagging_loss=0.01303, over 3027285.01 frames. ], batch size: 57, lr: 3.12e-02, grad_scale: 32.0 2023-11-18 05:50:12,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2023-11-18 05:50:42,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.269e+01 1.077e+02 1.200e+02 1.388e+02 2.127e+02, threshold=2.401e+02, percent-clipped=0.0 2023-11-18 05:50:47,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=86760.0, ans=0.0 2023-11-18 05:50:47,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=86760.0, ans=0.125 2023-11-18 05:50:57,283 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1000, loss[loss=0.0872, simple_loss=0.08058, pruned_loss=0.03073, audio_tagging_loss=0.01618, over 15291.00 frames. ], tot_loss[loss=0.1388, simple_loss=0.1426, pruned_loss=0.05442, audio_tagging_loss=0.01305, over 3025680.94 frames. ], batch size: 59, lr: 3.12e-02, grad_scale: 32.0 2023-11-18 05:50:59,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=86826.66666666667, ans=0.0 2023-11-18 05:51:08,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=86893.33333333333, ans=0.0 2023-11-18 05:51:11,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=86893.33333333333, ans=0.125 2023-11-18 05:51:12,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.30 vs. limit=10.0 2023-11-18 05:51:17,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=86893.33333333333, ans=0.125 2023-11-18 05:51:21,409 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
2023-11-18 05:51:37,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=87026.66666666667, ans=10.0
2023-11-18 05:51:53,423 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1050, loss[loss=0.1109, simple_loss=0.105, pruned_loss=0.03767, audio_tagging_loss=0.02068, over 16157.00 frames. ], tot_loss[loss=0.1368, simple_loss=0.1406, pruned_loss=0.05357, audio_tagging_loss=0.01296, over 3028013.77 frames. ], batch size: 63, lr: 3.11e-02, grad_scale: 32.0
2023-11-18 05:51:56,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=87160.0, ans=0.125
2023-11-18 05:52:06,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=87226.66666666667, ans=0.125
2023-11-18 05:52:09,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=87226.66666666667, ans=0.125
2023-11-18 05:52:14,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=87226.66666666667, ans=0.07
2023-11-18 05:52:20,107 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=87293.33333333333, ans=0.05
2023-11-18 05:52:28,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.87 vs. limit=15.0
2023-11-18 05:52:34,265 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.017e+01 1.050e+02 1.244e+02 1.396e+02 2.108e+02, threshold=2.488e+02, percent-clipped=0.0
2023-11-18 05:52:45,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=87426.66666666667, ans=0.125
2023-11-18 05:52:50,691 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1100, loss[loss=0.1471, simple_loss=0.1546, pruned_loss=0.05943, audio_tagging_loss=0.01032, over 15623.00 frames. ], tot_loss[loss=0.1369, simple_loss=0.1406, pruned_loss=0.05373, audio_tagging_loss=0.01286, over 3035286.05 frames. ], batch size: 57, lr: 3.11e-02, grad_scale: 32.0
2023-11-18 05:52:52,873 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 05:53:28,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=87693.33333333333, ans=0.2
2023-11-18 05:53:45,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=87826.66666666667, ans=0.125
2023-11-18 05:53:46,906 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1150, loss[loss=0.1847, simple_loss=0.1991, pruned_loss=0.07473, audio_tagging_loss=0.01044, over 16356.00 frames. ], tot_loss[loss=0.1361, simple_loss=0.1401, pruned_loss=0.05334, audio_tagging_loss=0.01276, over 3035232.77 frames. ], batch size: 56, lr: 3.10e-02, grad_scale: 32.0
2023-11-18 05:54:15,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.11 vs. limit=10.0
2023-11-18 05:54:28,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 1.025e+02 1.107e+02 1.275e+02 1.816e+02, threshold=2.214e+02, percent-clipped=0.0
2023-11-18 05:54:36,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=22.5
2023-11-18 05:54:43,951 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1200, loss[loss=0.1541, simple_loss=0.1629, pruned_loss=0.06039, audio_tagging_loss=0.01229, over 14509.00 frames. ], tot_loss[loss=0.1372, simple_loss=0.1411, pruned_loss=0.0539, audio_tagging_loss=0.01278, over 3032365.91 frames. ], batch size: 54, lr: 3.10e-02, grad_scale: 32.0
2023-11-18 05:54:49,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=88160.0, ans=0.1
2023-11-18 05:54:51,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=88160.0, ans=0.0
2023-11-18 05:55:21,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=88360.0, ans=0.125
2023-11-18 05:55:30,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=88426.66666666667, ans=0.2
2023-11-18 05:55:40,046 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1250, loss[loss=0.1314, simple_loss=0.1366, pruned_loss=0.04767, audio_tagging_loss=0.0154, over 14380.00 frames. ], tot_loss[loss=0.1375, simple_loss=0.1409, pruned_loss=0.05426, audio_tagging_loss=0.01282, over 3028699.13 frames. ], batch size: 53, lr: 3.09e-02, grad_scale: 32.0
2023-11-18 05:55:46,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=88493.33333333333, ans=0.04949747468305833
2023-11-18 05:55:47,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=88493.33333333333, ans=0.1
2023-11-18 05:55:54,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=88560.0, ans=0.1
2023-11-18 05:55:56,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=88560.0, ans=0.2
2023-11-18 05:56:03,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=15.0
2023-11-18 05:56:11,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=88626.66666666667, ans=0.95
2023-11-18 05:56:18,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0
2023-11-18 05:56:19,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=88693.33333333333, ans=0.125
2023-11-18 05:56:20,666 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.599e+01 1.015e+02 1.167e+02 1.344e+02 2.286e+02, threshold=2.335e+02, percent-clipped=1.0
2023-11-18 05:56:23,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0
2023-11-18 05:56:25,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=88760.0, ans=10.0
2023-11-18 05:56:25,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88760.0, ans=0.1
2023-11-18 05:56:30,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=88760.0, ans=0.125
2023-11-18 05:56:36,653 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1300, loss[loss=0.1306, simple_loss=0.1396, pruned_loss=0.0482, audio_tagging_loss=0.01258, over 15331.00 frames. ], tot_loss[loss=0.1364, simple_loss=0.1396, pruned_loss=0.05366, audio_tagging_loss=0.0129, over 3030495.29 frames. ], batch size: 58, lr: 3.09e-02, grad_scale: 32.0
2023-11-18 05:56:47,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=88893.33333333333, ans=0.125
2023-11-18 05:57:33,128 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1350, loss[loss=0.1277, simple_loss=0.128, pruned_loss=0.04765, audio_tagging_loss=0.01608, over 15086.00 frames. ], tot_loss[loss=0.1374, simple_loss=0.1404, pruned_loss=0.0542, audio_tagging_loss=0.01296, over 3031974.30 frames. ], batch size: 61, lr: 3.09e-02, grad_scale: 32.0
2023-11-18 05:57:34,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=89160.0, ans=0.125
2023-11-18 05:57:47,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89226.66666666667, ans=0.1
2023-11-18 05:57:50,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=89226.66666666667, ans=0.0
2023-11-18 05:57:53,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=89226.66666666667, ans=0.0
2023-11-18 05:58:11,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=89360.0, ans=0.2
2023-11-18 05:58:14,277 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 1.078e+02 1.206e+02 1.341e+02 1.953e+02, threshold=2.412e+02, percent-clipped=0.0
2023-11-18 05:58:14,324 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 05:58:29,863 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1400, loss[loss=0.1598, simple_loss=0.1696, pruned_loss=0.06371, audio_tagging_loss=0.01129, over 15542.00 frames. ], tot_loss[loss=0.1372, simple_loss=0.1405, pruned_loss=0.05399, audio_tagging_loss=0.01295, over 3033380.79 frames. ], batch size: 56, lr: 3.08e-02, grad_scale: 32.0
2023-11-18 05:58:37,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=89493.33333333333, ans=0.125
2023-11-18 05:58:52,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0
2023-11-18 05:58:56,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=89626.66666666667, ans=0.0
2023-11-18 05:58:59,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=89626.66666666667, ans=0.0
2023-11-18 05:59:03,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=89693.33333333333, ans=0.0
2023-11-18 05:59:27,054 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1450, loss[loss=0.1558, simple_loss=0.1558, pruned_loss=0.06286, audio_tagging_loss=0.01506, over 14854.00 frames. ], tot_loss[loss=0.1366, simple_loss=0.1399, pruned_loss=0.05357, audio_tagging_loss=0.01306, over 3027929.87 frames. ], batch size: 57, lr: 3.08e-02, grad_scale: 32.0
2023-11-18 05:59:27,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=89826.66666666667, ans=0.0
2023-11-18 05:59:37,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.69 vs. limit=22.5
2023-11-18 05:59:48,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=89960.0, ans=0.0
2023-11-18 05:59:54,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=89960.0, ans=0.125
2023-11-18 05:59:56,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=89960.0, ans=0.125
2023-11-18 06:00:07,186 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.834e+01 1.064e+02 1.199e+02 1.327e+02 1.919e+02, threshold=2.398e+02, percent-clipped=0.0
2023-11-18 06:00:08,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=90026.66666666667, ans=0.125
2023-11-18 06:00:08,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=90026.66666666667, ans=0.125
2023-11-18 06:00:18,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=90093.33333333333, ans=0.125
2023-11-18 06:00:21,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=90160.0, ans=0.125
2023-11-18 06:00:23,236 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1500, loss[loss=0.145, simple_loss=0.1409, pruned_loss=0.05573, audio_tagging_loss=0.01886, over 13905.00 frames. ], tot_loss[loss=0.1363, simple_loss=0.1396, pruned_loss=0.05333, audio_tagging_loss=0.01317, over 3026139.44 frames. ], batch size: 55, lr: 3.07e-02, grad_scale: 32.0
2023-11-18 06:00:34,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=90226.66666666667, ans=0.125
2023-11-18 06:00:36,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=90226.66666666667, ans=0.0
2023-11-18 06:00:45,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=90293.33333333333, ans=0.125
2023-11-18 06:01:02,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.00 vs. limit=15.0
2023-11-18 06:01:08,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=90426.66666666667, ans=0.125
2023-11-18 06:01:11,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=90426.66666666667, ans=0.1
2023-11-18 06:01:19,521 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1550, loss[loss=0.1393, simple_loss=0.1402, pruned_loss=0.05142, audio_tagging_loss=0.01782, over 16402.00 frames. ], tot_loss[loss=0.1385, simple_loss=0.1417, pruned_loss=0.05434, audio_tagging_loss=0.0133, over 3033689.34 frames. ], batch size: 60, lr: 3.07e-02, grad_scale: 32.0
2023-11-18 06:01:24,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.19 vs. limit=22.5
2023-11-18 06:01:35,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=90560.0, ans=0.125
2023-11-18 06:01:38,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=90560.0, ans=0.0
2023-11-18 06:01:42,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90626.66666666667, ans=0.1
2023-11-18 06:01:42,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0
2023-11-18 06:01:52,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=90693.33333333333, ans=0.125
2023-11-18 06:01:54,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=90693.33333333333, ans=0.125
2023-11-18 06:02:00,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 1.048e+02 1.182e+02 1.332e+02 1.868e+02, threshold=2.363e+02, percent-clipped=0.0
2023-11-18 06:02:04,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.68 vs. limit=15.0
2023-11-18 06:02:05,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=90760.0, ans=0.2
2023-11-18 06:02:07,807 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.52 vs. limit=22.5
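The recurring [optim.py:476] entries summarize ScaledAdam's adaptive gradient clipping: the five numbers are the 0/25/50/75/100% quantiles of recent gradient norms, and in each entry the threshold equals clipping_scale times the logged median (for the entry just above, 2.0 x 1.182e+02 ~ 2.363e+02), with percent-clipped reporting how often a batch exceeded it. A hedged sketch of that bookkeeping, not the literal optim.py code:

```python
import torch

# Sketch of the statistics behind the "Clipping_scale=2.0, grad-norm quartiles"
# lines; the real ScaledAdam implementation differs in detail.
def grad_norm_summary(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Quantiles as printed in the log: min, 25%, median, 75%, max.
    quartiles = torch.quantile(
        recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
    )
    threshold = clipping_scale * quartiles[2].item()  # clipping_scale * median
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean().item()
    return quartiles, threshold, percent_clipped
```

Because the threshold tracks the running median, clipping stays rare (percent-clipped is almost always 0.0 in this span) unless a batch produces an unusually large gradient.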
2023-11-18 06:02:15,851 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1600, loss[loss=0.1094, simple_loss=0.1109, pruned_loss=0.04032, audio_tagging_loss=0.01363, over 15556.00 frames. ], tot_loss[loss=0.1389, simple_loss=0.1421, pruned_loss=0.05453, audio_tagging_loss=0.01327, over 3042385.43 frames. ], batch size: 58, lr: 3.06e-02, grad_scale: 32.0
2023-11-18 06:02:19,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=90826.66666666667, ans=0.2
2023-11-18 06:02:24,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=90826.66666666667, ans=0.125
2023-11-18 06:02:24,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.31 vs. limit=6.0
2023-11-18 06:02:27,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=90893.33333333333, ans=0.1
2023-11-18 06:02:35,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=90893.33333333333, ans=0.125
2023-11-18 06:02:37,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.29 vs. limit=22.5
2023-11-18 06:02:47,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=90960.0, ans=0.125
2023-11-18 06:02:54,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.25 vs. limit=22.5
2023-11-18 06:03:00,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=91093.33333333333, ans=0.125
2023-11-18 06:03:11,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=91160.0, ans=0.0
2023-11-18 06:03:12,235 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1650, loss[loss=0.1345, simple_loss=0.12, pruned_loss=0.05729, audio_tagging_loss=0.01724, over 16213.00 frames. ], tot_loss[loss=0.1372, simple_loss=0.1406, pruned_loss=0.05357, audio_tagging_loss=0.01329, over 3048866.61 frames. ], batch size: 62, lr: 3.06e-02, grad_scale: 32.0
2023-11-18 06:03:14,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=91160.0, ans=0.015
2023-11-18 06:03:15,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=91160.0, ans=0.04949747468305833
2023-11-18 06:03:38,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=91293.33333333333, ans=0.0
2023-11-18 06:03:45,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.80 vs. limit=15.0
2023-11-18 06:03:53,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.238e+01 1.052e+02 1.201e+02 1.408e+02 1.916e+02, threshold=2.401e+02, percent-clipped=0.0
2023-11-18 06:04:06,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0
2023-11-18 06:04:09,192 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1700, loss[loss=0.1572, simple_loss=0.1675, pruned_loss=0.06173, audio_tagging_loss=0.01176, over 15098.00 frames. ], tot_loss[loss=0.137, simple_loss=0.1404, pruned_loss=0.05355, audio_tagging_loss=0.01326, over 3046836.56 frames. ], batch size: 57, lr: 3.06e-02, grad_scale: 32.0
2023-11-18 06:04:39,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0
2023-11-18 06:05:06,098 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1750, loss[loss=0.1676, simple_loss=0.1882, pruned_loss=0.06434, audio_tagging_loss=0.009173, over 15391.00 frames. ], tot_loss[loss=0.1371, simple_loss=0.141, pruned_loss=0.05356, audio_tagging_loss=0.01305, over 3056158.63 frames. ], batch size: 56, lr: 3.05e-02, grad_scale: 32.0
2023-11-18 06:05:10,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=91826.66666666667, ans=0.125
2023-11-18 06:05:18,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=91893.33333333333, ans=0.0
2023-11-18 06:05:19,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=91893.33333333333, ans=0.125
2023-11-18 06:05:26,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=91893.33333333333, ans=0.0
2023-11-18 06:05:36,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0
2023-11-18 06:05:47,231 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.450e+01 1.111e+02 1.237e+02 1.383e+02 2.082e+02, threshold=2.473e+02, percent-clipped=0.0
2023-11-18 06:05:54,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=92093.33333333333, ans=0.1
2023-11-18 06:06:02,398 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1800, loss[loss=0.1517, simple_loss=0.1548, pruned_loss=0.06048, audio_tagging_loss=0.01379, over 15249.00 frames. ], tot_loss[loss=0.1369, simple_loss=0.141, pruned_loss=0.05341, audio_tagging_loss=0.01296, over 3054954.59 frames. ], batch size: 57, lr: 3.05e-02, grad_scale: 32.0
2023-11-18 06:06:09,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=92160.0, ans=0.0
2023-11-18 06:06:17,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=92226.66666666667, ans=0.0
2023-11-18 06:06:24,457 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0
2023-11-18 06:06:29,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=92293.33333333333, ans=0.0
2023-11-18 06:06:32,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=92293.33333333333, ans=0.0
2023-11-18 06:06:38,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.30 vs. limit=22.5
2023-11-18 06:06:58,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=92426.66666666667, ans=0.125
2023-11-18 06:06:59,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=92493.33333333333, ans=0.125
2023-11-18 06:06:59,992 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1850, loss[loss=0.1403, simple_loss=0.1372, pruned_loss=0.0577, audio_tagging_loss=0.014, over 14252.00 frames. ], tot_loss[loss=0.1354, simple_loss=0.1395, pruned_loss=0.05275, audio_tagging_loss=0.01293, over 3049598.86 frames. ], batch size: 53, lr: 3.04e-02, grad_scale: 32.0
2023-11-18 06:07:04,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=92493.33333333333, ans=0.2
2023-11-18 06:07:23,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=92626.66666666667, ans=0.07
2023-11-18 06:07:31,116 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=12.0
2023-11-18 06:07:35,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=92693.33333333333, ans=0.2
2023-11-18 06:07:40,256 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 1.045e+02 1.179e+02 1.336e+02 1.806e+02, threshold=2.358e+02, percent-clipped=0.0
2023-11-18 06:07:55,804 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1900, loss[loss=0.1377, simple_loss=0.1477, pruned_loss=0.05276, audio_tagging_loss=0.01106, over 15492.00 frames. ], tot_loss[loss=0.1361, simple_loss=0.1407, pruned_loss=0.05305, audio_tagging_loss=0.01271, over 3056783.55 frames. ], batch size: 57, lr: 3.04e-02, grad_scale: 32.0
2023-11-18 06:08:51,654 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 1950, loss[loss=0.1032, simple_loss=0.1028, pruned_loss=0.03743, audio_tagging_loss=0.01436, over 14438.00 frames. ], tot_loss[loss=0.1361, simple_loss=0.1405, pruned_loss=0.05295, audio_tagging_loss=0.01287, over 3055429.18 frames. ], batch size: 56, lr: 3.03e-02, grad_scale: 32.0
2023-11-18 06:09:00,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.96 vs. limit=6.0
2023-11-18 06:09:07,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=93226.66666666667, ans=0.125
2023-11-18 06:09:18,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.25 vs. limit=15.0
2023-11-18 06:09:27,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=12.0
2023-11-18 06:09:32,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=93360.0, ans=0.0
2023-11-18 06:09:32,959 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 1.040e+02 1.152e+02 1.328e+02 1.978e+02, threshold=2.303e+02, percent-clipped=0.0
2023-11-18 06:09:34,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93360.0, ans=0.1
2023-11-18 06:09:46,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=93426.66666666667, ans=0.125
2023-11-18 06:09:48,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=93493.33333333333, ans=0.07
2023-11-18 06:09:49,681 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2000, loss[loss=0.1212, simple_loss=0.1271, pruned_loss=0.04438, audio_tagging_loss=0.01327, over 14736.00 frames. ], tot_loss[loss=0.1355, simple_loss=0.1397, pruned_loss=0.05278, audio_tagging_loss=0.01289, over 3051578.19 frames. ], batch size: 55, lr: 3.03e-02, grad_scale: 64.0
2023-11-18 06:09:53,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=93493.33333333333, ans=10.0
2023-11-18 06:10:00,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=93560.0, ans=0.0
2023-11-18 06:10:17,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.04 vs. limit=5.0
2023-11-18 06:10:45,929 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2050, loss[loss=0.1378, simple_loss=0.1493, pruned_loss=0.05334, audio_tagging_loss=0.009842, over 15688.00 frames. ], tot_loss[loss=0.1371, simple_loss=0.1417, pruned_loss=0.05348, audio_tagging_loss=0.01277, over 3046947.24 frames. ], batch size: 59, lr: 3.03e-02, grad_scale: 64.0
2023-11-18 06:11:04,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=93893.33333333333, ans=0.125
2023-11-18 06:11:11,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.44 vs. limit=22.5
2023-11-18 06:11:26,280 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.950e+01 1.053e+02 1.201e+02 1.345e+02 1.920e+02, threshold=2.401e+02, percent-clipped=0.0
2023-11-18 06:11:34,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0
2023-11-18 06:11:41,152 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2100, loss[loss=0.1159, simple_loss=0.1147, pruned_loss=0.04297, audio_tagging_loss=0.01551, over 15138.00 frames. ], tot_loss[loss=0.1363, simple_loss=0.1408, pruned_loss=0.05314, audio_tagging_loss=0.01282, over 3048252.52 frames. ], batch size: 55, lr: 3.02e-02, grad_scale: 64.0
2023-11-18 06:11:56,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=94226.66666666667, ans=0.125
2023-11-18 06:11:56,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=94226.66666666667, ans=22.5
2023-11-18 06:12:28,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=94426.66666666667, ans=0.125
2023-11-18 06:12:28,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=94426.66666666667, ans=0.2
2023-11-18 06:12:36,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.52 vs. limit=22.5
2023-11-18 06:12:37,005 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2150, loss[loss=0.09553, simple_loss=0.0918, pruned_loss=0.03337, audio_tagging_loss=0.01627, over 14707.00 frames. ], tot_loss[loss=0.1366, simple_loss=0.141, pruned_loss=0.05326, audio_tagging_loss=0.01286, over 3048032.83 frames. ], batch size: 56, lr: 3.02e-02, grad_scale: 64.0
2023-11-18 06:12:38,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94493.33333333333, ans=0.1
2023-11-18 06:12:47,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.41 vs. limit=15.0
2023-11-18 06:12:52,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0
2023-11-18 06:13:10,132 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 06:13:15,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=94693.33333333333, ans=15.0
2023-11-18 06:13:18,104 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.172e+01 1.043e+02 1.205e+02 1.372e+02 2.009e+02, threshold=2.410e+02, percent-clipped=0.0
2023-11-18 06:13:22,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=94760.0, ans=0.125
2023-11-18 06:13:28,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0
2023-11-18 06:13:30,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=94760.0, ans=0.125
2023-11-18 06:13:31,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=94760.0, ans=0.2
2023-11-18 06:13:32,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=94760.0, ans=0.2
2023-11-18 06:13:34,782 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2200, loss[loss=0.1263, simple_loss=0.1353, pruned_loss=0.04705, audio_tagging_loss=0.01162, over 14616.00 frames. ], tot_loss[loss=0.1361, simple_loss=0.1403, pruned_loss=0.05309, audio_tagging_loss=0.01283, over 3043653.46 frames. ], batch size: 56, lr: 3.01e-02, grad_scale: 64.0
2023-11-18 06:13:41,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=94826.66666666667, ans=0.125
2023-11-18 06:13:41,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=94826.66666666667, ans=0.125
2023-11-18 06:13:47,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=94893.33333333333, ans=0.2
2023-11-18 06:13:52,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=94893.33333333333, ans=0.125
2023-11-18 06:13:57,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.77 vs. limit=22.5
2023-11-18 06:13:58,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.42 vs. limit=15.0
2023-11-18 06:14:16,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=95026.66666666667, ans=0.2
2023-11-18 06:14:18,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=95093.33333333333, ans=0.2
2023-11-18 06:14:30,425 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2250, loss[loss=0.1102, simple_loss=0.1038, pruned_loss=0.04335, audio_tagging_loss=0.01491, over 14726.00 frames. ], tot_loss[loss=0.1362, simple_loss=0.1401, pruned_loss=0.05325, audio_tagging_loss=0.01289, over 3042951.32 frames. ], batch size: 56, lr: 3.01e-02, grad_scale: 32.0
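The per-batch lr values (3.13e-02 at the top of this excerpt, down to 3.01e-02 by batch 2250) follow icefall's Eden schedule with the base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 settings logged at startup. A sketch of the formula; the global batch index is not printed in these lines, so the value used below is an assumption back-solved to match the logged lr:

```python
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden: two inverse-fourth-root decay terms, one in batches, one in epochs.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Roughly 11k global batches into training at epoch 2 this gives ~3.1e-02,
# consistent with the logged values (the exact index is assumed, not logged).
print(f"{eden_lr(0.045, batch=11_350, epoch=2.0):.2e}")
```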
], batch size: 56, lr: 3.01e-02, grad_scale: 32.0 2023-11-18 06:14:44,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=95226.66666666667, ans=0.1 2023-11-18 06:14:46,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=95226.66666666667, ans=0.2 2023-11-18 06:15:06,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=95360.0, ans=0.04949747468305833 2023-11-18 06:15:12,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.736e+01 1.062e+02 1.230e+02 1.401e+02 2.481e+02, threshold=2.461e+02, percent-clipped=1.0 2023-11-18 06:15:17,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=95426.66666666667, ans=0.2 2023-11-18 06:15:26,914 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2300, loss[loss=0.1525, simple_loss=0.1613, pruned_loss=0.05862, audio_tagging_loss=0.01327, over 15246.00 frames. ], tot_loss[loss=0.1362, simple_loss=0.1402, pruned_loss=0.05318, audio_tagging_loss=0.01294, over 3039103.05 frames. ], batch size: 56, lr: 3.01e-02, grad_scale: 32.0 2023-11-18 06:15:43,344 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:15:53,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=95626.66666666667, ans=0.125 2023-11-18 06:16:03,865 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:16:07,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=95693.33333333333, ans=0.0 2023-11-18 06:16:07,706 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:16:12,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=95760.0, ans=0.2 2023-11-18 06:16:13,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=95760.0, ans=15.0 2023-11-18 06:16:15,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.30 vs. limit=22.5 2023-11-18 06:16:15,571 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:16:16,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.97 vs. limit=6.0 2023-11-18 06:16:24,200 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2350, loss[loss=0.1181, simple_loss=0.1163, pruned_loss=0.04556, audio_tagging_loss=0.01437, over 14781.00 frames. ], tot_loss[loss=0.1356, simple_loss=0.1395, pruned_loss=0.05274, audio_tagging_loss=0.01307, over 3045239.39 frames. 
], batch size: 57, lr: 3.00e-02, grad_scale: 32.0 2023-11-18 06:16:25,560 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:16:25,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=95826.66666666667, ans=0.125 2023-11-18 06:16:26,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=95826.66666666667, ans=0.1 2023-11-18 06:16:30,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=95826.66666666667, ans=0.2 2023-11-18 06:16:30,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=95826.66666666667, ans=0.125 2023-11-18 06:16:40,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=95893.33333333333, ans=0.07 2023-11-18 06:16:46,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0 2023-11-18 06:17:02,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=96026.66666666667, ans=0.09899494936611666 2023-11-18 06:17:06,434 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 1.033e+02 1.167e+02 1.342e+02 2.194e+02, threshold=2.335e+02, percent-clipped=0.0 2023-11-18 06:17:13,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=96093.33333333333, ans=15.0 2023-11-18 06:17:20,437 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2400, loss[loss=0.1314, simple_loss=0.1341, pruned_loss=0.0498, audio_tagging_loss=0.01455, over 14790.00 frames. ], tot_loss[loss=0.1354, simple_loss=0.1393, pruned_loss=0.05272, audio_tagging_loss=0.01309, over 3038628.31 frames. ], batch size: 54, lr: 3.00e-02, grad_scale: 32.0 2023-11-18 06:17:30,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96226.66666666667, ans=0.1 2023-11-18 06:17:50,047 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:17:56,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=96360.0, ans=0.0 2023-11-18 06:18:06,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=96426.66666666667, ans=0.2 2023-11-18 06:18:16,570 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2450, loss[loss=0.1508, simple_loss=0.147, pruned_loss=0.06102, audio_tagging_loss=0.01634, over 14206.00 frames. ], tot_loss[loss=0.136, simple_loss=0.1398, pruned_loss=0.05282, audio_tagging_loss=0.01325, over 3047404.47 frames. 
], batch size: 53, lr: 2.99e-02, grad_scale: 32.0 2023-11-18 06:18:28,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=96560.0, ans=0.2 2023-11-18 06:18:30,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=96560.0, ans=0.125 2023-11-18 06:18:37,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=96560.0, ans=0.1 2023-11-18 06:18:45,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=96626.66666666667, ans=0.05 2023-11-18 06:18:46,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=96626.66666666667, ans=0.0 2023-11-18 06:18:58,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.568e+01 1.053e+02 1.171e+02 1.330e+02 1.894e+02, threshold=2.342e+02, percent-clipped=0.0 2023-11-18 06:19:13,724 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2500, loss[loss=0.1439, simple_loss=0.1482, pruned_loss=0.05665, audio_tagging_loss=0.01315, over 15935.00 frames. ], tot_loss[loss=0.1356, simple_loss=0.1393, pruned_loss=0.05277, audio_tagging_loss=0.01323, over 3051373.70 frames. ], batch size: 60, lr: 2.99e-02, grad_scale: 32.0 2023-11-18 06:19:22,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0 2023-11-18 06:19:44,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=96960.0, ans=0.2 2023-11-18 06:19:45,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=96960.0, ans=0.125 2023-11-18 06:19:46,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=97026.66666666667, ans=0.125 2023-11-18 06:19:50,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=97026.66666666667, ans=0.125 2023-11-18 06:20:09,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=97160.0, ans=0.125 2023-11-18 06:20:10,030 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2550, loss[loss=0.138, simple_loss=0.1452, pruned_loss=0.05383, audio_tagging_loss=0.01157, over 15701.00 frames. ], tot_loss[loss=0.1361, simple_loss=0.1401, pruned_loss=0.05295, audio_tagging_loss=0.01311, over 3047666.02 frames. 
], batch size: 56, lr: 2.98e-02, grad_scale: 32.0 2023-11-18 06:20:13,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=97160.0, ans=0.125 2023-11-18 06:20:14,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=97160.0, ans=0.125 2023-11-18 06:20:26,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=97226.66666666667, ans=0.0 2023-11-18 06:20:30,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=97226.66666666667, ans=0.125 2023-11-18 06:20:46,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=97360.0, ans=0.0 2023-11-18 06:20:51,781 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.040e+01 1.036e+02 1.193e+02 1.343e+02 1.842e+02, threshold=2.386e+02, percent-clipped=0.0 2023-11-18 06:21:02,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=97426.66666666667, ans=0.125 2023-11-18 06:21:05,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=97493.33333333333, ans=0.02 2023-11-18 06:21:06,241 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2600, loss[loss=0.09961, simple_loss=0.102, pruned_loss=0.03498, audio_tagging_loss=0.01364, over 15204.00 frames. ], tot_loss[loss=0.135, simple_loss=0.1391, pruned_loss=0.05261, audio_tagging_loss=0.01288, over 3050483.62 frames. ], batch size: 58, lr: 2.98e-02, grad_scale: 32.0 2023-11-18 06:21:08,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=97493.33333333333, ans=0.0 2023-11-18 06:21:09,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=97493.33333333333, ans=0.07 2023-11-18 06:21:20,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=97560.0, ans=0.0 2023-11-18 06:22:02,790 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2650, loss[loss=0.1014, simple_loss=0.09309, pruned_loss=0.03999, audio_tagging_loss=0.01483, over 15985.00 frames. ], tot_loss[loss=0.1355, simple_loss=0.1399, pruned_loss=0.05272, audio_tagging_loss=0.01284, over 3052399.22 frames. 
], batch size: 64, lr: 2.98e-02, grad_scale: 32.0 2023-11-18 06:22:18,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=97893.33333333333, ans=0.125 2023-11-18 06:22:18,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=97893.33333333333, ans=0.04949747468305833 2023-11-18 06:22:32,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=97960.0, ans=0.125 2023-11-18 06:22:34,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=97960.0, ans=0.1 2023-11-18 06:22:42,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=98026.66666666667, ans=0.0 2023-11-18 06:22:44,863 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 1.069e+02 1.231e+02 1.397e+02 2.138e+02, threshold=2.463e+02, percent-clipped=0.0 2023-11-18 06:22:59,927 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2700, loss[loss=0.1379, simple_loss=0.1468, pruned_loss=0.05439, audio_tagging_loss=0.01018, over 14465.00 frames. ], tot_loss[loss=0.136, simple_loss=0.1405, pruned_loss=0.05306, audio_tagging_loss=0.01273, over 3052151.68 frames. ], batch size: 55, lr: 2.97e-02, grad_scale: 32.0 2023-11-18 06:23:07,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=98160.0, ans=0.1 2023-11-18 06:23:20,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=98226.66666666667, ans=0.0 2023-11-18 06:23:20,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=98226.66666666667, ans=0.0 2023-11-18 06:23:35,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=98360.0, ans=0.125 2023-11-18 06:23:56,188 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2750, loss[loss=0.1381, simple_loss=0.1554, pruned_loss=0.04992, audio_tagging_loss=0.01043, over 16028.00 frames. ], tot_loss[loss=0.1344, simple_loss=0.1388, pruned_loss=0.05226, audio_tagging_loss=0.01268, over 3057280.74 frames. ], batch size: 56, lr: 2.97e-02, grad_scale: 32.0 2023-11-18 06:23:57,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=12.0 2023-11-18 06:24:00,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=98493.33333333333, ans=0.1 2023-11-18 06:24:04,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=98493.33333333333, ans=0.025 2023-11-18 06:24:05,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=98493.33333333333, ans=0.125 2023-11-18 06:24:06,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.39 vs. 
limit=15.0 2023-11-18 06:24:07,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=98560.0, ans=0.125 2023-11-18 06:24:11,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=98560.0, ans=0.125 2023-11-18 06:24:18,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=98626.66666666667, ans=0.125 2023-11-18 06:24:20,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=98626.66666666667, ans=0.0 2023-11-18 06:24:28,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=98626.66666666667, ans=0.5 2023-11-18 06:24:37,491 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 1.008e+02 1.194e+02 1.354e+02 1.877e+02, threshold=2.388e+02, percent-clipped=0.0 2023-11-18 06:24:42,370 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:24:50,585 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.083e+00 2023-11-18 06:24:52,450 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2800, loss[loss=0.1925, simple_loss=0.2041, pruned_loss=0.07889, audio_tagging_loss=0.01154, over 14605.00 frames. ], tot_loss[loss=0.1336, simple_loss=0.1381, pruned_loss=0.0519, audio_tagging_loss=0.01269, over 3055025.58 frames. ], batch size: 55, lr: 2.96e-02, grad_scale: 32.0 2023-11-18 06:25:08,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=98893.33333333333, ans=0.125 2023-11-18 06:25:10,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=98893.33333333333, ans=0.2 2023-11-18 06:25:34,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=99026.66666666667, ans=0.1 2023-11-18 06:25:39,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=99093.33333333333, ans=15.0 2023-11-18 06:25:48,865 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2850, loss[loss=0.1587, simple_loss=0.1597, pruned_loss=0.06649, audio_tagging_loss=0.0124, over 15413.00 frames. ], tot_loss[loss=0.133, simple_loss=0.1374, pruned_loss=0.05161, audio_tagging_loss=0.01274, over 3046900.41 frames. ], batch size: 57, lr: 2.96e-02, grad_scale: 32.0 2023-11-18 06:26:16,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.15 vs. limit=22.5 2023-11-18 06:26:17,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.65 vs. 
limit=22.5 2023-11-18 06:26:24,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=99360.0, ans=0.1 2023-11-18 06:26:27,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=99360.0, ans=0.5 2023-11-18 06:26:30,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.061e+01 1.055e+02 1.276e+02 1.437e+02 2.072e+02, threshold=2.552e+02, percent-clipped=0.0 2023-11-18 06:26:31,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=99360.0, ans=22.5 2023-11-18 06:26:34,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2023-11-18 06:26:45,308 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2900, loss[loss=0.1667, simple_loss=0.1678, pruned_loss=0.06929, audio_tagging_loss=0.01351, over 14463.00 frames. ], tot_loss[loss=0.1326, simple_loss=0.1369, pruned_loss=0.0514, audio_tagging_loss=0.01279, over 3043744.79 frames. ], batch size: 55, lr: 2.96e-02, grad_scale: 32.0 2023-11-18 06:26:54,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=99493.33333333333, ans=0.125 2023-11-18 06:27:05,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=15.0 2023-11-18 06:27:29,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=99760.0, ans=0.125 2023-11-18 06:27:39,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0 2023-11-18 06:27:42,256 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 2950, loss[loss=0.1516, simple_loss=0.1552, pruned_loss=0.06133, audio_tagging_loss=0.01266, over 15256.00 frames. ], tot_loss[loss=0.1331, simple_loss=0.1375, pruned_loss=0.05155, audio_tagging_loss=0.01282, over 3040642.04 frames. ], batch size: 56, lr: 2.95e-02, grad_scale: 16.0 2023-11-18 06:27:44,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=99826.66666666667, ans=0.125 2023-11-18 06:27:45,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=99826.66666666667, ans=0.1 2023-11-18 06:27:50,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.88 vs. 
limit=10.0 2023-11-18 06:27:55,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=99893.33333333333, ans=0.0 2023-11-18 06:27:55,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=99893.33333333333, ans=0.125 2023-11-18 06:28:16,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=100026.66666666667, ans=0.0 2023-11-18 06:28:17,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=15.0 2023-11-18 06:28:18,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=100026.66666666667, ans=15.0 2023-11-18 06:28:25,091 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.080e+01 1.058e+02 1.250e+02 1.448e+02 1.793e+02, threshold=2.500e+02, percent-clipped=0.0 2023-11-18 06:28:38,683 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3000, loss[loss=0.1214, simple_loss=0.1165, pruned_loss=0.048, audio_tagging_loss=0.0152, over 13800.00 frames. ], tot_loss[loss=0.1324, simple_loss=0.1366, pruned_loss=0.05123, audio_tagging_loss=0.01287, over 3031209.70 frames. ], batch size: 56, lr: 2.95e-02, grad_scale: 16.0 2023-11-18 06:28:38,683 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 06:29:12,289 INFO [train_asr.py:1147] (1/4) Epoch 2, validation: loss=0.0901, simple_loss=0.07118, pruned_loss=0.01674, audio_tagging_loss=0.03777, over 4681554.00 frames. 2023-11-18 06:29:12,290 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 06:29:40,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=100293.33333333333, ans=0.1 2023-11-18 06:29:46,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=100360.0, ans=0.125 2023-11-18 06:29:46,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=100360.0, ans=0.125 2023-11-18 06:29:48,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=100360.0, ans=10.0 2023-11-18 06:30:05,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.35 vs. limit=10.0 2023-11-18 06:30:07,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=100493.33333333333, ans=0.125 2023-11-18 06:30:08,707 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3050, loss[loss=0.1022, simple_loss=0.1088, pruned_loss=0.0355, audio_tagging_loss=0.0123, over 15507.00 frames. ], tot_loss[loss=0.1339, simple_loss=0.1386, pruned_loss=0.05187, audio_tagging_loss=0.01275, over 3035650.33 frames. ], batch size: 61, lr: 2.94e-02, grad_scale: 16.0 2023-11-18 06:30:24,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.40 vs. 
limit=15.0 2023-11-18 06:30:26,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=100560.0, ans=0.0 2023-11-18 06:30:39,917 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:30:41,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0 2023-11-18 06:30:50,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=100693.33333333333, ans=0.0 2023-11-18 06:30:51,083 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.130e+01 1.055e+02 1.164e+02 1.306e+02 1.882e+02, threshold=2.329e+02, percent-clipped=0.0 2023-11-18 06:30:52,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=100760.0, ans=0.125 2023-11-18 06:30:54,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=100760.0, ans=0.0 2023-11-18 06:31:04,635 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3100, loss[loss=0.1392, simple_loss=0.15, pruned_loss=0.05482, audio_tagging_loss=0.009417, over 14751.00 frames. ], tot_loss[loss=0.1344, simple_loss=0.1387, pruned_loss=0.05214, audio_tagging_loss=0.01293, over 3033610.28 frames. ], batch size: 56, lr: 2.94e-02, grad_scale: 16.0 2023-11-18 06:31:06,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=15.0 2023-11-18 06:31:18,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=100893.33333333333, ans=0.2 2023-11-18 06:31:25,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=100893.33333333333, ans=0.95 2023-11-18 06:31:36,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=100960.0, ans=0.125 2023-11-18 06:31:40,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=101026.66666666667, ans=0.125 2023-11-18 06:31:50,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101093.33333333333, ans=0.1 2023-11-18 06:31:55,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.72 vs. limit=22.5 2023-11-18 06:32:00,116 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3150, loss[loss=0.205, simple_loss=0.2137, pruned_loss=0.08346, audio_tagging_loss=0.01472, over 15282.00 frames. ], tot_loss[loss=0.1353, simple_loss=0.14, pruned_loss=0.05242, audio_tagging_loss=0.01287, over 3039185.98 frames. 
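
The WARNING above is the recipe's length filter: the pruned-transducer loss is undefined when the encoder output is shorter than the label sequence, so cuts whose subsampled frame count falls below the token count are dropped. The arithmetic matches the factor-4 convolutional front-end: 100 input frames shrink to 23, one short of the 24 BPE tokens produced by the AudioSet placeholder transcript. A sketch, with the subsampling formula taken on trust from the recipe's comments:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed Conv2d front-end formula for factor-4 subsampling:
    # T' = ((T - 7) // 2 + 1) // 2
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Drop cuts whose encoder output cannot cover the token sequence.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False   # the excluded dummy-text AudioSet cuts
```
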
], batch size: 55, lr: 2.94e-02, grad_scale: 16.0 2023-11-18 06:32:16,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=101226.66666666667, ans=0.2 2023-11-18 06:32:17,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=101226.66666666667, ans=0.125 2023-11-18 06:32:18,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=101226.66666666667, ans=0.125 2023-11-18 06:32:32,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.50 vs. limit=10.0 2023-11-18 06:32:38,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=101360.0, ans=0.125 2023-11-18 06:32:39,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=101360.0, ans=0.125 2023-11-18 06:32:43,613 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.183e+01 1.035e+02 1.177e+02 1.341e+02 1.863e+02, threshold=2.355e+02, percent-clipped=0.0 2023-11-18 06:32:49,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0 2023-11-18 06:32:57,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=101493.33333333333, ans=0.125 2023-11-18 06:32:58,085 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3200, loss[loss=0.1148, simple_loss=0.1092, pruned_loss=0.04358, audio_tagging_loss=0.01665, over 15404.00 frames. ], tot_loss[loss=0.1344, simple_loss=0.1393, pruned_loss=0.05176, audio_tagging_loss=0.01301, over 3041736.01 frames. ], batch size: 59, lr: 2.93e-02, grad_scale: 32.0 2023-11-18 06:32:59,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=101493.33333333333, ans=0.125 2023-11-18 06:33:02,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=101493.33333333333, ans=0.0 2023-11-18 06:33:23,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=101626.66666666667, ans=0.0 2023-11-18 06:33:29,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=101626.66666666667, ans=0.0 2023-11-18 06:33:52,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=101760.0, ans=0.125 2023-11-18 06:33:54,451 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3250, loss[loss=0.1094, simple_loss=0.1139, pruned_loss=0.03699, audio_tagging_loss=0.01541, over 15330.00 frames. ], tot_loss[loss=0.1343, simple_loss=0.1393, pruned_loss=0.05154, audio_tagging_loss=0.01314, over 3043408.67 frames. ], batch size: 57, lr: 2.93e-02, grad_scale: 32.0 2023-11-18 06:33:56,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.81 vs. 
limit=10.0 2023-11-18 06:34:05,362 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:34:21,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=101960.0, ans=0.07 2023-11-18 06:34:22,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0 2023-11-18 06:34:36,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=102026.66666666667, ans=0.1 2023-11-18 06:34:37,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.40 vs. limit=15.0 2023-11-18 06:34:37,265 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.501e+01 1.067e+02 1.209e+02 1.454e+02 2.188e+02, threshold=2.419e+02, percent-clipped=0.0 2023-11-18 06:34:43,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=102093.33333333333, ans=10.0 2023-11-18 06:34:50,103 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3300, loss[loss=0.1719, simple_loss=0.1739, pruned_loss=0.07302, audio_tagging_loss=0.01195, over 15190.00 frames. ], tot_loss[loss=0.1343, simple_loss=0.1393, pruned_loss=0.05155, audio_tagging_loss=0.01313, over 3049125.10 frames. ], batch size: 56, lr: 2.93e-02, grad_scale: 32.0 2023-11-18 06:34:57,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=102160.0, ans=0.125 2023-11-18 06:35:01,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=102226.66666666667, ans=0.125 2023-11-18 06:35:14,213 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=12.0 2023-11-18 06:35:19,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=102293.33333333333, ans=0.125 2023-11-18 06:35:20,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=102293.33333333333, ans=0.0 2023-11-18 06:35:43,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=102426.66666666667, ans=0.0 2023-11-18 06:35:46,867 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3350, loss[loss=0.1125, simple_loss=0.1151, pruned_loss=0.03986, audio_tagging_loss=0.01507, over 13822.00 frames. ], tot_loss[loss=0.1346, simple_loss=0.1394, pruned_loss=0.0518, audio_tagging_loss=0.0131, over 3040383.65 frames. ], batch size: 54, lr: 2.92e-02, grad_scale: 32.0 2023-11-18 06:35:54,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. 
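
The recurring "Whitening: ... metric=M vs. limit=L" lines come from the Whiten modules in scaling.py. As I read that code, the metric is mean(eig^2) / mean(eig)^2 over the eigenvalues of each group's feature covariance: exactly 1.0 for perfectly white features, growing as the covariance becomes lopsided. When it exceeds the (possibly scheduled) limit, the module adds a gradient term pushing activations back toward whiteness; these lines are occasional diagnostic samples, not errors. A small sketch assuming that definition:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); measures mean(eig^2) / mean(eig)^2
    # of each channel group's covariance. 1.0 iff covariance ~ identity.
    num_frames, num_channels = x.shape
    xg = x.reshape(num_frames, num_groups, num_channels // num_groups)
    total = 0.0
    for g in range(num_groups):
        cov = xg[:, g, :].T @ xg[:, g, :] / num_frames
        eig = torch.linalg.eigvalsh(cov)
        total += ((eig ** 2).mean() / eig.mean() ** 2).item()
    return total / num_groups
```
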
limit=15.0 2023-11-18 06:36:05,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=102560.0, ans=0.125 2023-11-18 06:36:17,494 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:36:30,158 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.335e+01 1.052e+02 1.183e+02 1.313e+02 1.850e+02, threshold=2.366e+02, percent-clipped=0.0 2023-11-18 06:36:42,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=102760.0, ans=0.125 2023-11-18 06:36:44,255 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3400, loss[loss=0.1348, simple_loss=0.1372, pruned_loss=0.05417, audio_tagging_loss=0.01209, over 15234.00 frames. ], tot_loss[loss=0.1353, simple_loss=0.1404, pruned_loss=0.05209, audio_tagging_loss=0.01299, over 3042321.04 frames. ], batch size: 57, lr: 2.92e-02, grad_scale: 32.0 2023-11-18 06:37:36,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=103093.33333333333, ans=0.2 2023-11-18 06:37:39,600 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3450, loss[loss=0.08562, simple_loss=0.08749, pruned_loss=0.02833, audio_tagging_loss=0.01354, over 15428.00 frames. ], tot_loss[loss=0.1341, simple_loss=0.1391, pruned_loss=0.05172, audio_tagging_loss=0.01285, over 3040726.54 frames. ], batch size: 61, lr: 2.91e-02, grad_scale: 32.0 2023-11-18 06:37:48,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=103160.0, ans=0.5 2023-11-18 06:37:51,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.62 vs. limit=12.0 2023-11-18 06:37:56,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=103226.66666666667, ans=0.125 2023-11-18 06:38:00,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=12.0 2023-11-18 06:38:02,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=103293.33333333333, ans=0.125 2023-11-18 06:38:08,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=103293.33333333333, ans=0.125 2023-11-18 06:38:12,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=103360.0, ans=0.125 2023-11-18 06:38:18,011 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2023-11-18 06:38:21,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.158e+01 1.088e+02 1.277e+02 1.401e+02 2.193e+02, threshold=2.554e+02, percent-clipped=0.0 2023-11-18 06:38:32,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=103426.66666666667, ans=0.1 2023-11-18 06:38:35,894 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3500, loss[loss=0.1079, simple_loss=0.113, pruned_loss=0.03799, audio_tagging_loss=0.01339, over 15666.00 frames. 
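
Most of the volume in this log is scaling.py printing ScheduledFloat values: dropout probabilities, skip rates (the attention_skip_rate/conv_skip_rate ... ans=0.0 entries are regularizers that have already decayed to zero by this point), and balancer parameters are piecewise-linear functions of batch_count rather than constants. A minimal sketch of such a schedule; the breakpoints below are illustrative, not necessarily the recipe's:

```python
# Piecewise-linear schedule in the style of scaling.py's ScheduledFloat.
def scheduled_float(batch_count: float, schedule: list[tuple[float, float]]) -> float:
    x0, y0 = schedule[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in schedule[1:]:
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        x0, y0 = x1, y1
    return y0  # hold the final value

# Illustrative dropout schedule: 0.3 early, annealed to 0.1 by batch_count 20k,
# so at batch_count=103426.67 it reads 0.1, matching the "ans=0.1" lines above.
assert scheduled_float(103426.67, [(0.0, 0.3), (20000.0, 0.1)]) == 0.1
```
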
], tot_loss[loss=0.1335, simple_loss=0.1385, pruned_loss=0.05142, audio_tagging_loss=0.01286, over 3045045.11 frames. ], batch size: 60, lr: 2.91e-02, grad_scale: 32.0 2023-11-18 06:38:38,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=103493.33333333333, ans=0.1 2023-11-18 06:38:39,304 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:38:45,167 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:39:02,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=103626.66666666667, ans=0.0 2023-11-18 06:39:03,744 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:39:09,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=103693.33333333333, ans=0.2 2023-11-18 06:39:29,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=103760.0, ans=0.07 2023-11-18 06:39:32,481 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3550, loss[loss=0.146, simple_loss=0.1439, pruned_loss=0.06105, audio_tagging_loss=0.01299, over 14716.00 frames. ], tot_loss[loss=0.134, simple_loss=0.1391, pruned_loss=0.05165, audio_tagging_loss=0.01275, over 3044634.16 frames. ], batch size: 56, lr: 2.91e-02, grad_scale: 32.0 2023-11-18 06:39:57,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=103960.0, ans=0.04949747468305833 2023-11-18 06:39:58,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=103960.0, ans=0.1 2023-11-18 06:40:02,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=103960.0, ans=0.125 2023-11-18 06:40:15,311 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.977e+01 9.988e+01 1.160e+02 1.284e+02 2.391e+02, threshold=2.320e+02, percent-clipped=0.0 2023-11-18 06:40:19,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.37 vs. limit=15.0 2023-11-18 06:40:28,310 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3600, loss[loss=0.1077, simple_loss=0.1172, pruned_loss=0.03689, audio_tagging_loss=0.01215, over 14975.00 frames. ], tot_loss[loss=0.1338, simple_loss=0.139, pruned_loss=0.05163, audio_tagging_loss=0.01273, over 3041372.99 frames. 
], batch size: 56, lr: 2.90e-02, grad_scale: 32.0 2023-11-18 06:41:06,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=104360.0, ans=0.125 2023-11-18 06:41:11,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=104360.0, ans=0.1 2023-11-18 06:41:24,472 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3650, loss[loss=0.1339, simple_loss=0.1378, pruned_loss=0.05299, audio_tagging_loss=0.01204, over 14421.00 frames. ], tot_loss[loss=0.1344, simple_loss=0.1394, pruned_loss=0.05195, audio_tagging_loss=0.01272, over 3044012.55 frames. ], batch size: 56, lr: 2.90e-02, grad_scale: 32.0 2023-11-18 06:41:53,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=104626.66666666667, ans=0.125 2023-11-18 06:42:07,178 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.564e+01 1.051e+02 1.152e+02 1.363e+02 2.191e+02, threshold=2.304e+02, percent-clipped=0.0 2023-11-18 06:42:10,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=104760.0, ans=0.0 2023-11-18 06:42:20,863 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3700, loss[loss=0.09546, simple_loss=0.09069, pruned_loss=0.03396, audio_tagging_loss=0.01615, over 14559.00 frames. ], tot_loss[loss=0.1343, simple_loss=0.1393, pruned_loss=0.05195, audio_tagging_loss=0.01272, over 3049220.87 frames. ], batch size: 56, lr: 2.90e-02, grad_scale: 32.0 2023-11-18 06:42:23,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=104826.66666666667, ans=0.0 2023-11-18 06:42:27,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=104826.66666666667, ans=0.0 2023-11-18 06:42:40,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=104893.33333333333, ans=0.125 2023-11-18 06:42:40,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=104893.33333333333, ans=0.125 2023-11-18 06:42:43,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.46 vs. limit=15.0 2023-11-18 06:43:03,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=105026.66666666667, ans=0.125 2023-11-18 06:43:17,350 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3750, loss[loss=0.1658, simple_loss=0.1839, pruned_loss=0.06627, audio_tagging_loss=0.007569, over 14475.00 frames. ], tot_loss[loss=0.1346, simple_loss=0.1399, pruned_loss=0.05206, audio_tagging_loss=0.01259, over 3046573.98 frames. ], batch size: 53, lr: 2.89e-02, grad_scale: 32.0 2023-11-18 06:43:31,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.05 vs. limit=6.0 2023-11-18 06:43:32,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.10 vs. 
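
The fractional batch_count values (104360.0, 104626.66666666667, and so on) are not raw optimizer steps: icefall drives these schedules with a duration-adjusted count, which I take to be batch_idx * max_duration * world_size / ref_duration. With max_duration=1000, world_size=4 and ref_duration=600 that advances by 6.667 per step, which reproduces the thirds seen throughout this log:

```python
# Assumed duration-adjusted batch count (cf. get_adjusted_batch_count in the
# icefall recipes): schedules advance with audio seen, normalized by
# ref_duration, not with the raw step index.
def adjusted_batch_count(batch_idx_train: int,
                         max_duration: float = 1000.0,
                         world_size: int = 4,
                         ref_duration: float = 600.0) -> float:
    return batch_idx_train * (max_duration * world_size) / ref_duration

# 104360.0 (seen above) corresponds to step 15654 under these settings.
assert abs(adjusted_batch_count(15654) - 104360.0) < 1e-6
```
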
limit=15.0 2023-11-18 06:43:44,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=105293.33333333333, ans=0.125 2023-11-18 06:43:54,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=105360.0, ans=0.1 2023-11-18 06:43:56,418 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:44:00,706 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.685e+01 1.091e+02 1.248e+02 1.454e+02 2.022e+02, threshold=2.495e+02, percent-clipped=0.0 2023-11-18 06:44:13,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=105493.33333333333, ans=0.2 2023-11-18 06:44:14,158 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3800, loss[loss=0.1027, simple_loss=0.09899, pruned_loss=0.03598, audio_tagging_loss=0.01723, over 14997.00 frames. ], tot_loss[loss=0.1343, simple_loss=0.139, pruned_loss=0.05204, audio_tagging_loss=0.01277, over 3044724.21 frames. ], batch size: 56, lr: 2.89e-02, grad_scale: 32.0 2023-11-18 06:44:16,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=105493.33333333333, ans=0.125 2023-11-18 06:44:25,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.84 vs. limit=15.0 2023-11-18 06:44:42,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=105626.66666666667, ans=0.0 2023-11-18 06:44:46,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=12.0 2023-11-18 06:44:56,331 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:45:02,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.40 vs. limit=12.0 2023-11-18 06:45:03,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2023-11-18 06:45:04,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.08 vs. limit=15.0 2023-11-18 06:45:10,955 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3850, loss[loss=0.1013, simple_loss=0.09845, pruned_loss=0.03595, audio_tagging_loss=0.01617, over 14092.00 frames. ], tot_loss[loss=0.1334, simple_loss=0.1378, pruned_loss=0.05146, audio_tagging_loss=0.01298, over 3043819.92 frames. 
], batch size: 55, lr: 2.88e-02, grad_scale: 32.0 2023-11-18 06:45:11,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=105826.66666666667, ans=0.125 2023-11-18 06:45:26,055 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2023-11-18 06:45:51,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=106026.66666666667, ans=0.125 2023-11-18 06:45:53,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 1.026e+02 1.153e+02 1.299e+02 2.070e+02, threshold=2.305e+02, percent-clipped=0.0 2023-11-18 06:45:59,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=106093.33333333333, ans=0.0 2023-11-18 06:46:06,607 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3900, loss[loss=0.1548, simple_loss=0.1682, pruned_loss=0.05709, audio_tagging_loss=0.01365, over 15638.00 frames. ], tot_loss[loss=0.132, simple_loss=0.1365, pruned_loss=0.05071, audio_tagging_loss=0.01308, over 3039963.58 frames. ], batch size: 57, lr: 2.88e-02, grad_scale: 32.0 2023-11-18 06:46:19,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106226.66666666667, ans=0.1 2023-11-18 06:46:40,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=106360.0, ans=0.2 2023-11-18 06:46:50,715 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:46:58,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.60 vs. limit=15.0 2023-11-18 06:46:58,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=106426.66666666667, ans=0.0 2023-11-18 06:47:03,430 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 3950, loss[loss=0.1184, simple_loss=0.1207, pruned_loss=0.04471, audio_tagging_loss=0.01337, over 14971.00 frames. ], tot_loss[loss=0.1321, simple_loss=0.1366, pruned_loss=0.05061, audio_tagging_loss=0.01318, over 3043363.39 frames. ], batch size: 56, lr: 2.88e-02, grad_scale: 32.0 2023-11-18 06:47:06,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=106493.33333333333, ans=0.125 2023-11-18 06:47:25,768 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.39 vs. limit=15.0 2023-11-18 06:47:30,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106626.66666666667, ans=0.1 2023-11-18 06:47:43,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.36 vs. 
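
The slowly decaying learning rate (2.96e-02 at batch 2900 down to 2.88e-02 by batch 3900) is consistent with icefall's Eden scheduler, whose documented form multiplies base_lr by separate batch-wise and epoch-wise power-law factors. A sketch of the formula as I understand it from the optim.py docstring; I have not re-derived the exact logged values, since the raw step counter fed into it does not appear in the log:

```python
# Eden learning-rate schedule (assumed): each factor is ~1.0 early and
# decays like x^-0.5 once past its knee (lr_batches / lr_epochs).
def eden_lr(base_lr: float, step: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```
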
limit=6.0 2023-11-18 06:47:48,874 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 1.025e+02 1.127e+02 1.249e+02 1.832e+02, threshold=2.254e+02, percent-clipped=0.0 2023-11-18 06:47:56,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=106760.0, ans=0.2 2023-11-18 06:47:56,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=106760.0, ans=0.2 2023-11-18 06:48:02,276 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4000, loss[loss=0.1705, simple_loss=0.1868, pruned_loss=0.06944, audio_tagging_loss=0.007713, over 15285.00 frames. ], tot_loss[loss=0.1329, simple_loss=0.1375, pruned_loss=0.05101, audio_tagging_loss=0.01318, over 3035880.78 frames. ], batch size: 58, lr: 2.87e-02, grad_scale: 32.0 2023-11-18 06:48:06,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=106826.66666666667, ans=0.2 2023-11-18 06:48:18,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=106893.33333333333, ans=0.125 2023-11-18 06:48:21,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=106893.33333333333, ans=0.125 2023-11-18 06:48:24,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=106960.0, ans=0.125 2023-11-18 06:48:29,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2023-11-18 06:48:43,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=107026.66666666667, ans=0.1 2023-11-18 06:48:58,673 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4050, loss[loss=0.1339, simple_loss=0.1406, pruned_loss=0.05392, audio_tagging_loss=0.009704, over 15116.00 frames. ], tot_loss[loss=0.1342, simple_loss=0.139, pruned_loss=0.05159, audio_tagging_loss=0.01314, over 3040992.38 frames. ], batch size: 57, lr: 2.87e-02, grad_scale: 32.0 2023-11-18 06:48:59,805 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 06:49:09,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=107226.66666666667, ans=0.125 2023-11-18 06:49:18,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=107226.66666666667, ans=0.125 2023-11-18 06:49:40,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=107360.0, ans=0.0 2023-11-18 06:49:41,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.845e+01 1.077e+02 1.199e+02 1.331e+02 2.496e+02, threshold=2.397e+02, percent-clipped=1.0 2023-11-18 06:49:46,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2023-11-18 06:49:55,684 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4100, loss[loss=0.1377, simple_loss=0.1412, pruned_loss=0.05582, audio_tagging_loss=0.01128, over 16420.00 frames. ], tot_loss[loss=0.134, simple_loss=0.1389, pruned_loss=0.05148, audio_tagging_loss=0.01303, over 3046579.01 frames. ], batch size: 61, lr: 2.87e-02, grad_scale: 32.0 2023-11-18 06:50:10,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=107560.0, ans=0.2 2023-11-18 06:50:30,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=107693.33333333333, ans=0.0 2023-11-18 06:50:51,923 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4150, loss[loss=0.1372, simple_loss=0.1409, pruned_loss=0.05295, audio_tagging_loss=0.01384, over 14678.00 frames. ], tot_loss[loss=0.134, simple_loss=0.1399, pruned_loss=0.05142, audio_tagging_loss=0.01264, over 3042772.92 frames. ], batch size: 54, lr: 2.86e-02, grad_scale: 32.0 2023-11-18 06:50:54,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.59 vs. limit=15.0 2023-11-18 06:50:59,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.69 vs. limit=22.5 2023-11-18 06:51:02,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=107893.33333333333, ans=0.0 2023-11-18 06:51:08,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=107893.33333333333, ans=0.125 2023-11-18 06:51:10,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=107893.33333333333, ans=0.125 2023-11-18 06:51:16,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=107960.0, ans=0.125 2023-11-18 06:51:24,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=107960.0, ans=0.0 2023-11-18 06:51:31,967 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:51:34,782 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.19 vs. limit=15.0 2023-11-18 06:51:35,152 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.314e+01 1.032e+02 1.149e+02 1.336e+02 2.371e+02, threshold=2.297e+02, percent-clipped=0.0 2023-11-18 06:51:43,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.97 vs. limit=6.0 2023-11-18 06:51:45,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=108093.33333333333, ans=0.125 2023-11-18 06:51:47,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=108160.0, ans=0.2 2023-11-18 06:51:48,694 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4200, loss[loss=0.1342, simple_loss=0.1411, pruned_loss=0.05207, audio_tagging_loss=0.01158, over 15998.00 frames. ], tot_loss[loss=0.1327, simple_loss=0.1385, pruned_loss=0.05095, audio_tagging_loss=0.01252, over 3043673.65 frames. ], batch size: 58, lr: 2.86e-02, grad_scale: 32.0 2023-11-18 06:51:58,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=108226.66666666667, ans=0.0 2023-11-18 06:52:17,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=108293.33333333333, ans=0.1 2023-11-18 06:52:26,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=108360.0, ans=15.0 2023-11-18 06:52:35,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=108426.66666666667, ans=0.1 2023-11-18 06:52:44,349 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4250, loss[loss=0.1852, simple_loss=0.1986, pruned_loss=0.07803, audio_tagging_loss=0.007895, over 15670.00 frames. ], tot_loss[loss=0.1351, simple_loss=0.1413, pruned_loss=0.05207, audio_tagging_loss=0.01235, over 3045877.76 frames. ], batch size: 54, lr: 2.85e-02, grad_scale: 32.0 2023-11-18 06:52:50,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=108493.33333333333, ans=0.125 2023-11-18 06:53:14,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=108626.66666666667, ans=0.125 2023-11-18 06:53:20,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.80 vs. 
limit=15.0 2023-11-18 06:53:26,871 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 9.019e+01 1.076e+02 1.189e+02 1.301e+02 1.957e+02, threshold=2.378e+02, percent-clipped=0.0 2023-11-18 06:53:28,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=108760.0, ans=0.125 2023-11-18 06:53:41,490 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4300, loss[loss=0.198, simple_loss=0.2113, pruned_loss=0.08433, audio_tagging_loss=0.007975, over 15874.00 frames. ], tot_loss[loss=0.1351, simple_loss=0.1411, pruned_loss=0.05217, audio_tagging_loss=0.01243, over 3044288.17 frames. ], batch size: 56, lr: 2.85e-02, grad_scale: 32.0 2023-11-18 06:53:43,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.63 vs. limit=10.0 2023-11-18 06:53:49,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=108826.66666666667, ans=0.0 2023-11-18 06:54:11,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.14 vs. limit=22.5 2023-11-18 06:54:15,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=109026.66666666667, ans=0.1 2023-11-18 06:54:22,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=109026.66666666667, ans=0.0 2023-11-18 06:54:30,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109093.33333333333, ans=0.1 2023-11-18 06:54:31,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=109093.33333333333, ans=0.025 2023-11-18 06:54:37,455 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4350, loss[loss=0.1586, simple_loss=0.1694, pruned_loss=0.06268, audio_tagging_loss=0.01118, over 16434.00 frames. ], tot_loss[loss=0.1344, simple_loss=0.1402, pruned_loss=0.05183, audio_tagging_loss=0.01249, over 3042429.99 frames. 
], batch size: 63, lr: 2.85e-02, grad_scale: 32.0 2023-11-18 06:54:39,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=109160.0, ans=0.07 2023-11-18 06:54:47,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=109226.66666666667, ans=0.0 2023-11-18 06:55:00,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=109293.33333333333, ans=0.125 2023-11-18 06:55:00,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=109293.33333333333, ans=0.125 2023-11-18 06:55:00,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=109293.33333333333, ans=0.1 2023-11-18 06:55:01,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=109293.33333333333, ans=0.125 2023-11-18 06:55:05,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=109293.33333333333, ans=0.1 2023-11-18 06:55:20,482 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.140e+01 1.013e+02 1.155e+02 1.315e+02 2.105e+02, threshold=2.309e+02, percent-clipped=0.0 2023-11-18 06:55:24,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=109426.66666666667, ans=0.125 2023-11-18 06:55:33,443 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4400, loss[loss=0.1478, simple_loss=0.1586, pruned_loss=0.05826, audio_tagging_loss=0.01028, over 15032.00 frames. ], tot_loss[loss=0.1343, simple_loss=0.14, pruned_loss=0.05168, audio_tagging_loss=0.01265, over 3039256.55 frames. ], batch size: 57, lr: 2.84e-02, grad_scale: 32.0 2023-11-18 06:55:34,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109493.33333333333, ans=0.1 2023-11-18 06:55:42,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2023-11-18 06:55:47,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=109560.0, ans=22.5 2023-11-18 06:55:51,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=109560.0, ans=0.95 2023-11-18 06:56:00,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=109626.66666666667, ans=0.125 2023-11-18 06:56:23,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.06 vs. limit=15.0 2023-11-18 06:56:27,557 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.20 vs. limit=15.0 2023-11-18 06:56:29,199 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4450, loss[loss=0.161, simple_loss=0.1624, pruned_loss=0.0674, audio_tagging_loss=0.01241, over 14280.00 frames. ], tot_loss[loss=0.1342, simple_loss=0.1396, pruned_loss=0.05177, audio_tagging_loss=0.01263, over 3041327.28 frames. 
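
The balancer entries scattered through this log (balancer.max_positive ... ans=0.95, the min_abs/max_abs bounds, and the ubiquitous prob ... ans=0.125) belong to scaling.py's Balancer modules, which keep per-channel activation statistics inside configured ranges, checking only a random prob fraction of batches. A diagnostic-style sketch of the constraints as I understand them; the real module enforces them through gradient modification rather than reporting:

```python
import torch

# What a Balancer constrains, per channel of x with shape (frames, channels):
# the fraction of positive activations stays in [min_positive, max_positive]
# and the mean absolute value in [min_abs, max_abs]. Defaults here are
# placeholders; the recipe schedules them (the logged "ans" values).
def balancer_report(x: torch.Tensor,
                    min_positive=0.05, max_positive=0.95,
                    min_abs=0.2, max_abs=10.0) -> dict:
    frac_pos = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    return {
        "channels_below_min_positive": int((frac_pos < min_positive).sum()),
        "channels_above_max_positive": int((frac_pos > max_positive).sum()),
        "channels_below_min_abs": int((mean_abs < min_abs).sum()),
        "channels_above_max_abs": int((mean_abs > max_abs).sum()),
    }
```
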
], batch size: 55, lr: 2.84e-02, grad_scale: 32.0 2023-11-18 06:56:40,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=109893.33333333333, ans=0.04949747468305833 2023-11-18 06:56:57,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=109960.0, ans=0.125 2023-11-18 06:57:03,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=110026.66666666667, ans=0.0 2023-11-18 06:57:05,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=110026.66666666667, ans=0.1 2023-11-18 06:57:11,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.260e+01 1.110e+02 1.220e+02 1.457e+02 2.260e+02, threshold=2.440e+02, percent-clipped=0.0 2023-11-18 06:57:24,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=110093.33333333333, ans=0.2 2023-11-18 06:57:26,475 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4500, loss[loss=0.1034, simple_loss=0.1055, pruned_loss=0.03635, audio_tagging_loss=0.01427, over 16321.00 frames. ], tot_loss[loss=0.1334, simple_loss=0.1387, pruned_loss=0.05131, audio_tagging_loss=0.01271, over 3040589.35 frames. ], batch size: 62, lr: 2.84e-02, grad_scale: 32.0 2023-11-18 06:57:34,285 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.345e+00 2023-11-18 06:57:35,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.25 vs. limit=22.5 2023-11-18 06:57:57,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110293.33333333333, ans=0.1 2023-11-18 06:58:13,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=110426.66666666667, ans=0.02 2023-11-18 06:58:19,877 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0 2023-11-18 06:58:21,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.93 vs. limit=22.5 2023-11-18 06:58:22,424 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4550, loss[loss=0.1049, simple_loss=0.1087, pruned_loss=0.038, audio_tagging_loss=0.01259, over 15517.00 frames. ], tot_loss[loss=0.1323, simple_loss=0.1375, pruned_loss=0.05076, audio_tagging_loss=0.01281, over 3035290.05 frames. 
], batch size: 57, lr: 2.83e-02, grad_scale: 32.0 2023-11-18 06:58:29,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110493.33333333333, ans=0.1 2023-11-18 06:58:32,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=110560.0, ans=0.125 2023-11-18 06:58:45,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=110626.66666666667, ans=0.0 2023-11-18 06:58:55,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=110626.66666666667, ans=0.125 2023-11-18 06:59:05,467 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.958e+01 1.009e+02 1.148e+02 1.280e+02 1.877e+02, threshold=2.295e+02, percent-clipped=0.0 2023-11-18 06:59:05,504 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:59:07,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110760.0, ans=0.1 2023-11-18 06:59:18,758 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4600, loss[loss=0.1465, simple_loss=0.1444, pruned_loss=0.06084, audio_tagging_loss=0.01343, over 15283.00 frames. ], tot_loss[loss=0.133, simple_loss=0.1382, pruned_loss=0.0511, audio_tagging_loss=0.01284, over 3043876.34 frames. ], batch size: 56, lr: 2.83e-02, grad_scale: 32.0 2023-11-18 06:59:23,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=110826.66666666667, ans=0.0 2023-11-18 06:59:36,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=110893.33333333333, ans=0.125 2023-11-18 06:59:37,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.66 vs. limit=22.5 2023-11-18 06:59:52,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.77 vs. limit=22.5 2023-11-18 07:00:08,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=111093.33333333333, ans=0.0 2023-11-18 07:00:15,440 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4650, loss[loss=0.1358, simple_loss=0.139, pruned_loss=0.05322, audio_tagging_loss=0.01312, over 15390.00 frames. ], tot_loss[loss=0.1336, simple_loss=0.1388, pruned_loss=0.05132, audio_tagging_loss=0.01294, over 3037711.73 frames. 
], batch size: 58, lr: 2.83e-02, grad_scale: 32.0 2023-11-18 07:00:16,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=111160.0, ans=0.125 2023-11-18 07:00:27,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111226.66666666667, ans=0.1 2023-11-18 07:00:48,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=111360.0, ans=0.0 2023-11-18 07:00:48,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=15.0 2023-11-18 07:00:49,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=111360.0, ans=0.04949747468305833 2023-11-18 07:00:55,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111360.0, ans=0.1 2023-11-18 07:00:58,026 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.738e+01 1.062e+02 1.161e+02 1.332e+02 2.161e+02, threshold=2.322e+02, percent-clipped=0.0 2023-11-18 07:01:10,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.34 vs. limit=15.0 2023-11-18 07:01:10,899 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4700, loss[loss=0.1485, simple_loss=0.1528, pruned_loss=0.06129, audio_tagging_loss=0.01077, over 15194.00 frames. ], tot_loss[loss=0.134, simple_loss=0.1394, pruned_loss=0.05138, audio_tagging_loss=0.01298, over 3041953.81 frames. ], batch size: 57, lr: 2.82e-02, grad_scale: 32.0 2023-11-18 07:01:38,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=111626.66666666667, ans=0.07 2023-11-18 07:01:49,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0 2023-11-18 07:01:52,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=111693.33333333333, ans=0.0 2023-11-18 07:02:03,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=111760.0, ans=0.0 2023-11-18 07:02:05,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=111760.0, ans=0.125 2023-11-18 07:02:06,893 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4750, loss[loss=0.09888, simple_loss=0.0998, pruned_loss=0.03512, audio_tagging_loss=0.01387, over 16433.00 frames. ], tot_loss[loss=0.1337, simple_loss=0.1386, pruned_loss=0.05126, audio_tagging_loss=0.01311, over 3046579.40 frames. 
], batch size: 62, lr: 2.82e-02, grad_scale: 32.0 2023-11-18 07:02:12,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=111826.66666666667, ans=0.2 2023-11-18 07:02:16,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=111826.66666666667, ans=0.0 2023-11-18 07:02:21,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=111893.33333333333, ans=0.125 2023-11-18 07:02:22,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=111893.33333333333, ans=0.125 2023-11-18 07:02:22,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=12.0 2023-11-18 07:02:22,426 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2023-11-18 07:02:23,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=111893.33333333333, ans=0.1 2023-11-18 07:02:49,773 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.988e+01 1.063e+02 1.146e+02 1.305e+02 1.876e+02, threshold=2.292e+02, percent-clipped=0.0 2023-11-18 07:02:50,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2023-11-18 07:03:03,776 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4800, loss[loss=0.1245, simple_loss=0.1322, pruned_loss=0.04445, audio_tagging_loss=0.01396, over 16261.00 frames. ], tot_loss[loss=0.1326, simple_loss=0.1375, pruned_loss=0.05066, audio_tagging_loss=0.01319, over 3052977.63 frames. ], batch size: 62, lr: 2.82e-02, grad_scale: 32.0 2023-11-18 07:03:06,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=112160.0, ans=0.1 2023-11-18 07:03:11,201 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.47 vs. limit=10.0 2023-11-18 07:03:12,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=112160.0, ans=0.125 2023-11-18 07:03:12,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0 2023-11-18 07:03:20,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=112226.66666666667, ans=0.125 2023-11-18 07:03:24,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=112293.33333333333, ans=0.125 2023-11-18 07:03:46,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112360.0, ans=0.1 2023-11-18 07:03:59,956 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4850, loss[loss=0.1282, simple_loss=0.1368, pruned_loss=0.04628, audio_tagging_loss=0.01346, over 15621.00 frames. 
], tot_loss[loss=0.1323, simple_loss=0.1371, pruned_loss=0.05046, audio_tagging_loss=0.01327, over 3053746.35 frames. ], batch size: 60, lr: 2.81e-02, grad_scale: 32.0 2023-11-18 07:04:07,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5 2023-11-18 07:04:28,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=112626.66666666667, ans=0.2 2023-11-18 07:04:32,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=112693.33333333333, ans=0.0 2023-11-18 07:04:40,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=112693.33333333333, ans=10.0 2023-11-18 07:04:42,629 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 1.047e+02 1.164e+02 1.344e+02 1.766e+02, threshold=2.328e+02, percent-clipped=0.0 2023-11-18 07:04:42,864 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:04:45,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=112760.0, ans=0.125 2023-11-18 07:04:56,042 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4900, loss[loss=0.1302, simple_loss=0.1349, pruned_loss=0.04929, audio_tagging_loss=0.01341, over 14833.00 frames. ], tot_loss[loss=0.1326, simple_loss=0.1376, pruned_loss=0.05056, audio_tagging_loss=0.0132, over 3051255.41 frames. ], batch size: 56, lr: 2.81e-02, grad_scale: 32.0 2023-11-18 07:04:56,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112826.66666666667, ans=0.1 2023-11-18 07:05:05,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=112826.66666666667, ans=0.125 2023-11-18 07:05:06,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.15 vs. limit=10.0 2023-11-18 07:05:17,421 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:05:33,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=113026.66666666667, ans=0.1 2023-11-18 07:05:33,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=113026.66666666667, ans=0.0 2023-11-18 07:05:51,906 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 4950, loss[loss=0.1569, simple_loss=0.1771, pruned_loss=0.05828, audio_tagging_loss=0.01007, over 15655.00 frames. ], tot_loss[loss=0.1315, simple_loss=0.1369, pruned_loss=0.05008, audio_tagging_loss=0.01297, over 3055502.64 frames. 
2023-11-18 07:05:53,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=113160.0, ans=0.0
2023-11-18 07:06:20,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=113293.33333333333, ans=0.125
2023-11-18 07:06:24,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=113360.0, ans=0.2
2023-11-18 07:06:34,752 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.114e+01 1.049e+02 1.181e+02 1.339e+02 2.582e+02, threshold=2.362e+02, percent-clipped=1.0
2023-11-18 07:06:48,202 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5000, loss[loss=0.1303, simple_loss=0.1439, pruned_loss=0.04798, audio_tagging_loss=0.01039, over 14816.00 frames. ], tot_loss[loss=0.132, simple_loss=0.1374, pruned_loss=0.05046, audio_tagging_loss=0.0128, over 3050514.46 frames. ], batch size: 57, lr: 2.80e-02, grad_scale: 64.0
2023-11-18 07:06:49,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.47 vs. limit=22.5
2023-11-18 07:06:57,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=113493.33333333333, ans=0.0
2023-11-18 07:07:03,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=113560.0, ans=0.125
2023-11-18 07:07:09,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.41 vs. limit=15.0
2023-11-18 07:07:14,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=113626.66666666667, ans=0.125
2023-11-18 07:07:23,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=113693.33333333333, ans=0.2
2023-11-18 07:07:33,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.25 vs. limit=15.0
2023-11-18 07:07:40,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.56 vs. limit=15.0
2023-11-18 07:07:44,827 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5050, loss[loss=0.1287, simple_loss=0.1315, pruned_loss=0.04994, audio_tagging_loss=0.01297, over 15406.00 frames. ], tot_loss[loss=0.1324, simple_loss=0.1382, pruned_loss=0.05062, audio_tagging_loss=0.01268, over 3051443.48 frames. ], batch size: 58, lr: 2.80e-02, grad_scale: 64.0
2023-11-18 07:07:58,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=113893.33333333333, ans=0.0
2023-11-18 07:08:27,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.042e+01 1.016e+02 1.164e+02 1.342e+02 1.810e+02, threshold=2.328e+02, percent-clipped=0.0
2023-11-18 07:08:41,045 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5100, loss[loss=0.1288, simple_loss=0.1327, pruned_loss=0.05206, audio_tagging_loss=0.01035, over 14614.00 frames. ], tot_loss[loss=0.1317, simple_loss=0.1377, pruned_loss=0.05022, audio_tagging_loss=0.01261, over 3053199.34 frames. ], batch size: 56, lr: 2.79e-02, grad_scale: 64.0
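The loss[...] / tot_loss[...] fields are consistent with a fixed weighted sum of the three components (simple, pruned and audio-tagging): with weights 0.5 on the simple transducer loss and 1.0 on the audio-tagging loss, 0.5 * 0.1322 + 0.04445 + 0.01396 = 0.1245 reproduces the batch-4800 total logged earlier. A sketch of that combination (weights taken from the run configuration; the pruned term is assumed to be at its full post-warm-up weight):

def total_loss(simple_loss: float, pruned_loss: float, audio_tagging_loss: float,
               simple_loss_scale: float = 0.5,
               audio_tagging_loss_scale: float = 1.0) -> float:
    # e.g. batch 4800: 0.5 * 0.1322 + 0.04445 + 1.0 * 0.01396 ~= 0.1245
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)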
2023-11-18 07:08:41,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=114160.0, ans=0.2
2023-11-18 07:09:09,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=114293.33333333333, ans=0.0
2023-11-18 07:09:11,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=114293.33333333333, ans=0.0
2023-11-18 07:09:13,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=114360.0, ans=0.125
2023-11-18 07:09:24,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=114360.0, ans=0.0
2023-11-18 07:09:32,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=114426.66666666667, ans=0.0
2023-11-18 07:09:35,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=114426.66666666667, ans=0.2
2023-11-18 07:09:37,376 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5150, loss[loss=0.1138, simple_loss=0.1123, pruned_loss=0.04531, audio_tagging_loss=0.01238, over 15486.00 frames. ], tot_loss[loss=0.1301, simple_loss=0.1356, pruned_loss=0.04961, audio_tagging_loss=0.01273, over 3049012.23 frames. ], batch size: 58, lr: 2.79e-02, grad_scale: 16.0
2023-11-18 07:10:08,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=114626.66666666667, ans=0.125
2023-11-18 07:10:22,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.209e+01 1.018e+02 1.157e+02 1.320e+02 3.492e+02, threshold=2.315e+02, percent-clipped=2.0
2023-11-18 07:10:26,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=114760.0, ans=0.2
2023-11-18 07:10:31,365 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 07:10:33,996 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5200, loss[loss=0.08749, simple_loss=0.09196, pruned_loss=0.03021, audio_tagging_loss=0.01131, over 16018.00 frames. ], tot_loss[loss=0.1309, simple_loss=0.1367, pruned_loss=0.04991, audio_tagging_loss=0.01261, over 3041897.75 frames. ], batch size: 62, lr: 2.79e-02, grad_scale: 32.0
2023-11-18 07:10:42,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=114826.66666666667, ans=15.0
2023-11-18 07:11:09,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.94 vs. limit=22.5
2023-11-18 07:11:25,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=115093.33333333333, ans=0.2
2023-11-18 07:11:30,239 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5250, loss[loss=0.07402, simple_loss=0.06534, pruned_loss=0.02725, audio_tagging_loss=0.01409, over 14544.00 frames. ], tot_loss[loss=0.1309, simple_loss=0.1367, pruned_loss=0.04998, audio_tagging_loss=0.01255, over 3035990.79 frames. ], batch size: 57, lr: 2.78e-02, grad_scale: 32.0
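grad_scale is the dynamic fp16 loss scale: it doubles after a stretch of overflow-free steps (32.0 to 64.0 around batch 4950 above) and is cut back, possibly by more than one halving, when non-finite gradients appear (64.0 down to 16.0 by batch 5150, then regrowing to 32.0 by batch 5200). This matches the policy of PyTorch's GradScaler; the settings below are illustrative only:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # illustrative starting scale
    growth_factor=2.0,     # double after `growth_interval` clean steps
    backoff_factor=0.5,    # halve whenever inf/nan gradients appear
    growth_interval=1000,  # illustrative interval
)
# per step: scaler.scale(loss).backward(); scaler.step(optimizer); scaler.update()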
2023-11-18 07:11:30,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=115160.0, ans=0.125
2023-11-18 07:11:43,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0
2023-11-18 07:11:45,002 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 07:11:55,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=115293.33333333333, ans=0.125
2023-11-18 07:12:07,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=115360.0, ans=0.09899494936611666
2023-11-18 07:12:14,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=115426.66666666667, ans=0.0
2023-11-18 07:12:15,368 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.068e+01 1.019e+02 1.120e+02 1.285e+02 1.660e+02, threshold=2.240e+02, percent-clipped=0.0
2023-11-18 07:12:19,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=115426.66666666667, ans=0.09899494936611666
2023-11-18 07:12:26,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0
2023-11-18 07:12:26,631 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5300, loss[loss=0.07888, simple_loss=0.07962, pruned_loss=0.02377, audio_tagging_loss=0.01531, over 16033.00 frames. ], tot_loss[loss=0.1318, simple_loss=0.138, pruned_loss=0.05047, audio_tagging_loss=0.01237, over 3044129.28 frames. ], batch size: 63, lr: 2.78e-02, grad_scale: 32.0
2023-11-18 07:12:39,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115560.0, ans=0.1
2023-11-18 07:12:50,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=115626.66666666667, ans=0.125
2023-11-18 07:13:06,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. limit=6.0
2023-11-18 07:13:12,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115760.0, ans=0.1
2023-11-18 07:13:16,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=115760.0, ans=0.125
2023-11-18 07:13:22,351 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5350, loss[loss=0.1415, simple_loss=0.1445, pruned_loss=0.05717, audio_tagging_loss=0.01204, over 14632.00 frames. ], tot_loss[loss=0.1317, simple_loss=0.1376, pruned_loss=0.05051, audio_tagging_loss=0.01239, over 3043429.57 frames. ], batch size: 54, lr: 2.78e-02, grad_scale: 32.0
2023-11-18 07:13:38,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115893.33333333333, ans=0.1
2023-11-18 07:14:05,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.93 vs. limit=22.5
2023-11-18 07:14:08,566 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.439e+01 1.052e+02 1.204e+02 1.359e+02 2.060e+02, threshold=2.407e+02, percent-clipped=0.0
2023-11-18 07:14:20,307 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5400, loss[loss=0.1506, simple_loss=0.1579, pruned_loss=0.05676, audio_tagging_loss=0.01486, over 14699.00 frames. ], tot_loss[loss=0.131, simple_loss=0.1369, pruned_loss=0.05006, audio_tagging_loss=0.01252, over 3041823.76 frames. ], batch size: 53, lr: 2.77e-02, grad_scale: 32.0
2023-11-18 07:14:20,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=116160.0, ans=0.0
2023-11-18 07:14:36,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=116226.66666666667, ans=0.125
2023-11-18 07:14:48,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=116293.33333333333, ans=0.125
2023-11-18 07:14:50,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=116293.33333333333, ans=0.0
2023-11-18 07:15:15,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.08 vs. limit=15.0
2023-11-18 07:15:16,441 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5450, loss[loss=0.1228, simple_loss=0.1274, pruned_loss=0.04727, audio_tagging_loss=0.01179, over 15130.00 frames. ], tot_loss[loss=0.1318, simple_loss=0.1372, pruned_loss=0.0504, audio_tagging_loss=0.01275, over 3042165.48 frames. ], batch size: 56, lr: 2.77e-02, grad_scale: 32.0
2023-11-18 07:16:01,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.036e+01 1.012e+02 1.167e+02 1.341e+02 1.969e+02, threshold=2.335e+02, percent-clipped=0.0
2023-11-18 07:16:04,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.04 vs. limit=10.0
2023-11-18 07:16:12,363 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5500, loss[loss=0.1574, simple_loss=0.1772, pruned_loss=0.06131, audio_tagging_loss=0.007446, over 15566.00 frames. ], tot_loss[loss=0.132, simple_loss=0.1372, pruned_loss=0.05059, audio_tagging_loss=0.01277, over 3036448.69 frames. ], batch size: 56, lr: 2.77e-02, grad_scale: 32.0
2023-11-18 07:16:16,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.40 vs. limit=6.0
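The Whitening entries compare an anisotropy statistic of a layer's activations against a scheduled limit; a corrective gradient penalty applies only while metric exceeds limit, which is why most lines sit just under their limits. One such statistic is 1.0 for a perfectly isotropic (white) covariance and approaches num_channels for a rank-1 one; a sketch, not necessarily the exact formula used in scaling.py:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations of one whitening group.
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]            # (C, C) covariance
    c = cov.shape[0]
    # 1.0 when cov is a multiple of the identity; up to C when rank-1.
    return c * (cov ** 2).sum() / cov.trace() ** 2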
2023-11-18 07:16:22,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=116893.33333333333, ans=0.035
2023-11-18 07:16:25,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=116893.33333333333, ans=0.125
2023-11-18 07:16:31,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=116893.33333333333, ans=0.0
2023-11-18 07:16:32,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=116893.33333333333, ans=0.0
2023-11-18 07:16:44,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0
2023-11-18 07:16:46,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=117026.66666666667, ans=0.125
2023-11-18 07:16:56,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.07 vs. limit=12.0
2023-11-18 07:17:08,006 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5550, loss[loss=0.0997, simple_loss=0.08782, pruned_loss=0.03505, audio_tagging_loss=0.02074, over 15548.00 frames. ], tot_loss[loss=0.1312, simple_loss=0.1361, pruned_loss=0.05014, audio_tagging_loss=0.01299, over 3042595.32 frames. ], batch size: 61, lr: 2.76e-02, grad_scale: 32.0
2023-11-18 07:17:12,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0
2023-11-18 07:17:23,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=117226.66666666667, ans=0.07
2023-11-18 07:17:41,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=117360.0, ans=0.035
2023-11-18 07:17:53,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.091e+01 1.048e+02 1.161e+02 1.291e+02 1.886e+02, threshold=2.323e+02, percent-clipped=0.0
2023-11-18 07:17:55,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.52 vs. limit=22.5
2023-11-18 07:18:03,583 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 07:18:05,490 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5600, loss[loss=0.1208, simple_loss=0.1204, pruned_loss=0.04446, audio_tagging_loss=0.01615, over 15878.00 frames. ], tot_loss[loss=0.1321, simple_loss=0.1375, pruned_loss=0.05039, audio_tagging_loss=0.013, over 3050951.10 frames. ], batch size: 60, lr: 2.76e-02, grad_scale: 32.0
2023-11-18 07:18:06,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=117493.33333333333, ans=0.125
2023-11-18 07:18:16,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=117560.0, ans=0.125
2023-11-18 07:18:16,745 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.09 vs. limit=15.0
2023-11-18 07:18:28,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=117626.66666666667, ans=0.0
2023-11-18 07:18:30,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=117626.66666666667, ans=0.125
2023-11-18 07:18:34,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=117626.66666666667, ans=0.0
2023-11-18 07:18:35,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=117626.66666666667, ans=0.0
2023-11-18 07:18:45,160 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 07:18:46,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.34 vs. limit=10.0
2023-11-18 07:18:55,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.62 vs. limit=10.0
2023-11-18 07:18:59,264 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.439e+00
2023-11-18 07:19:01,201 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5650, loss[loss=0.1578, simple_loss=0.1652, pruned_loss=0.06334, audio_tagging_loss=0.01184, over 14884.00 frames. ], tot_loss[loss=0.1311, simple_loss=0.1363, pruned_loss=0.04985, audio_tagging_loss=0.01309, over 3052893.43 frames. ], batch size: 54, lr: 2.76e-02, grad_scale: 32.0
2023-11-18 07:19:06,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=117826.66666666667, ans=22.5
2023-11-18 07:19:25,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=117960.0, ans=0.0
2023-11-18 07:19:32,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=117960.0, ans=0.0
2023-11-18 07:19:45,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=118093.33333333333, ans=12.0
2023-11-18 07:19:46,194 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 1.030e+02 1.132e+02 1.306e+02 2.340e+02, threshold=2.264e+02, percent-clipped=1.0
2023-11-18 07:19:47,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.13 vs. limit=6.0
2023-11-18 07:19:51,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=118093.33333333333, ans=0.125
2023-11-18 07:19:57,378 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5700, loss[loss=0.09443, simple_loss=0.101, pruned_loss=0.03239, audio_tagging_loss=0.01155, over 15337.00 frames. ], tot_loss[loss=0.1311, simple_loss=0.1363, pruned_loss=0.04984, audio_tagging_loss=0.01306, over 3058232.53 frames. ], batch size: 58, lr: 2.75e-02, grad_scale: 32.0
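The WARNING above is the length filter at work: the encoder's convolutional frontend reduces the 1-second cut's 100 feature frames to 23, and a transducer cannot align 24 BPE tokens to 23 encoder frames, so the AudioSet cut (which carries only placeholder text) is dropped. A sketch of the rule, assuming the usual ((n - 7) // 2 + 1) // 2 subsampling arithmetic that reproduces the logged frame counts:

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # 100 input frames -> ((100 - 7) // 2 + 1) // 2 = 23 encoder frames,
    # matching the "after subsampling" count in the warning.
    T = ((num_frames - 7) // 2 + 1) // 2
    # RNN-T needs at least one encoder frame per emitted token.
    return T >= num_tokens

assert keep_cut(100, 24) is False   # the excluded dummy-text cut above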
2023-11-18 07:19:58,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=118160.0, ans=0.125
2023-11-18 07:20:00,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=118160.0, ans=0.125
2023-11-18 07:20:10,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=118226.66666666667, ans=0.2
2023-11-18 07:20:18,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=118226.66666666667, ans=0.125
2023-11-18 07:20:18,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=118226.66666666667, ans=0.125
2023-11-18 07:20:18,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=118226.66666666667, ans=0.125
2023-11-18 07:20:29,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0
2023-11-18 07:20:29,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118360.0, ans=0.1
2023-11-18 07:20:35,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118360.0, ans=0.1
2023-11-18 07:20:49,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0
2023-11-18 07:20:53,848 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5750, loss[loss=0.1909, simple_loss=0.2043, pruned_loss=0.08087, audio_tagging_loss=0.007833, over 15911.00 frames. ], tot_loss[loss=0.1311, simple_loss=0.1368, pruned_loss=0.04989, audio_tagging_loss=0.01281, over 3058513.82 frames. ], batch size: 58, lr: 2.75e-02, grad_scale: 32.0
2023-11-18 07:20:54,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=118493.33333333333, ans=0.1
2023-11-18 07:20:55,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=118493.33333333333, ans=0.1
2023-11-18 07:21:11,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0
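tot_loss is not an epoch-wide average: its frame count hovers around 3.05M, i.e. roughly 200 batches of ~15k frames, which is what a decayed running sum with decay 1 - 1/reset_interval gives for reset_interval = 200. A sketch of that assumed update rule:

class RunningLoss:
    """Decayed running sum; effective window is about 1/(1-decay) batches."""
    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: float) -> float:
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_sum / self.frames   # the reported tot_loss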
2023-11-18 07:21:13,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=118560.0, ans=0.125
2023-11-18 07:21:20,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118626.66666666667, ans=0.1
2023-11-18 07:21:28,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=118693.33333333333, ans=0.0
2023-11-18 07:21:38,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 1.021e+02 1.146e+02 1.318e+02 2.072e+02, threshold=2.291e+02, percent-clipped=0.0
2023-11-18 07:21:42,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=118760.0, ans=0.0
2023-11-18 07:21:49,062 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5800, loss[loss=0.1117, simple_loss=0.1048, pruned_loss=0.04241, audio_tagging_loss=0.01689, over 14485.00 frames. ], tot_loss[loss=0.131, simple_loss=0.1368, pruned_loss=0.04991, audio_tagging_loss=0.01269, over 3057789.30 frames. ], batch size: 54, lr: 2.75e-02, grad_scale: 32.0
2023-11-18 07:21:57,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=118826.66666666667, ans=0.125
2023-11-18 07:22:03,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=118893.33333333333, ans=0.125
2023-11-18 07:22:08,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=118893.33333333333, ans=0.125
2023-11-18 07:22:14,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0
2023-11-18 07:22:26,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0
2023-11-18 07:22:28,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0
2023-11-18 07:22:38,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=119093.33333333333, ans=0.0
2023-11-18 07:22:44,817 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5850, loss[loss=0.1464, simple_loss=0.1549, pruned_loss=0.05765, audio_tagging_loss=0.01126, over 15946.00 frames. ], tot_loss[loss=0.1304, simple_loss=0.136, pruned_loss=0.04963, audio_tagging_loss=0.01277, over 3053701.68 frames. ], batch size: 60, lr: 2.74e-02, grad_scale: 32.0
2023-11-18 07:22:47,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=119160.0, ans=0.125
2023-11-18 07:22:48,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=119160.0, ans=0.125
2023-11-18 07:22:56,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=119226.66666666667, ans=0.125
2023-11-18 07:22:58,836 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 07:23:04,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=119226.66666666667, ans=0.1
2023-11-18 07:23:06,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=119293.33333333333, ans=0.1
2023-11-18 07:23:10,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=119293.33333333333, ans=0.0
2023-11-18 07:23:29,154 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.248e+01 1.029e+02 1.172e+02 1.323e+02 1.755e+02, threshold=2.344e+02, percent-clipped=0.0
2023-11-18 07:23:37,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=119426.66666666667, ans=0.125
2023-11-18 07:23:40,986 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5900, loss[loss=0.1343, simple_loss=0.1313, pruned_loss=0.05527, audio_tagging_loss=0.01337, over 14743.00 frames. ], tot_loss[loss=0.1319, simple_loss=0.138, pruned_loss=0.05029, audio_tagging_loss=0.01265, over 3048839.71 frames. ], batch size: 56, lr: 2.74e-02, grad_scale: 32.0
2023-11-18 07:23:53,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.41 vs. limit=15.0
2023-11-18 07:24:00,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=119560.0, ans=0.125
2023-11-18 07:24:09,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0
2023-11-18 07:24:25,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=119760.0, ans=0.2
2023-11-18 07:24:36,656 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 5950, loss[loss=0.1369, simple_loss=0.1478, pruned_loss=0.05278, audio_tagging_loss=0.01019, over 15169.00 frames. ], tot_loss[loss=0.1326, simple_loss=0.1389, pruned_loss=0.05062, audio_tagging_loss=0.0125, over 3052764.31 frames. ], batch size: 58, lr: 2.74e-02, grad_scale: 32.0
2023-11-18 07:24:40,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=119826.66666666667, ans=0.1
2023-11-18 07:24:54,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.92 vs. limit=10.0
2023-11-18 07:25:21,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.274e+01 1.016e+02 1.169e+02 1.321e+02 1.949e+02, threshold=2.338e+02, percent-clipped=0.0
2023-11-18 07:25:22,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=120093.33333333333, ans=0.0
2023-11-18 07:25:32,023 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6000, loss[loss=0.1485, simple_loss=0.1603, pruned_loss=0.05881, audio_tagging_loss=0.009587, over 17108.00 frames. ], tot_loss[loss=0.1322, simple_loss=0.1388, pruned_loss=0.05038, audio_tagging_loss=0.01244, over 3049075.63 frames. ], batch size: 64, lr: 2.73e-02, grad_scale: 32.0
2023-11-18 07:25:32,024 INFO [train_asr.py:1138] (1/4) Computing validation loss
2023-11-18 07:26:04,336 INFO [train_asr.py:1147] (1/4) Epoch 2, validation: loss=0.08772, simple_loss=0.06916, pruned_loss=0.01519, audio_tagging_loss=0.03794, over 4681554.00 frames.
2023-11-18 07:26:04,337 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB
2023-11-18 07:26:10,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=120160.0, ans=0.1
2023-11-18 07:26:11,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=120160.0, ans=0.125
2023-11-18 07:26:17,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=120226.66666666667, ans=0.0
2023-11-18 07:26:21,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.02 vs. limit=6.0
2023-11-18 07:26:25,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=120293.33333333333, ans=0.0
2023-11-18 07:26:26,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=120293.33333333333, ans=0.125
2023-11-18 07:26:37,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=120360.0, ans=0.125
2023-11-18 07:26:44,870 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 07:26:56,529 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 07:27:00,584 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6050, loss[loss=0.155, simple_loss=0.1568, pruned_loss=0.0624, audio_tagging_loss=0.01421, over 15836.00 frames. ], tot_loss[loss=0.1321, simple_loss=0.1384, pruned_loss=0.05044, audio_tagging_loss=0.01246, over 3048349.07 frames. ], batch size: 59, lr: 2.73e-02, grad_scale: 32.0
2023-11-18 07:27:01,098 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.16 vs. limit=22.5
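Batch 6000 triggers the periodic validation pass (it fires whenever the global batch index is a multiple of valid_interval = 3000, as it also did at batch 0): training pauses, the whole dev set (~4.68M frames) is scored, and the aggregate is logged along with peak memory. A structural sketch with assumed names (compute_loss stands in for the recipe's loss helper):

import torch

def maybe_validate(model, valid_dl, compute_loss, batch_idx_train: int,
                   valid_interval: int = 3000):
    """Run a full dev-set pass every `valid_interval` batches (assumed structure)."""
    if batch_idx_train % valid_interval != 0:
        return None
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_dl:
            loss_sum, num_frames = compute_loss(model, batch)
            tot_loss += float(loss_sum)
            tot_frames += int(num_frames)
    model.train()
    return tot_loss / tot_frames   # e.g. loss=0.08772 over 4681554 frames above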
2023-11-18 07:27:01,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=120493.33333333333, ans=0.2
2023-11-18 07:27:01,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=120493.33333333333, ans=0.0
2023-11-18 07:27:02,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=120493.33333333333, ans=0.0
2023-11-18 07:27:07,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.88 vs. limit=6.0
2023-11-18 07:27:22,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.61 vs. limit=22.5
2023-11-18 07:27:23,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=120626.66666666667, ans=0.125
2023-11-18 07:27:39,940 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 07:27:41,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=120693.33333333333, ans=0.125
2023-11-18 07:27:43,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=120693.33333333333, ans=0.0
2023-11-18 07:27:46,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.607e+01 1.101e+02 1.234e+02 1.349e+02 2.388e+02, threshold=2.468e+02, percent-clipped=1.0
2023-11-18 07:27:46,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=120760.0, ans=0.0
2023-11-18 07:27:54,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=120760.0, ans=0.125
2023-11-18 07:27:54,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=120760.0, ans=0.125
2023-11-18 07:27:57,583 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6100, loss[loss=0.1549, simple_loss=0.166, pruned_loss=0.06015, audio_tagging_loss=0.01174, over 14695.00 frames. ], tot_loss[loss=0.1307, simple_loss=0.1365, pruned_loss=0.04983, audio_tagging_loss=0.01266, over 3046897.90 frames. ], batch size: 54, lr: 2.73e-02, grad_scale: 32.0
2023-11-18 07:28:06,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=120826.66666666667, ans=0.025
2023-11-18 07:28:25,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=120960.0, ans=0.0
2023-11-18 07:28:43,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=121093.33333333333, ans=15.0
2023-11-18 07:28:54,892 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6150, loss[loss=0.1455, simple_loss=0.1529, pruned_loss=0.05765, audio_tagging_loss=0.0114, over 15438.00 frames. ], tot_loss[loss=0.1308, simple_loss=0.1365, pruned_loss=0.04987, audio_tagging_loss=0.01264, over 3050529.43 frames. ], batch size: 57, lr: 2.73e-02, grad_scale: 32.0
2023-11-18 07:29:08,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=121226.66666666667, ans=0.125
2023-11-18 07:29:16,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=121293.33333333333, ans=0.125
2023-11-18 07:29:16,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=121293.33333333333, ans=0.125
2023-11-18 07:29:24,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=121293.33333333333, ans=0.125
2023-11-18 07:29:37,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=121360.0, ans=0.0
2023-11-18 07:29:40,652 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 1.065e+02 1.218e+02 1.371e+02 2.442e+02, threshold=2.436e+02, percent-clipped=0.0
2023-11-18 07:29:52,066 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6200, loss[loss=0.139, simple_loss=0.1446, pruned_loss=0.05314, audio_tagging_loss=0.01355, over 14708.00 frames. ], tot_loss[loss=0.1316, simple_loss=0.1377, pruned_loss=0.05007, audio_tagging_loss=0.01267, over 3050418.27 frames. ], batch size: 56, lr: 2.72e-02, grad_scale: 32.0
2023-11-18 07:29:56,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=121493.33333333333, ans=0.125
2023-11-18 07:29:58,737 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.435e+00
2023-11-18 07:30:00,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0
2023-11-18 07:30:29,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=121693.33333333333, ans=0.025
2023-11-18 07:30:46,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=121760.0, ans=0.125
2023-11-18 07:30:48,807 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6250, loss[loss=0.1325, simple_loss=0.1378, pruned_loss=0.05073, audio_tagging_loss=0.0129, over 15175.00 frames. ], tot_loss[loss=0.1315, simple_loss=0.1373, pruned_loss=0.05003, audio_tagging_loss=0.01279, over 3047706.37 frames. ], batch size: 57, lr: 2.72e-02, grad_scale: 32.0
2023-11-18 07:31:33,887 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 1.000e+02 1.109e+02 1.237e+02 1.670e+02, threshold=2.218e+02, percent-clipped=0.0
2023-11-18 07:31:40,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=122093.33333333333, ans=0.2
2023-11-18 07:31:42,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0
2023-11-18 07:31:45,281 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6300, loss[loss=0.1238, simple_loss=0.132, pruned_loss=0.04922, audio_tagging_loss=0.008629, over 15657.00 frames. ], tot_loss[loss=0.1326, simple_loss=0.1385, pruned_loss=0.05044, audio_tagging_loss=0.01291, over 3050575.71 frames. ], batch size: 56, lr: 2.72e-02, grad_scale: 32.0
2023-11-18 07:31:46,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=122160.0, ans=0.2
2023-11-18 07:31:49,313 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 07:32:06,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=122226.66666666667, ans=0.0
2023-11-18 07:32:15,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.89 vs. limit=15.0
2023-11-18 07:32:25,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=122360.0, ans=0.0
2023-11-18 07:32:27,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=122360.0, ans=0.125
2023-11-18 07:32:34,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.93 vs. limit=6.0
2023-11-18 07:32:42,032 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6350, loss[loss=0.1404, simple_loss=0.1483, pruned_loss=0.05254, audio_tagging_loss=0.01367, over 15742.00 frames. ], tot_loss[loss=0.1329, simple_loss=0.139, pruned_loss=0.05041, audio_tagging_loss=0.01293, over 3050102.13 frames. ], batch size: 56, lr: 2.71e-02, grad_scale: 32.0
2023-11-18 07:32:47,928 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.96 vs. limit=22.5
2023-11-18 07:33:08,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=122626.66666666667, ans=0.0
2023-11-18 07:33:19,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=122693.33333333333, ans=0.1
2023-11-18 07:33:27,681 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.961e+01 1.030e+02 1.146e+02 1.327e+02 2.114e+02, threshold=2.291e+02, percent-clipped=0.0
2023-11-18 07:33:27,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=122760.0, ans=0.0
2023-11-18 07:33:30,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=122760.0, ans=0.5
2023-11-18 07:33:39,600 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6400, loss[loss=0.09266, simple_loss=0.09836, pruned_loss=0.03119, audio_tagging_loss=0.01229, over 15718.00 frames. ], tot_loss[loss=0.1314, simple_loss=0.1373, pruned_loss=0.04972, audio_tagging_loss=0.0131, over 3052390.25 frames. ], batch size: 60, lr: 2.71e-02, grad_scale: 32.0
2023-11-18 07:33:40,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0
2023-11-18 07:33:50,512 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.59 vs. limit=15.0
2023-11-18 07:33:54,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=122893.33333333333, ans=0.0
2023-11-18 07:33:56,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=122893.33333333333, ans=0.2
2023-11-18 07:34:00,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.01 vs. limit=6.0
2023-11-18 07:34:03,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=122960.0, ans=0.2
2023-11-18 07:34:35,488 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6450, loss[loss=0.15, simple_loss=0.1526, pruned_loss=0.06204, audio_tagging_loss=0.01166, over 14399.00 frames. ], tot_loss[loss=0.1308, simple_loss=0.1362, pruned_loss=0.04942, audio_tagging_loss=0.01324, over 3047998.72 frames. ], batch size: 54, lr: 2.71e-02, grad_scale: 32.0
2023-11-18 07:34:42,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=123160.0, ans=0.0
2023-11-18 07:35:01,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=123293.33333333333, ans=0.09899494936611666
2023-11-18 07:35:01,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5
2023-11-18 07:35:20,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 1.021e+02 1.177e+02 1.311e+02 2.345e+02, threshold=2.354e+02, percent-clipped=1.0
2023-11-18 07:35:31,883 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6500, loss[loss=0.1625, simple_loss=0.1619, pruned_loss=0.06865, audio_tagging_loss=0.01295, over 17142.00 frames. ], tot_loss[loss=0.1299, simple_loss=0.1354, pruned_loss=0.04904, audio_tagging_loss=0.01321, over 3054199.04 frames. ], batch size: 66, lr: 2.70e-02, grad_scale: 32.0
2023-11-18 07:36:03,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0
2023-11-18 07:36:21,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=123760.0, ans=0.0
2023-11-18 07:36:28,336 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6550, loss[loss=0.1136, simple_loss=0.1195, pruned_loss=0.04495, audio_tagging_loss=0.008893, over 15031.00 frames. ], tot_loss[loss=0.1306, simple_loss=0.1367, pruned_loss=0.04932, audio_tagging_loss=0.01288, over 3043707.97 frames. ], batch size: 57, lr: 2.70e-02, grad_scale: 32.0
2023-11-18 07:36:28,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=123826.66666666667, ans=0.125
2023-11-18 07:36:38,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0
2023-11-18 07:36:58,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=123960.0, ans=0.125
2023-11-18 07:37:13,824 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.989e+01 1.139e+02 1.347e+02 1.768e+02, threshold=2.277e+02, percent-clipped=0.0
2023-11-18 07:37:18,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.48 vs. limit=15.0
2023-11-18 07:37:24,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=124160.0, ans=0.2
2023-11-18 07:37:25,650 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6600, loss[loss=0.1516, simple_loss=0.1616, pruned_loss=0.05839, audio_tagging_loss=0.0124, over 14707.00 frames. ], tot_loss[loss=0.13, simple_loss=0.1361, pruned_loss=0.04917, audio_tagging_loss=0.01283, over 3035685.26 frames. ], batch size: 56, lr: 2.70e-02, grad_scale: 32.0
2023-11-18 07:37:26,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=124160.0, ans=0.125
2023-11-18 07:37:30,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=124160.0, ans=0.125
2023-11-18 07:37:46,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=124293.33333333333, ans=0.035
2023-11-18 07:37:55,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0
2023-11-18 07:37:58,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=124360.0, ans=0.0
2023-11-18 07:38:10,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=124426.66666666667, ans=0.0
2023-11-18 07:38:21,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=124493.33333333333, ans=0.2
2023-11-18 07:38:22,510 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6650, loss[loss=0.1424, simple_loss=0.1478, pruned_loss=0.05747, audio_tagging_loss=0.01101, over 14719.00 frames. ], tot_loss[loss=0.1312, simple_loss=0.1376, pruned_loss=0.04972, audio_tagging_loss=0.01269, over 3041935.47 frames. ], batch size: 54, lr: 2.69e-02, grad_scale: 32.0
2023-11-18 07:38:23,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=124493.33333333333, ans=0.125
2023-11-18 07:38:35,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=124560.0, ans=0.0
2023-11-18 07:38:46,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=124626.66666666667, ans=0.1
2023-11-18 07:38:48,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0
2023-11-18 07:38:52,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0
2023-11-18 07:39:00,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=15.0
2023-11-18 07:39:07,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 1.041e+02 1.128e+02 1.286e+02 1.870e+02, threshold=2.255e+02, percent-clipped=0.0
2023-11-18 07:39:16,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.58 vs. limit=22.5
2023-11-18 07:39:18,521 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6700, loss[loss=0.1007, simple_loss=0.09373, pruned_loss=0.03637, audio_tagging_loss=0.01745, over 15662.00 frames. ], tot_loss[loss=0.1307, simple_loss=0.137, pruned_loss=0.04953, audio_tagging_loss=0.0127, over 3038047.57 frames. ], batch size: 59, lr: 2.69e-02, grad_scale: 32.0
2023-11-18 07:39:18,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=124826.66666666667, ans=0.0
2023-11-18 07:39:27,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=124826.66666666667, ans=0.07
2023-11-18 07:39:41,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=124960.0, ans=0.1
2023-11-18 07:40:16,328 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6750, loss[loss=0.1139, simple_loss=0.1191, pruned_loss=0.04221, audio_tagging_loss=0.01219, over 15196.00 frames. ], tot_loss[loss=0.1297, simple_loss=0.1362, pruned_loss=0.04898, audio_tagging_loss=0.01266, over 3034620.35 frames. ], batch size: 58, lr: 2.69e-02, grad_scale: 32.0
2023-11-18 07:40:19,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=125160.0, ans=0.0
2023-11-18 07:40:21,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0
2023-11-18 07:40:25,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=125160.0, ans=0.125
2023-11-18 07:40:40,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0
2023-11-18 07:40:45,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=125293.33333333333, ans=0.0
2023-11-18 07:41:01,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.301e+01 1.017e+02 1.137e+02 1.334e+02 2.157e+02, threshold=2.275e+02, percent-clipped=0.0
2023-11-18 07:41:01,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=125426.66666666667, ans=0.0
2023-11-18 07:41:09,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=22.5
2023-11-18 07:41:13,076 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6800, loss[loss=0.112, simple_loss=0.1099, pruned_loss=0.04188, audio_tagging_loss=0.01522, over 14722.00 frames. ], tot_loss[loss=0.1291, simple_loss=0.1356, pruned_loss=0.04859, audio_tagging_loss=0.01273, over 3035164.08 frames. ], batch size: 58, lr: 2.68e-02, grad_scale: 32.0
2023-11-18 07:41:24,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=125560.0, ans=15.0
2023-11-18 07:41:34,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=125626.66666666667, ans=0.125
2023-11-18 07:41:35,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=125626.66666666667, ans=0.0
2023-11-18 07:41:37,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=125626.66666666667, ans=0.125
2023-11-18 07:41:56,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=125693.33333333333, ans=0.1
2023-11-18 07:42:07,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=125760.0, ans=0.1
2023-11-18 07:42:09,015 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6850, loss[loss=0.1442, simple_loss=0.1546, pruned_loss=0.05492, audio_tagging_loss=0.01198, over 15930.00 frames. ], tot_loss[loss=0.1285, simple_loss=0.1349, pruned_loss=0.0484, audio_tagging_loss=0.01266, over 3034562.63 frames. ], batch size: 58, lr: 2.68e-02, grad_scale: 32.0
2023-11-18 07:42:11,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=125826.66666666667, ans=0.07
2023-11-18 07:42:18,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.00 vs. limit=12.0
2023-11-18 07:42:38,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=125960.0, ans=0.125
2023-11-18 07:42:44,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.37 vs. limit=15.0
2023-11-18 07:42:54,311 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 9.921e+01 1.152e+02 1.334e+02 2.003e+02, threshold=2.305e+02, percent-clipped=0.0
2023-11-18 07:42:57,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.02 vs. limit=6.0
2023-11-18 07:43:05,707 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6900, loss[loss=0.1516, simple_loss=0.1552, pruned_loss=0.06386, audio_tagging_loss=0.01008, over 14927.00 frames. ], tot_loss[loss=0.1287, simple_loss=0.1357, pruned_loss=0.04827, audio_tagging_loss=0.01264, over 3043776.22 frames. ], batch size: 55, lr: 2.68e-02, grad_scale: 32.0
2023-11-18 07:43:10,147 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.88 vs. limit=10.0
2023-11-18 07:43:29,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=126293.33333333333, ans=0.2
2023-11-18 07:43:33,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=126293.33333333333, ans=10.0
2023-11-18 07:43:48,927 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 07:44:00,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=126426.66666666667, ans=0.0
2023-11-18 07:44:03,090 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 6950, loss[loss=0.1143, simple_loss=0.1146, pruned_loss=0.04137, audio_tagging_loss=0.01566, over 15743.00 frames. ], tot_loss[loss=0.1281, simple_loss=0.1352, pruned_loss=0.04783, audio_tagging_loss=0.01263, over 3042974.43 frames. ], batch size: 61, lr: 2.68e-02, grad_scale: 32.0
2023-11-18 07:44:12,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=126493.33333333333, ans=0.0
2023-11-18 07:44:23,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=126560.0, ans=10.0
2023-11-18 07:44:27,150 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.24 vs. limit=22.5
2023-11-18 07:44:38,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.21 vs. limit=15.0
2023-11-18 07:44:48,853 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.915e+01 1.002e+02 1.157e+02 1.287e+02 1.874e+02, threshold=2.315e+02, percent-clipped=0.0
2023-11-18 07:44:50,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=126760.0, ans=0.0
2023-11-18 07:44:57,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=126760.0, ans=0.125
2023-11-18 07:44:59,693 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7000, loss[loss=0.1373, simple_loss=0.1463, pruned_loss=0.05179, audio_tagging_loss=0.01238, over 15265.00 frames. ], tot_loss[loss=0.1277, simple_loss=0.1344, pruned_loss=0.04769, audio_tagging_loss=0.01276, over 3043764.78 frames. ], batch size: 56, lr: 2.67e-02, grad_scale: 32.0
], batch size: 56, lr: 2.67e-02, grad_scale: 32.0 2023-11-18 07:45:11,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=126893.33333333333, ans=0.125 2023-11-18 07:45:14,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=126893.33333333333, ans=0.125 2023-11-18 07:45:26,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=126960.0, ans=0.125 2023-11-18 07:45:27,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=126960.0, ans=0.1 2023-11-18 07:45:27,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=126960.0, ans=0.2 2023-11-18 07:45:29,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2023-11-18 07:45:34,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=127026.66666666667, ans=0.0 2023-11-18 07:45:35,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=127026.66666666667, ans=0.125 2023-11-18 07:45:35,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=15.0 2023-11-18 07:45:37,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=127026.66666666667, ans=0.125 2023-11-18 07:45:43,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.61 vs. limit=15.0 2023-11-18 07:45:46,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=127093.33333333333, ans=0.2 2023-11-18 07:45:47,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=127093.33333333333, ans=0.0 2023-11-18 07:45:47,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2023-11-18 07:45:56,130 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7050, loss[loss=0.1005, simple_loss=0.1061, pruned_loss=0.03286, audio_tagging_loss=0.01459, over 16120.00 frames. ], tot_loss[loss=0.128, simple_loss=0.1349, pruned_loss=0.04787, audio_tagging_loss=0.0127, over 3046833.94 frames. 
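The scaling.py:213 records running through this section print ScheduledFloat values: skip rates, balancer probabilities, and dropout_p entries that are deterministic functions of batch_count. A minimal sketch of a piecewise-linear schedule with that behavior; the constructor and value() interface are assumptions, not the real scaling.py API:

```python
import bisect


class ScheduledFloat:
    """Piecewise-linear value of training progress, held constant beyond
    the last breakpoint. Sketch only; the real icefall class has more
    features (defaults, arithmetic on schedules, etc.)."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.x = [p[0] for p in points]
        self.y = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.x[0]:
            return self.y[0]
        if batch_count >= self.x[-1]:
            return self.y[-1]
        i = bisect.bisect_right(self.x, batch_count)
        x0, x1 = self.x[i - 1], self.x[i]
        y0, y1 = self.y[i - 1], self.y[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)


# E.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches
# and then stays flat: by batch_count ~125,693 it reads 0.1, matching
# the `ans=0.1` dropout_p entries above. The breakpoints are made up.
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
assert dropout_p.value(125693.33) == 0.1
```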
], batch size: 60, lr: 2.67e-02, grad_scale: 32.0 2023-11-18 07:46:03,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=127160.0, ans=0.0 2023-11-18 07:46:39,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=127360.0, ans=0.05 2023-11-18 07:46:40,325 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:46:40,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=127426.66666666667, ans=0.125 2023-11-18 07:46:40,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=127426.66666666667, ans=0.0 2023-11-18 07:46:41,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 1.025e+02 1.162e+02 1.246e+02 1.816e+02, threshold=2.324e+02, percent-clipped=0.0 2023-11-18 07:46:41,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=127426.66666666667, ans=0.125 2023-11-18 07:46:53,126 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7100, loss[loss=0.1135, simple_loss=0.1179, pruned_loss=0.04232, audio_tagging_loss=0.01221, over 16085.00 frames. ], tot_loss[loss=0.1284, simple_loss=0.1348, pruned_loss=0.04808, audio_tagging_loss=0.0129, over 3050741.92 frames. ], batch size: 61, lr: 2.67e-02, grad_scale: 32.0 2023-11-18 07:46:53,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=127493.33333333333, ans=0.125 2023-11-18 07:47:02,595 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:47:09,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=127560.0, ans=0.125 2023-11-18 07:47:11,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2023-11-18 07:47:26,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=127693.33333333333, ans=0.2 2023-11-18 07:47:30,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=127693.33333333333, ans=0.2 2023-11-18 07:47:32,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=127693.33333333333, ans=0.125 2023-11-18 07:47:47,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=127760.0, ans=0.125 2023-11-18 07:47:49,829 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7150, loss[loss=0.1226, simple_loss=0.1264, pruned_loss=0.04879, audio_tagging_loss=0.01055, over 14170.00 frames. ], tot_loss[loss=0.1288, simple_loss=0.1355, pruned_loss=0.04816, audio_tagging_loss=0.01291, over 3054015.20 frames. 
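Each optim.py:476 line summarizes the recent gradient-norm distribution as five order statistics (min, 25%, median, 75%, max) plus the clipping threshold and the share of batches clipped. The numbers are consistent with a threshold of Clipping_scale times a running median: in the record just above, 2.0 x 1.162e+02 = 2.324e+02, exactly the printed threshold. A sketch of that bookkeeping under those assumptions; the window size and quantile estimator are guesses, not the real optim.py:

```python
from collections import deque

import torch


class GradNormTracker:
    # Sketch: clip at clipping_scale * running median of recent gradient
    # norms, and expose the quartiles printed in the log. Window size and
    # quantile estimator are assumptions, not the real optim.py.
    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        norm = torch.cat([p.grad.flatten() for p in params]).norm().item()
        self.norms.append(norm)
        threshold = self.clipping_scale * self.quartiles()[2]
        if norm > threshold:
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm

    def quartiles(self):
        s = sorted(self.norms)
        n = len(s) - 1
        return [s[0], s[n // 4], s[n // 2], s[3 * n // 4], s[-1]]


model = torch.nn.Linear(80, 500)
model(torch.randn(4, 80)).sum().backward()
tracker = GradNormTracker()
print(tracker.clip_(model.parameters()))  # grad norm before clipping
```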
], batch size: 54, lr: 2.66e-02, grad_scale: 64.0 2023-11-18 07:48:03,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=127893.33333333333, ans=0.2 2023-11-18 07:48:05,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.88 vs. limit=22.5 2023-11-18 07:48:10,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=127893.33333333333, ans=0.0 2023-11-18 07:48:27,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2023-11-18 07:48:35,317 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.931e+01 1.022e+02 1.134e+02 1.283e+02 2.595e+02, threshold=2.267e+02, percent-clipped=2.0 2023-11-18 07:48:46,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.66 vs. limit=22.5 2023-11-18 07:48:46,826 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7200, loss[loss=0.151, simple_loss=0.1656, pruned_loss=0.056, audio_tagging_loss=0.01223, over 14987.00 frames. ], tot_loss[loss=0.1304, simple_loss=0.1371, pruned_loss=0.04886, audio_tagging_loss=0.01295, over 3053910.47 frames. ], batch size: 58, lr: 2.66e-02, grad_scale: 64.0 2023-11-18 07:49:09,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0 2023-11-18 07:49:28,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=128360.0, ans=0.125 2023-11-18 07:49:39,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=128426.66666666667, ans=0.125 2023-11-18 07:49:43,999 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7250, loss[loss=0.1047, simple_loss=0.1111, pruned_loss=0.03579, audio_tagging_loss=0.01341, over 13786.00 frames. ], tot_loss[loss=0.1298, simple_loss=0.1363, pruned_loss=0.04861, audio_tagging_loss=0.01308, over 3047712.81 frames. ], batch size: 52, lr: 2.66e-02, grad_scale: 64.0 2023-11-18 07:49:49,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=128493.33333333333, ans=0.125 2023-11-18 07:50:10,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=128626.66666666667, ans=0.125 2023-11-18 07:50:13,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=128626.66666666667, ans=0.05 2023-11-18 07:50:29,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 1.028e+02 1.141e+02 1.257e+02 1.791e+02, threshold=2.282e+02, percent-clipped=0.0 2023-11-18 07:50:40,925 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7300, loss[loss=0.1575, simple_loss=0.167, pruned_loss=0.06614, audio_tagging_loss=0.007905, over 15491.00 frames. ], tot_loss[loss=0.13, simple_loss=0.1368, pruned_loss=0.04878, audio_tagging_loss=0.01282, over 3050487.65 frames. 
], batch size: 58, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:50:44,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=128826.66666666667, ans=0.0 2023-11-18 07:50:48,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.63 vs. limit=22.5 2023-11-18 07:50:54,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=128893.33333333333, ans=0.125 2023-11-18 07:50:59,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128893.33333333333, ans=0.1 2023-11-18 07:51:18,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129026.66666666667, ans=0.1 2023-11-18 07:51:29,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0 2023-11-18 07:51:37,924 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7350, loss[loss=0.1275, simple_loss=0.1301, pruned_loss=0.04999, audio_tagging_loss=0.01242, over 15489.00 frames. ], tot_loss[loss=0.128, simple_loss=0.1345, pruned_loss=0.04806, audio_tagging_loss=0.01268, over 3052232.13 frames. ], batch size: 57, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:52:23,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 1.003e+02 1.122e+02 1.264e+02 2.098e+02, threshold=2.243e+02, percent-clipped=0.0 2023-11-18 07:52:35,920 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7400, loss[loss=0.1086, simple_loss=0.111, pruned_loss=0.03891, audio_tagging_loss=0.01417, over 17319.00 frames. ], tot_loss[loss=0.1288, simple_loss=0.1355, pruned_loss=0.04846, audio_tagging_loss=0.01258, over 3047606.57 frames. ], batch size: 64, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:52:52,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.14 vs. limit=22.5 2023-11-18 07:52:55,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=129560.0, ans=0.0 2023-11-18 07:53:32,142 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7450, loss[loss=0.1078, simple_loss=0.1102, pruned_loss=0.03796, audio_tagging_loss=0.01476, over 15401.00 frames. ], tot_loss[loss=0.1289, simple_loss=0.1356, pruned_loss=0.04857, audio_tagging_loss=0.01252, over 3042540.18 frames. ], batch size: 61, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:53:33,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=129826.66666666667, ans=0.125 2023-11-18 07:53:39,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=129826.66666666667, ans=0.125 2023-11-18 07:53:43,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=129893.33333333333, ans=0.125 2023-11-18 07:53:45,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.76 vs. 
limit=6.0 2023-11-18 07:54:10,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2023-11-18 07:54:16,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=130093.33333333333, ans=0.125 2023-11-18 07:54:17,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 1.036e+02 1.151e+02 1.366e+02 1.976e+02, threshold=2.301e+02, percent-clipped=0.0 2023-11-18 07:54:25,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=130093.33333333333, ans=0.125 2023-11-18 07:54:29,346 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7500, loss[loss=0.1041, simple_loss=0.1085, pruned_loss=0.03815, audio_tagging_loss=0.01171, over 15537.00 frames. ], tot_loss[loss=0.1287, simple_loss=0.1359, pruned_loss=0.04835, audio_tagging_loss=0.0124, over 3050904.88 frames. ], batch size: 58, lr: 2.64e-02, grad_scale: 64.0 2023-11-18 07:54:31,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=15.0 2023-11-18 07:54:33,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=130160.0, ans=0.125 2023-11-18 07:55:17,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0 2023-11-18 07:55:25,985 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7550, loss[loss=0.1161, simple_loss=0.1239, pruned_loss=0.04398, audio_tagging_loss=0.01009, over 15794.00 frames. ], tot_loss[loss=0.1287, simple_loss=0.1356, pruned_loss=0.04853, audio_tagging_loss=0.01239, over 3053618.98 frames. ], batch size: 60, lr: 2.64e-02, grad_scale: 64.0 2023-11-18 07:55:29,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.64 vs. limit=10.0 2023-11-18 07:56:09,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=130693.33333333333, ans=0.2 2023-11-18 07:56:12,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.352e+01 1.032e+02 1.116e+02 1.247e+02 1.797e+02, threshold=2.232e+02, percent-clipped=0.0 2023-11-18 07:56:22,897 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7600, loss[loss=0.1356, simple_loss=0.1406, pruned_loss=0.05054, audio_tagging_loss=0.01478, over 15123.00 frames. ], tot_loss[loss=0.1276, simple_loss=0.1346, pruned_loss=0.04799, audio_tagging_loss=0.01237, over 3057300.36 frames. ], batch size: 58, lr: 2.64e-02, grad_scale: 64.0 2023-11-18 07:56:35,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.49 vs. 
limit=15.0 2023-11-18 07:56:45,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=130960.0, ans=0.04949747468305833 2023-11-18 07:56:51,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=130960.0, ans=0.0 2023-11-18 07:56:54,633 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2023-11-18 07:57:00,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=131026.66666666667, ans=0.1 2023-11-18 07:57:10,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=131093.33333333334, ans=15.0 2023-11-18 07:57:19,657 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7650, loss[loss=0.1396, simple_loss=0.1458, pruned_loss=0.0526, audio_tagging_loss=0.0141, over 15057.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.1341, pruned_loss=0.04805, audio_tagging_loss=0.01239, over 3053170.11 frames. ], batch size: 58, lr: 2.63e-02, grad_scale: 64.0 2023-11-18 07:57:31,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.19 vs. limit=15.0 2023-11-18 07:57:36,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.45 vs. limit=15.0 2023-11-18 07:57:43,282 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:57:44,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=131293.33333333334, ans=0.0 2023-11-18 07:57:50,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=131293.33333333334, ans=0.125 2023-11-18 07:57:57,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.31 vs. limit=15.0 2023-11-18 07:57:59,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2023-11-18 07:58:01,822 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. limit=6.0 2023-11-18 07:58:05,096 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 1.041e+02 1.157e+02 1.349e+02 1.751e+02, threshold=2.314e+02, percent-clipped=0.0 2023-11-18 07:58:14,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=131426.66666666666, ans=10.0 2023-11-18 07:58:16,552 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7700, loss[loss=0.1151, simple_loss=0.1266, pruned_loss=0.042, audio_tagging_loss=0.009783, over 15576.00 frames. ], tot_loss[loss=0.1268, simple_loss=0.1338, pruned_loss=0.04745, audio_tagging_loss=0.01241, over 3058580.25 frames. 
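The scaling.py:1022 Whitening lines compare a per-module statistic against a limit and only intervene when the metric exceeds it, which is why most entries read like "metric=12.90 vs. limit=15.0". One plausible form of such a metric, normalized so that perfectly white features (decorrelated, equal variance) score 1.0 and concentrated covariances score higher; the exact formula in scaling.py may differ, so treat this as a sketch:

```python
import torch


def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """One plausible 'metric' for the Whitening lines above: split the
    channels into groups and, for each group's covariance C, measure
    num_channels * trace(C @ C) / trace(C)**2, which equals 1.0 when all
    eigenvalues are equal (fully white) and grows as they spread out.
    Sketch only; not the exact scaling.py formula.

    x: (num_frames, num_channels)
    """
    _, num_channels = x.shape
    g = num_channels // num_groups
    metrics = []
    for i in range(num_groups):
        xg = x[:, i * g:(i + 1) * g]
        xg = xg - xg.mean(dim=0)
        cov = (xg.t() @ xg) / xg.shape[0]
        metrics.append(g * (cov @ cov).trace() / cov.trace() ** 2)
    return float(torch.stack(metrics).mean())


white = torch.randn(10000, 512)              # nearly white features
correlated = white @ torch.randn(512, 512)   # heavily mixed channels
assert whitening_metric(white) < whitening_metric(correlated)
# Training would add a corrective loss only when the metric exceeds its
# limit, hence the "metric=... vs. limit=..." comparisons in the log.
```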
], batch size: 57, lr: 2.63e-02, grad_scale: 64.0 2023-11-18 07:58:22,682 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:58:25,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=131493.33333333334, ans=0.125 2023-11-18 07:58:28,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=131560.0, ans=0.125 2023-11-18 07:58:38,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.75 vs. limit=15.0 2023-11-18 07:58:38,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=131626.66666666666, ans=0.125 2023-11-18 07:58:50,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.52 vs. limit=22.5 2023-11-18 07:58:55,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=131693.33333333334, ans=0.04949747468305833 2023-11-18 07:59:01,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=131760.0, ans=0.0 2023-11-18 07:59:05,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=131760.0, ans=0.0 2023-11-18 07:59:13,109 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7750, loss[loss=0.1022, simple_loss=0.09892, pruned_loss=0.03917, audio_tagging_loss=0.01361, over 15405.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.1344, pruned_loss=0.04782, audio_tagging_loss=0.01248, over 3060469.99 frames. ], batch size: 61, lr: 2.63e-02, grad_scale: 64.0 2023-11-18 07:59:39,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131960.0, ans=0.1 2023-11-18 07:59:56,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=132026.66666666666, ans=0.1 2023-11-18 07:59:58,785 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 1.007e+02 1.110e+02 1.214e+02 2.200e+02, threshold=2.220e+02, percent-clipped=0.0 2023-11-18 08:00:09,681 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7800, loss[loss=0.09635, simple_loss=0.0902, pruned_loss=0.03648, audio_tagging_loss=0.01478, over 15244.00 frames. ], tot_loss[loss=0.1274, simple_loss=0.1343, pruned_loss=0.04771, audio_tagging_loss=0.01259, over 3058116.32 frames. ], batch size: 58, lr: 2.62e-02, grad_scale: 64.0 2023-11-18 08:00:42,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=132293.33333333334, ans=0.0 2023-11-18 08:00:42,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.31 vs. limit=15.0 2023-11-18 08:01:05,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=132426.66666666666, ans=0.1 2023-11-18 08:01:07,209 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7850, loss[loss=0.1453, simple_loss=0.1535, pruned_loss=0.05336, audio_tagging_loss=0.01513, over 15010.00 frames. 
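Across this section the loss fields decompose consistently as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss: the batch 7750 record above gives 0.5 * 0.09892 + 0.03917 + 0.01361 = 0.10224, matching the printed loss=0.1022 up to rounding. The 0.5 and 1.0 weights are inferred from the logged numbers, not read out of train_asr.py:

```python
# Inferred decomposition of the logged `loss` field; the 0.5 weight on
# simple_loss and 1.0 on audio_tagging_loss are read off the numbers
# above, not from train_asr.py itself.
def total_loss(simple, pruned, tagging,
               simple_scale=0.5, tagging_scale=1.0):
    return simple_scale * simple + pruned + tagging_scale * tagging


# Per-batch record at batch 7750 above:
assert abs(total_loss(0.09892, 0.03917, 0.01361) - 0.1022) < 5e-4
# Running tot_loss on the same line:
assert abs(total_loss(0.1344, 0.04782, 0.01248) - 0.1275) < 5e-4
```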
], tot_loss[loss=0.1284, simple_loss=0.1351, pruned_loss=0.0481, audio_tagging_loss=0.01274, over 3057597.09 frames. ], batch size: 54, lr: 2.62e-02, grad_scale: 64.0 2023-11-18 08:01:21,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.22 vs. limit=12.0 2023-11-18 08:01:24,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=132560.0, ans=0.0 2023-11-18 08:01:25,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.27 vs. limit=10.0 2023-11-18 08:01:28,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5 2023-11-18 08:01:41,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=132693.33333333334, ans=0.2 2023-11-18 08:01:41,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=132693.33333333334, ans=0.1 2023-11-18 08:01:43,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=132693.33333333334, ans=0.07 2023-11-18 08:01:45,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.63 vs. limit=15.0 2023-11-18 08:01:53,777 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.431e+01 1.081e+02 1.200e+02 1.326e+02 3.280e+02, threshold=2.400e+02, percent-clipped=1.0 2023-11-18 08:01:54,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.73 vs. limit=15.0 2023-11-18 08:01:56,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=132760.0, ans=0.125 2023-11-18 08:02:03,870 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7900, loss[loss=0.1415, simple_loss=0.1467, pruned_loss=0.05317, audio_tagging_loss=0.01497, over 15170.00 frames. ], tot_loss[loss=0.129, simple_loss=0.1355, pruned_loss=0.04832, audio_tagging_loss=0.01294, over 3048369.20 frames. ], batch size: 56, lr: 2.62e-02, grad_scale: 32.0 2023-11-18 08:02:06,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=132826.66666666666, ans=0.2 2023-11-18 08:02:13,065 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.63 vs. limit=15.0 2023-11-18 08:02:26,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.77 vs. 
limit=10.0 2023-11-18 08:02:31,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=132960.0, ans=0.125 2023-11-18 08:02:32,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=132960.0, ans=0.0 2023-11-18 08:02:37,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=133026.66666666666, ans=0.1 2023-11-18 08:02:42,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=133026.66666666666, ans=0.125 2023-11-18 08:02:59,864 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 7950, loss[loss=0.1143, simple_loss=0.1198, pruned_loss=0.04044, audio_tagging_loss=0.01397, over 15271.00 frames. ], tot_loss[loss=0.1293, simple_loss=0.1359, pruned_loss=0.04841, audio_tagging_loss=0.01294, over 3050762.70 frames. ], batch size: 56, lr: 2.62e-02, grad_scale: 32.0 2023-11-18 08:03:12,351 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:03:12,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=133226.66666666666, ans=0.2 2023-11-18 08:03:36,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=133360.0, ans=0.1 2023-11-18 08:03:37,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=133360.0, ans=0.125 2023-11-18 08:03:39,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0 2023-11-18 08:03:48,631 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 1.027e+02 1.116e+02 1.320e+02 1.890e+02, threshold=2.232e+02, percent-clipped=0.0 2023-11-18 08:03:58,890 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8000, loss[loss=0.108, simple_loss=0.1099, pruned_loss=0.03656, audio_tagging_loss=0.01654, over 14864.00 frames. ], tot_loss[loss=0.129, simple_loss=0.1352, pruned_loss=0.04829, audio_tagging_loss=0.01308, over 3043645.41 frames. ], batch size: 58, lr: 2.61e-02, grad_scale: 32.0 2023-11-18 08:04:27,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=133626.66666666666, ans=0.125 2023-11-18 08:04:31,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.80 vs. limit=22.5 2023-11-18 08:04:36,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.98 vs. limit=15.0 2023-11-18 08:04:44,388 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.07 vs. 
limit=15.0 2023-11-18 08:04:56,053 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8050, loss[loss=0.1447, simple_loss=0.1472, pruned_loss=0.05913, audio_tagging_loss=0.01198, over 15517.00 frames. ], tot_loss[loss=0.1293, simple_loss=0.1357, pruned_loss=0.04838, audio_tagging_loss=0.01307, over 3040129.39 frames. ], batch size: 58, lr: 2.61e-02, grad_scale: 32.0 2023-11-18 08:05:18,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=133960.0, ans=0.2 2023-11-18 08:05:26,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=133960.0, ans=0.0 2023-11-18 08:05:42,356 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.010e+01 1.125e+02 1.300e+02 1.606e+02 2.209e+02, threshold=2.601e+02, percent-clipped=0.0 2023-11-18 08:05:52,000 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8100, loss[loss=0.114, simple_loss=0.1095, pruned_loss=0.04668, audio_tagging_loss=0.01257, over 14152.00 frames. ], tot_loss[loss=0.1291, simple_loss=0.1356, pruned_loss=0.04839, audio_tagging_loss=0.01288, over 3037150.43 frames. ], batch size: 53, lr: 2.61e-02, grad_scale: 32.0 2023-11-18 08:05:54,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=134160.0, ans=0.1 2023-11-18 08:06:04,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=134226.66666666666, ans=0.95 2023-11-18 08:06:05,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=134226.66666666666, ans=0.125 2023-11-18 08:06:42,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=134426.66666666666, ans=0.2 2023-11-18 08:06:48,379 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8150, loss[loss=0.1097, simple_loss=0.1116, pruned_loss=0.04418, audio_tagging_loss=0.009753, over 15493.00 frames. ], tot_loss[loss=0.13, simple_loss=0.1367, pruned_loss=0.04902, audio_tagging_loss=0.01262, over 3042137.51 frames. ], batch size: 61, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:07:01,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.91 vs. limit=10.0 2023-11-18 08:07:02,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=134560.0, ans=0.125 2023-11-18 08:07:05,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=134560.0, ans=0.0 2023-11-18 08:07:17,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=134626.66666666666, ans=0.125 2023-11-18 08:07:34,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.18 vs. 
limit=15.0 2023-11-18 08:07:34,444 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.378e+01 1.047e+02 1.173e+02 1.336e+02 3.591e+02, threshold=2.346e+02, percent-clipped=1.0 2023-11-18 08:07:37,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=134760.0, ans=0.125 2023-11-18 08:07:44,589 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:07:45,620 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8200, loss[loss=0.121, simple_loss=0.1164, pruned_loss=0.04642, audio_tagging_loss=0.01639, over 16224.00 frames. ], tot_loss[loss=0.129, simple_loss=0.1359, pruned_loss=0.04855, audio_tagging_loss=0.0125, over 3049262.83 frames. ], batch size: 63, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:08:03,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=134893.33333333334, ans=0.125 2023-11-18 08:08:41,582 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8250, loss[loss=0.1393, simple_loss=0.1487, pruned_loss=0.05137, audio_tagging_loss=0.01358, over 15628.00 frames. ], tot_loss[loss=0.1284, simple_loss=0.1354, pruned_loss=0.04826, audio_tagging_loss=0.01247, over 3047999.09 frames. ], batch size: 56, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:08:41,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=135160.0, ans=0.125 2023-11-18 08:08:44,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=135160.0, ans=0.2 2023-11-18 08:08:46,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=135160.0, ans=0.125 2023-11-18 08:09:09,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=135293.33333333334, ans=0.125 2023-11-18 08:09:11,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.24 vs. limit=22.5 2023-11-18 08:09:22,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=135360.0, ans=0.125 2023-11-18 08:09:27,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.216e+01 1.092e+02 1.238e+02 1.418e+02 2.138e+02, threshold=2.477e+02, percent-clipped=0.0 2023-11-18 08:09:38,156 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8300, loss[loss=0.1209, simple_loss=0.1232, pruned_loss=0.04521, audio_tagging_loss=0.01409, over 15453.00 frames. ], tot_loss[loss=0.1281, simple_loss=0.1349, pruned_loss=0.04814, audio_tagging_loss=0.01255, over 3048604.64 frames. 
], batch size: 58, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:09:41,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=135493.33333333334, ans=0.2 2023-11-18 08:09:41,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=135493.33333333334, ans=0.125 2023-11-18 08:09:48,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.66 vs. limit=22.5 2023-11-18 08:09:52,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=135560.0, ans=0.0 2023-11-18 08:09:52,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2023-11-18 08:10:11,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=135693.33333333334, ans=0.125 2023-11-18 08:10:15,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=135693.33333333334, ans=0.125 2023-11-18 08:10:30,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=135760.0, ans=0.1 2023-11-18 08:10:35,049 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8350, loss[loss=0.1431, simple_loss=0.1525, pruned_loss=0.05553, audio_tagging_loss=0.0113, over 15195.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.1346, pruned_loss=0.04776, audio_tagging_loss=0.01243, over 3045136.99 frames. ], batch size: 55, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:10:48,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=135893.33333333334, ans=0.1 2023-11-18 08:11:01,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=135960.0, ans=0.0 2023-11-18 08:11:14,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=136026.66666666666, ans=0.125 2023-11-18 08:11:21,586 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 1.030e+02 1.156e+02 1.318e+02 1.873e+02, threshold=2.311e+02, percent-clipped=0.0 2023-11-18 08:11:26,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136093.33333333334, ans=0.1 2023-11-18 08:11:31,730 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8400, loss[loss=0.1278, simple_loss=0.1324, pruned_loss=0.04746, audio_tagging_loss=0.0141, over 16043.00 frames. ], tot_loss[loss=0.1277, simple_loss=0.1349, pruned_loss=0.0479, audio_tagging_loss=0.01235, over 3051407.36 frames. 
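The grad_scale field in these records is the dynamic loss scale of fp16 training: it ran at 32.0, doubled to 64.0 near batch 7150, and was halved back to 32.0 by batch 7900, the signature of a scaler that grows the scale on a fixed interval of good steps and backs off when a step produces inf/nan gradients. A minimal sketch with torch.cuda.amp (the GradScaler/autocast calls are the real API; the model, data, and chosen init_scale/growth_interval values are placeholders):

```python
import torch

# Placeholder model/optimizer; the real loop lives in train_asr.py.
model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

for step in range(4000):
    features = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(features).square().mean()
    optimizer.zero_grad()
    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(optimizer)         # skipped if scaled grads overflowed
    scaler.update()                # x2 after growth_interval good steps,
                                   # halved after an overflow
    if step % 1000 == 0:
        print(step, scaler.get_scale())  # the grad_scale seen in the log
```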
], batch size: 58, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:11:32,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=136160.0, ans=0.0 2023-11-18 08:11:37,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=136160.0, ans=0.02 2023-11-18 08:11:54,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5 2023-11-18 08:11:57,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=136293.33333333334, ans=0.0 2023-11-18 08:11:59,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=136293.33333333334, ans=0.2 2023-11-18 08:12:03,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=136293.33333333334, ans=0.2 2023-11-18 08:12:23,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=136426.66666666666, ans=0.0 2023-11-18 08:12:28,239 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8450, loss[loss=0.1096, simple_loss=0.1091, pruned_loss=0.03781, audio_tagging_loss=0.01726, over 15175.00 frames. ], tot_loss[loss=0.1278, simple_loss=0.135, pruned_loss=0.04791, audio_tagging_loss=0.01239, over 3052886.28 frames. ], batch size: 58, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:12:38,766 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.864e+00 2023-11-18 08:12:51,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=136626.66666666666, ans=0.07 2023-11-18 08:13:03,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=136693.33333333334, ans=0.2 2023-11-18 08:13:05,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=136693.33333333334, ans=0.0 2023-11-18 08:13:14,556 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.166e+01 1.028e+02 1.129e+02 1.256e+02 1.884e+02, threshold=2.258e+02, percent-clipped=0.0 2023-11-18 08:13:16,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=136760.0, ans=10.0 2023-11-18 08:13:19,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0 2023-11-18 08:13:21,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136760.0, ans=0.1 2023-11-18 08:13:21,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136760.0, ans=0.1 2023-11-18 08:13:25,408 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8500, loss[loss=0.1234, simple_loss=0.1396, pruned_loss=0.0401, audio_tagging_loss=0.01346, over 15323.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.1347, pruned_loss=0.04765, audio_tagging_loss=0.01249, over 3055047.69 frames. 
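Each record pairs the current batch's loss "over ~15k frames" with a tot_loss "over ~3.05M frames" that barely moves between batches; a frame count that stable is what a frame-weighted exponential moving average produces, and 3.05M / 15k suggests a window of roughly 200 batches. A sketch under that assumption; the real tracker in icefall may aggregate differently:

```python
class LossTracker:
    """Sketch of a frame-weighted moving average with exponential
    forgetting. A decay of 1/200 keeps ~200 batches in the window, which
    at ~15k frames per batch settles near the ~3.05M frames reported in
    the tot_loss records. Not the actual icefall tracker."""

    def __init__(self, decay: float = 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0  # decayed sum of loss * frames
        self.frames = 0.0    # decayed sum of frames

    def update(self, batch_loss: float, batch_frames: float):
        self.loss_sum = (1 - self.decay) * self.loss_sum + batch_loss * batch_frames
        self.frames = (1 - self.decay) * self.frames + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / self.frames


tracker = LossTracker()
for _ in range(2000):
    tracker.update(0.128, 15500.0)
print(round(tracker.frames))  # ~3.1e6, the window size seen in the log
```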
], batch size: 55, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:14:10,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=137093.33333333334, ans=0.0 2023-11-18 08:14:10,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-11-18 08:14:14,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2023-11-18 08:14:14,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=137093.33333333334, ans=0.035 2023-11-18 08:14:21,084 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8550, loss[loss=0.1558, simple_loss=0.1759, pruned_loss=0.05879, audio_tagging_loss=0.009017, over 15675.00 frames. ], tot_loss[loss=0.1278, simple_loss=0.1355, pruned_loss=0.04758, audio_tagging_loss=0.01242, over 3056449.65 frames. ], batch size: 56, lr: 2.58e-02, grad_scale: 32.0 2023-11-18 08:14:28,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=137160.0, ans=0.0 2023-11-18 08:14:38,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=137226.66666666666, ans=0.125 2023-11-18 08:15:07,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=137426.66666666666, ans=0.125 2023-11-18 08:15:07,824 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.950e+01 1.016e+02 1.105e+02 1.283e+02 1.880e+02, threshold=2.211e+02, percent-clipped=0.0 2023-11-18 08:15:09,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=137426.66666666666, ans=0.125 2023-11-18 08:15:18,204 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8600, loss[loss=0.09268, simple_loss=0.0959, pruned_loss=0.03209, audio_tagging_loss=0.01264, over 13836.00 frames. ], tot_loss[loss=0.1277, simple_loss=0.1352, pruned_loss=0.04758, audio_tagging_loss=0.01252, over 3054496.08 frames. ], batch size: 54, lr: 2.58e-02, grad_scale: 32.0 2023-11-18 08:15:20,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-11-18 08:15:39,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=137626.66666666666, ans=0.125 2023-11-18 08:15:42,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2023-11-18 08:16:15,002 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8650, loss[loss=0.106, simple_loss=0.1016, pruned_loss=0.04031, audio_tagging_loss=0.01487, over 14900.00 frames. ], tot_loss[loss=0.1284, simple_loss=0.1359, pruned_loss=0.04778, audio_tagging_loss=0.01263, over 3057201.49 frames. 
], batch size: 57, lr: 2.58e-02, grad_scale: 32.0 2023-11-18 08:16:21,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=137826.66666666666, ans=0.125 2023-11-18 08:16:35,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=137893.33333333334, ans=0.0 2023-11-18 08:16:38,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=137960.0, ans=0.125 2023-11-18 08:16:49,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=138026.66666666666, ans=0.0 2023-11-18 08:17:01,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 1.026e+02 1.125e+02 1.305e+02 1.898e+02, threshold=2.250e+02, percent-clipped=0.0 2023-11-18 08:17:06,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=138093.33333333334, ans=0.0 2023-11-18 08:17:11,114 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8700, loss[loss=0.1375, simple_loss=0.1536, pruned_loss=0.05051, audio_tagging_loss=0.01023, over 14029.00 frames. ], tot_loss[loss=0.1269, simple_loss=0.1342, pruned_loss=0.04701, audio_tagging_loss=0.01278, over 3053857.63 frames. ], batch size: 53, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:17:27,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2023-11-18 08:17:35,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.24 vs. limit=15.0 2023-11-18 08:17:48,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0 2023-11-18 08:18:03,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=138426.66666666666, ans=0.125 2023-11-18 08:18:07,695 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8750, loss[loss=0.1284, simple_loss=0.1392, pruned_loss=0.04457, audio_tagging_loss=0.01429, over 14949.00 frames. ], tot_loss[loss=0.1276, simple_loss=0.1349, pruned_loss=0.04736, audio_tagging_loss=0.01276, over 3055649.28 frames. ], batch size: 56, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:18:07,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=138493.33333333334, ans=0.0 2023-11-18 08:18:20,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.79 vs. 
limit=15.0 2023-11-18 08:18:28,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=138560.0, ans=0.0 2023-11-18 08:18:39,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=138626.66666666666, ans=0.2 2023-11-18 08:18:42,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=138693.33333333334, ans=0.0 2023-11-18 08:18:43,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=138693.33333333334, ans=0.2 2023-11-18 08:18:49,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.49 vs. limit=15.0 2023-11-18 08:18:55,033 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.161e+01 1.039e+02 1.204e+02 1.359e+02 1.963e+02, threshold=2.408e+02, percent-clipped=0.0 2023-11-18 08:19:03,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=138760.0, ans=0.5 2023-11-18 08:19:05,272 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8800, loss[loss=0.1073, simple_loss=0.1192, pruned_loss=0.0361, audio_tagging_loss=0.01162, over 15502.00 frames. ], tot_loss[loss=0.1271, simple_loss=0.1343, pruned_loss=0.04703, audio_tagging_loss=0.01291, over 3052274.17 frames. ], batch size: 58, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:19:07,059 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:19:12,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=138826.66666666666, ans=0.125 2023-11-18 08:19:30,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=138960.0, ans=0.125 2023-11-18 08:19:30,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=138960.0, ans=0.125 2023-11-18 08:19:33,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=138960.0, ans=0.1 2023-11-18 08:19:56,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=139093.33333333334, ans=0.125 2023-11-18 08:19:57,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=139093.33333333334, ans=0.125 2023-11-18 08:20:00,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=139160.0, ans=0.1 2023-11-18 08:20:01,571 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8850, loss[loss=0.134, simple_loss=0.1334, pruned_loss=0.05439, audio_tagging_loss=0.01294, over 14487.00 frames. ], tot_loss[loss=0.1265, simple_loss=0.1335, pruned_loss=0.04676, audio_tagging_loss=0.01301, over 3055078.65 frames. ], batch size: 56, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:20:01,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=139160.0, ans=0.0 2023-11-18 08:20:09,062 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:20:14,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=139226.66666666666, ans=0.1 2023-11-18 08:20:36,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=22.5 2023-11-18 08:20:37,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2023-11-18 08:20:39,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139360.0, ans=0.1 2023-11-18 08:20:40,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=139360.0, ans=0.125 2023-11-18 08:20:47,633 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 1.044e+02 1.177e+02 1.340e+02 1.901e+02, threshold=2.354e+02, percent-clipped=0.0 2023-11-18 08:20:49,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.60 vs. limit=6.0 2023-11-18 08:20:57,238 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8900, loss[loss=0.1156, simple_loss=0.13, pruned_loss=0.03953, audio_tagging_loss=0.01111, over 14720.00 frames. ], tot_loss[loss=0.1259, simple_loss=0.133, pruned_loss=0.04659, audio_tagging_loss=0.01285, over 3055573.14 frames. ], batch size: 57, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:21:04,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=139493.33333333334, ans=0.125 2023-11-18 08:21:35,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=139693.33333333334, ans=0.0 2023-11-18 08:21:40,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.04 vs. limit=15.0 2023-11-18 08:21:41,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=139760.0, ans=0.125 2023-11-18 08:21:54,082 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 8950, loss[loss=0.103, simple_loss=0.09851, pruned_loss=0.04081, audio_tagging_loss=0.01297, over 14507.00 frames. ], tot_loss[loss=0.1269, simple_loss=0.1345, pruned_loss=0.04704, audio_tagging_loss=0.01261, over 3048813.32 frames. ], batch size: 57, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:21:55,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=139826.66666666666, ans=0.09899494936611666 2023-11-18 08:22:07,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.00 vs. 
limit=22.5 2023-11-18 08:22:10,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=139893.33333333334, ans=0.1 2023-11-18 08:22:19,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=139960.0, ans=0.125 2023-11-18 08:22:19,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2023-11-18 08:22:20,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=139960.0, ans=0.125 2023-11-18 08:22:29,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=140026.66666666666, ans=0.0 2023-11-18 08:22:31,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=140026.66666666666, ans=0.04949747468305833 2023-11-18 08:22:41,390 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.130e+01 1.003e+02 1.129e+02 1.259e+02 1.857e+02, threshold=2.258e+02, percent-clipped=0.0 2023-11-18 08:22:51,051 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9000, loss[loss=0.09146, simple_loss=0.08563, pruned_loss=0.03721, audio_tagging_loss=0.01144, over 14233.00 frames. ], tot_loss[loss=0.1261, simple_loss=0.1337, pruned_loss=0.04671, audio_tagging_loss=0.01251, over 3052573.83 frames. ], batch size: 53, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:22:51,052 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 08:23:26,294 INFO [train_asr.py:1147] (1/4) Epoch 2, validation: loss=0.08723, simple_loss=0.06802, pruned_loss=0.01417, audio_tagging_loss=0.03906, over 4681554.00 frames. 2023-11-18 08:23:26,294 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 08:23:30,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=140160.0, ans=0.125 2023-11-18 08:23:53,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=140293.33333333334, ans=0.125 2023-11-18 08:24:22,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2023-11-18 08:24:22,612 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9050, loss[loss=0.1491, simple_loss=0.1582, pruned_loss=0.05992, audio_tagging_loss=0.01006, over 15362.00 frames. ], tot_loss[loss=0.1265, simple_loss=0.1339, pruned_loss=0.04712, audio_tagging_loss=0.01245, over 3053442.08 frames. 
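At batch 9000 the loop pauses to compute the validation loss over the full 4,681,554-frame dev set and then reports peak GPU allocation. The same decomposition as above holds for the validation record: 0.5 * 0.06802 + 0.01417 + 0.03906 = 0.08724, matching the printed loss=0.08723 up to rounding. A hedged sketch of that validation step; torch.no_grad and torch.cuda.max_memory_allocated are the real torch APIs, while the model and loader interfaces are placeholder assumptions:

```python
import torch


def compute_validation_loss(model, dev_loader, device="cuda:1"):
    """Sketch of the 'Computing validation loss' step above: average the
    frame-weighted loss over the whole dev set without gradients, then
    report peak memory. The model(batch) -> (loss, num_frames) interface
    is an assumption, not the train_asr.py signature."""
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            loss, num_frames = model(batch)  # placeholder interface
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={tot_loss / tot_frames:.5f}, "
          f"over {tot_frames:.2f} frames; max memory {mem_mb}MB")
    return tot_loss / tot_frames
```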
], batch size: 58, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:24:36,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=140560.0, ans=0.0 2023-11-18 08:24:37,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=140560.0, ans=0.2 2023-11-18 08:25:00,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=140693.33333333334, ans=0.125 2023-11-18 08:25:08,332 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 1.025e+02 1.134e+02 1.283e+02 1.776e+02, threshold=2.268e+02, percent-clipped=0.0 2023-11-18 08:25:17,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=140826.66666666666, ans=0.1 2023-11-18 08:25:18,126 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9100, loss[loss=0.1245, simple_loss=0.1252, pruned_loss=0.04966, audio_tagging_loss=0.01227, over 13806.00 frames. ], tot_loss[loss=0.126, simple_loss=0.1335, pruned_loss=0.04699, audio_tagging_loss=0.01227, over 3056950.23 frames. ], batch size: 53, lr: 2.55e-02, grad_scale: 32.0 2023-11-18 08:25:25,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=140826.66666666666, ans=0.0 2023-11-18 08:25:25,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=140826.66666666666, ans=0.125 2023-11-18 08:25:56,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=141026.66666666666, ans=0.125 2023-11-18 08:25:58,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=141026.66666666666, ans=0.5 2023-11-18 08:26:05,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=141093.33333333334, ans=0.125 2023-11-18 08:26:15,083 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9150, loss[loss=0.1568, simple_loss=0.1595, pruned_loss=0.0663, audio_tagging_loss=0.01078, over 14536.00 frames. ], tot_loss[loss=0.1265, simple_loss=0.1344, pruned_loss=0.04715, audio_tagging_loss=0.01213, over 3055837.71 frames. ], batch size: 53, lr: 2.55e-02, grad_scale: 32.0 2023-11-18 08:26:18,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=141160.0, ans=0.125 2023-11-18 08:26:18,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=141160.0, ans=0.125 2023-11-18 08:26:41,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=141293.33333333334, ans=0.125 2023-11-18 08:26:42,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=141293.33333333334, ans=0.95 2023-11-18 08:26:52,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.31 vs. 
limit=15.0 2023-11-18 08:27:01,447 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.257e+01 1.062e+02 1.145e+02 1.276e+02 2.030e+02, threshold=2.290e+02, percent-clipped=0.0 2023-11-18 08:27:12,358 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9200, loss[loss=0.1338, simple_loss=0.1421, pruned_loss=0.05135, audio_tagging_loss=0.01136, over 16536.00 frames. ], tot_loss[loss=0.1262, simple_loss=0.1338, pruned_loss=0.04706, audio_tagging_loss=0.01227, over 3057868.51 frames. ], batch size: 59, lr: 2.55e-02, grad_scale: 32.0 2023-11-18 08:27:41,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.61 vs. limit=22.5 2023-11-18 08:27:46,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=141693.33333333334, ans=0.125 2023-11-18 08:27:49,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=141693.33333333334, ans=0.2 2023-11-18 08:27:52,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141693.33333333334, ans=0.1 2023-11-18 08:28:00,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=141760.0, ans=0.09899494936611666 2023-11-18 08:28:08,762 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9250, loss[loss=0.1253, simple_loss=0.1148, pruned_loss=0.05343, audio_tagging_loss=0.01446, over 14784.00 frames. ], tot_loss[loss=0.1256, simple_loss=0.1329, pruned_loss=0.04689, audio_tagging_loss=0.01224, over 3059380.45 frames. ], batch size: 58, lr: 2.54e-02, grad_scale: 32.0 2023-11-18 08:28:09,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=141826.66666666666, ans=0.125 2023-11-18 08:28:20,203 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.06 vs. limit=12.0 2023-11-18 08:28:23,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.34 vs. limit=10.0 2023-11-18 08:28:30,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=141960.0, ans=0.125 2023-11-18 08:28:40,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=141960.0, ans=0.5 2023-11-18 08:28:41,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=142026.66666666666, ans=0.125 2023-11-18 08:28:55,019 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.910e+01 1.033e+02 1.140e+02 1.302e+02 2.365e+02, threshold=2.281e+02, percent-clipped=1.0 2023-11-18 08:28:55,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=142093.33333333334, ans=0.125 2023-11-18 08:28:58,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=142093.33333333334, ans=0.0 2023-11-18 08:29:04,827 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9300, loss[loss=0.1037, simple_loss=0.1109, pruned_loss=0.03316, audio_tagging_loss=0.0151, over 14495.00 frames. 
], tot_loss[loss=0.1251, simple_loss=0.1329, pruned_loss=0.04657, audio_tagging_loss=0.01214, over 3054752.45 frames. ], batch size: 55, lr: 2.54e-02, grad_scale: 32.0
2023-11-18 08:29:13,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=142160.0, ans=0.0
2023-11-18 08:29:15,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=15.0
2023-11-18 08:29:22,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.22 vs. limit=22.5
2023-11-18 08:29:43,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=142360.0, ans=0.0
2023-11-18 08:29:44,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=142360.0, ans=0.1
2023-11-18 08:30:01,747 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9350, loss[loss=0.113, simple_loss=0.1167, pruned_loss=0.03985, audio_tagging_loss=0.01481, over 15164.00 frames. ], tot_loss[loss=0.1246, simple_loss=0.1322, pruned_loss=0.04619, audio_tagging_loss=0.0123, over 3051487.04 frames. ], batch size: 56, lr: 2.54e-02, grad_scale: 32.0
2023-11-18 08:30:21,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=142560.0, ans=0.125
2023-11-18 08:30:37,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=142693.33333333334, ans=0.04949747468305833
2023-11-18 08:30:48,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.103e+01 1.054e+02 1.142e+02 1.283e+02 1.990e+02, threshold=2.284e+02, percent-clipped=0.0
2023-11-18 08:30:56,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=142760.0, ans=0.1
2023-11-18 08:30:59,168 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9400, loss[loss=0.1238, simple_loss=0.1357, pruned_loss=0.04442, audio_tagging_loss=0.01153, over 14445.00 frames. ], tot_loss[loss=0.1249, simple_loss=0.1325, pruned_loss=0.04619, audio_tagging_loss=0.01247, over 3053249.68 frames. ], batch size: 57, lr: 2.54e-02, grad_scale: 32.0
2023-11-18 08:31:21,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5
2023-11-18 08:31:42,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=143026.66666666666, ans=0.0
2023-11-18 08:31:50,652 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
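The "Exclude cut" warnings above are a length filter, not an error: these AudioSet clips are 1 s long (100 feature frames), the encoder's roughly 4x subsampling leaves 23 frames, and the dummy transcript encodes to 24 BPE tokens, so there are fewer encoder frames than output tokens and the transducer loss cannot align them. A minimal sketch of such a filter, assuming the subsampling arithmetic implied by the logged 100 -> 23 mapping; the helper names are illustrative, not the exact training code:

```python
# Sketch of the length filter behind the "Exclude cut" warnings above.
# Assumption: the convolutional frontend subsamples roughly 4x such that
# T_out = ((T_in - 7) // 2 + 1) // 2, which maps 100 -> 23 as logged.
# A transducer needs at least one frame per emitted token, so cuts with
# more tokens than subsampled frames are dropped.
import logging

def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(cut, sp) -> bool:  # sp: a sentencepiece.SentencePieceProcessor
    T = frames_after_subsampling(cut.num_frames)
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    if T < len(tokens):
        logging.warning(
            f"Exclude cut with ID {cut.id} from training. "
            f"Number of frames (after subsampling): {T}. "
            f"Number of tokens: {len(tokens)}"
        )
        return False
    return True

assert frames_after_subsampling(100) == 23  # matches the log: 23 frames < 24 tokens
```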
2023-11-18 08:31:51,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=143093.33333333334, ans=0.0
2023-11-18 08:31:52,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.82 vs. limit=10.0
2023-11-18 08:31:54,923 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9450, loss[loss=0.08102, simple_loss=0.08323, pruned_loss=0.02744, audio_tagging_loss=0.01197, over 14796.00 frames. ], tot_loss[loss=0.1256, simple_loss=0.1332, pruned_loss=0.04652, audio_tagging_loss=0.01254, over 3052770.27 frames. ], batch size: 59, lr: 2.53e-02, grad_scale: 32.0
2023-11-18 08:31:57,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=143160.0, ans=0.5
2023-11-18 08:32:05,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=143226.66666666666, ans=0.0
2023-11-18 08:32:06,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=143226.66666666666, ans=0.1
2023-11-18 08:32:30,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=143360.0, ans=0.0
2023-11-18 08:32:38,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0
2023-11-18 08:32:39,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=143426.66666666666, ans=0.125
2023-11-18 08:32:41,714 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.457e+01 1.024e+02 1.132e+02 1.318e+02 2.507e+02, threshold=2.264e+02, percent-clipped=1.0
2023-11-18 08:32:51,331 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9500, loss[loss=0.1351, simple_loss=0.1472, pruned_loss=0.05002, audio_tagging_loss=0.01149, over 16394.00 frames. ], tot_loss[loss=0.1264, simple_loss=0.1337, pruned_loss=0.0469, audio_tagging_loss=0.01266, over 3049097.10 frames. ], batch size: 60, lr: 2.53e-02, grad_scale: 32.0
2023-11-18 08:33:06,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=143560.0, ans=0.125
2023-11-18 08:33:19,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=143626.66666666666, ans=0.07
2023-11-18 08:33:20,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.23 vs. limit=22.5
2023-11-18 08:33:34,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=143693.33333333334, ans=0.1
2023-11-18 08:33:36,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=12.0
2023-11-18 08:33:42,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.65 vs. limit=12.0
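The frequent "ScheduledFloat: name=..., batch_count=..., ans=..." entries record hyperparameters (dropout probabilities, skip rates, balancer probabilities and limits) whose current value is a function of the global batch count rather than a constant. A sketch of the idea, assuming piecewise-linear interpolation between breakpoints; this is an illustration of the mechanism, not the class from scaling.py:

```python
# Minimal sketch of a batch-count-keyed schedule like the "ScheduledFloat:
# name=..., batch_count=..., ans=..." entries above: the value ("ans") is
# piecewise-linear in batch_count between fixed (batch_count, value) points
# and constant beyond the last point.
def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]  # past the final breakpoint: hold the last value

# e.g. a dropout that decays from 0.3 to a floor of 0.1 over 20k batches
# (breakpoints hypothetical):
dropout_p = scheduled_float(143226.66, [(0.0, 0.3), (20000.0, 0.1)])
assert dropout_p == 0.1  # well past the final breakpoint, as in the log
```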
2023-11-18 08:33:42,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=143760.0, ans=0.1
2023-11-18 08:33:48,288 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9550, loss[loss=0.103, simple_loss=0.1141, pruned_loss=0.03383, audio_tagging_loss=0.01213, over 15324.00 frames. ], tot_loss[loss=0.1262, simple_loss=0.1335, pruned_loss=0.0467, audio_tagging_loss=0.01281, over 3046239.69 frames. ], batch size: 60, lr: 2.53e-02, grad_scale: 32.0
2023-11-18 08:33:49,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.50 vs. limit=22.5
2023-11-18 08:33:52,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=143826.66666666666, ans=0.2
2023-11-18 08:33:58,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=143893.33333333334, ans=0.1
2023-11-18 08:34:34,606 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 9.724e+01 1.130e+02 1.324e+02 2.108e+02, threshold=2.261e+02, percent-clipped=0.0
2023-11-18 08:34:44,682 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9600, loss[loss=0.1234, simple_loss=0.1486, pruned_loss=0.03882, audio_tagging_loss=0.01033, over 15482.00 frames. ], tot_loss[loss=0.127, simple_loss=0.1346, pruned_loss=0.04698, audio_tagging_loss=0.01273, over 3050620.88 frames. ], batch size: 55, lr: 2.53e-02, grad_scale: 32.0
2023-11-18 08:34:45,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=144160.0, ans=0.0
2023-11-18 08:35:24,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=144360.0, ans=0.0
2023-11-18 08:35:28,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0
2023-11-18 08:35:34,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=144426.66666666666, ans=0.1
2023-11-18 08:35:37,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=144426.66666666666, ans=0.1
2023-11-18 08:35:40,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=144493.33333333334, ans=0.0
2023-11-18 08:35:41,218 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9650, loss[loss=0.1046, simple_loss=0.1127, pruned_loss=0.03796, audio_tagging_loss=0.01031, over 15659.00 frames. ], tot_loss[loss=0.1267, simple_loss=0.1341, pruned_loss=0.04692, audio_tagging_loss=0.01276, over 3046411.50 frames. ], batch size: 61, lr: 2.52e-02, grad_scale: 32.0
2023-11-18 08:35:54,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=144560.0, ans=0.04949747468305833
2023-11-18 08:36:14,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=144693.33333333334, ans=0.0
2023-11-18 08:36:27,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0
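In the optim.py entries above, the five numbers after "grad-norm quartiles" are the min/25%/median/75%/max of recent gradient norms, and in each entry the reported threshold is about Clipping_scale (2.0) times the logged median (e.g. 2.0 x 1.130e+02 ~ 2.261e+02 just above), so the clipping threshold appears to track a running median rather than being a fixed constant. A hedged sketch of that behaviour; the window size and exact bookkeeping are assumptions:

```python
# Sketch of median-tracking gradient clipping consistent with the
# "grad-norm quartiles ... threshold=..." entries above, where
# threshold ~= clipping_scale * median of recent gradient norms.
import torch

class MedianClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 500):
        self.clipping_scale = clipping_scale
        self.window = window          # how many recent norms to remember
        self.norms: list[float] = []

    def clip_(self, params) -> None:
        params = [p for p in params if p.grad is not None]
        # overall L2 norm of the gradient across all parameters
        norm = torch.linalg.vector_norm(
            torch.stack([p.grad.norm() for p in params])
        ).item()
        self.norms = (self.norms + [norm])[-self.window:]
        threshold = self.clipping_scale * sorted(self.norms)[len(self.norms) // 2]
        if norm > threshold:  # such steps are counted as "percent-clipped"
            for p in params:
                p.grad.mul_(threshold / norm)
```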
2023-11-18 08:36:27,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.399e+01 1.034e+02 1.159e+02 1.347e+02 1.813e+02, threshold=2.318e+02, percent-clipped=0.0
2023-11-18 08:36:38,026 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9700, loss[loss=0.1287, simple_loss=0.1425, pruned_loss=0.04759, audio_tagging_loss=0.009853, over 14858.00 frames. ], tot_loss[loss=0.126, simple_loss=0.1334, pruned_loss=0.04674, audio_tagging_loss=0.0126, over 3049956.07 frames. ], batch size: 57, lr: 2.52e-02, grad_scale: 32.0
2023-11-18 08:36:40,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=144826.66666666666, ans=0.1
2023-11-18 08:37:07,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.87 vs. limit=15.0
2023-11-18 08:37:08,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=144960.0, ans=0.0
2023-11-18 08:37:19,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=145026.66666666666, ans=0.125
2023-11-18 08:37:21,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=145026.66666666666, ans=0.2
2023-11-18 08:37:33,993 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9750, loss[loss=0.0869, simple_loss=0.08535, pruned_loss=0.0303, audio_tagging_loss=0.01393, over 15216.00 frames. ], tot_loss[loss=0.1261, simple_loss=0.1337, pruned_loss=0.04682, audio_tagging_loss=0.01244, over 3051812.29 frames. ], batch size: 59, lr: 2.52e-02, grad_scale: 32.0
2023-11-18 08:37:48,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=145226.66666666666, ans=0.1
2023-11-18 08:37:58,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=145293.33333333334, ans=0.0
2023-11-18 08:38:03,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=145293.33333333334, ans=0.5
2023-11-18 08:38:14,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=145360.0, ans=0.125
2023-11-18 08:38:16,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=145360.0, ans=0.0
2023-11-18 08:38:16,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=145360.0, ans=0.0
2023-11-18 08:38:21,154 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 1.005e+02 1.144e+02 1.318e+02 1.775e+02, threshold=2.288e+02, percent-clipped=0.0
2023-11-18 08:38:27,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=145426.66666666666, ans=0.09899494936611666
2023-11-18 08:38:31,524 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9800, loss[loss=0.08386, simple_loss=0.08667, pruned_loss=0.02754, audio_tagging_loss=0.01298, over 15657.00 frames. ], tot_loss[loss=0.1261, simple_loss=0.1336, pruned_loss=0.04688, audio_tagging_loss=0.01241, over 3046394.13 frames. ], batch size: 60, lr: 2.52e-02, grad_scale: 32.0
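The loss[...] records above decompose consistently as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; for batch 9700 above, 0.5 * 0.1425 + 0.04759 + 0.009853 ~ 0.1287, the logged total. The scales here are inferred from the logged numbers alone, not read out of the training script:

```python
# The loss[...] records decompose as
#   loss = simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss
# with simple_scale = 0.5 and tagging_scale = 1.0 inferred purely from the
# logged values above:
def total_loss(simple, pruned, audio_tagging,
               simple_scale=0.5, tagging_scale=1.0):
    return simple_scale * simple + pruned + tagging_scale * audio_tagging

# batch 9700 above: loss=0.1287
assert abs(total_loss(0.1425, 0.04759, 0.009853) - 0.1287) < 5e-4
```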
2023-11-18 08:38:38,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=145493.33333333334, ans=0.125
2023-11-18 08:38:49,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=145560.0, ans=0.125
2023-11-18 08:39:04,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=145693.33333333334, ans=0.1
2023-11-18 08:39:06,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0
2023-11-18 08:39:19,469 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 08:39:26,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=145760.0, ans=0.125
2023-11-18 08:39:28,512 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9850, loss[loss=0.168, simple_loss=0.1867, pruned_loss=0.06735, audio_tagging_loss=0.007309, over 16009.00 frames. ], tot_loss[loss=0.1264, simple_loss=0.134, pruned_loss=0.04702, audio_tagging_loss=0.01238, over 3045876.83 frames. ], batch size: 57, lr: 2.51e-02, grad_scale: 32.0
2023-11-18 08:39:55,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=145960.0, ans=0.125
2023-11-18 08:39:58,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=145960.0, ans=0.125
2023-11-18 08:40:14,766 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 1.021e+02 1.122e+02 1.308e+02 2.084e+02, threshold=2.244e+02, percent-clipped=0.0
2023-11-18 08:40:24,457 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9900, loss[loss=0.1115, simple_loss=0.127, pruned_loss=0.037, audio_tagging_loss=0.01097, over 14308.00 frames. ], tot_loss[loss=0.1255, simple_loss=0.1331, pruned_loss=0.04653, audio_tagging_loss=0.01241, over 3051193.50 frames. ], batch size: 53, lr: 2.51e-02, grad_scale: 64.0
2023-11-18 08:40:42,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=146226.66666666666, ans=0.07
2023-11-18 08:40:48,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=146293.33333333334, ans=0.0
2023-11-18 08:40:54,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=22.5
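The Whitening entries compare a metric against a limit: the metric measures how far the covariance of a module's activations is from a multiple of the identity, scoring 1.0 for perfectly "white" features and growing when a few directions dominate, and the module only pushes back (through the gradients) when the metric exceeds its limit. One way to compute such a metric, simplified to a single channel group; the gradient-side correction is omitted and the exact normalization in scaling.py is assumed:

```python
# Sketch of a whitening metric like the "metric=... vs. limit=..." entries:
# for centered features x with covariance C, compare mean(diag(C @ C))
# against mean(diag(C))^2. The ratio is 1.0 when C is a multiple of the
# identity (perfectly "white") and grows as the spectrum becomes uneven.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    x = x.reshape(-1, x.shape[-1])           # (frames, channels)
    x = x - x.mean(dim=0, keepdim=True)      # use the centered covariance
    cov = x.T @ x / x.shape[0]               # (channels, channels)
    mean_diag = cov.diagonal().mean()
    mean_diag_of_cov_sq = (cov ** 2).sum() / cov.shape[0]  # = mean(diag(C @ C))
    return mean_diag_of_cov_sq / (mean_diag ** 2 + 1e-20)

white = torch.randn(10000, 256)
assert whitening_metric(white).item() < 1.2  # near 1.0 for white noise
```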
2023-11-18 08:40:58,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=146360.0, ans=0.125
2023-11-18 08:41:12,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=146426.66666666666, ans=0.125
2023-11-18 08:41:19,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.40 vs. limit=15.0
2023-11-18 08:41:20,913 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 9950, loss[loss=0.129, simple_loss=0.1395, pruned_loss=0.04937, audio_tagging_loss=0.009863, over 15299.00 frames. ], tot_loss[loss=0.125, simple_loss=0.1325, pruned_loss=0.04624, audio_tagging_loss=0.01245, over 3051249.01 frames. ], batch size: 55, lr: 2.51e-02, grad_scale: 64.0
2023-11-18 08:41:29,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.29 vs. limit=15.0
2023-11-18 08:41:35,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=146560.0, ans=0.125
2023-11-18 08:41:46,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=146626.66666666666, ans=0.07
2023-11-18 08:41:52,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=146626.66666666666, ans=0.125
2023-11-18 08:42:06,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0
2023-11-18 08:42:07,455 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.831e+01 1.033e+02 1.176e+02 1.296e+02 1.958e+02, threshold=2.352e+02, percent-clipped=0.0
2023-11-18 08:42:07,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0
2023-11-18 08:42:14,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=146760.0, ans=0.0
2023-11-18 08:42:18,282 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10000, loss[loss=0.08171, simple_loss=0.08587, pruned_loss=0.02627, audio_tagging_loss=0.0125, over 14776.00 frames. ], tot_loss[loss=0.1257, simple_loss=0.1333, pruned_loss=0.04679, audio_tagging_loss=0.01231, over 3046236.46 frames. ], batch size: 58, lr: 2.51e-02, grad_scale: 64.0
2023-11-18 08:42:32,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=146893.33333333334, ans=0.05
2023-11-18 08:42:41,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=146960.0, ans=0.0
2023-11-18 08:43:00,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.64 vs. limit=6.0
2023-11-18 08:43:06,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.67 vs. limit=15.0
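The lr field decays slowly across these batches (2.51e-02 at batch 10000 above, 2.50e-02 by batch 10100). This is consistent with a schedule that decays polynomially in both the batch index and the epoch, such as icefall's Eden scheduler; the sketch below only illustrates the shape of such a schedule, and the base_lr / lr_batches / lr_epochs values in it are hypothetical, not read out of this run's configuration:

```python
# Eden-style learning-rate schedule: lr decays with both batch index and
# epoch. All parameter values below are hypothetical placeholders.
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# lr shrinks monotonically as batch and epoch grow:
assert eden_lr(0.045, 20000, 2.0) < eden_lr(0.045, 10000, 2.0) < 0.045
```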
2023-11-18 08:43:09,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=147093.33333333334, ans=0.2
2023-11-18 08:43:11,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=147093.33333333334, ans=0.0
2023-11-18 08:43:14,064 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.13 vs. limit=22.5
2023-11-18 08:43:14,450 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10050, loss[loss=0.1482, simple_loss=0.1641, pruned_loss=0.05565, audio_tagging_loss=0.01046, over 13888.00 frames. ], tot_loss[loss=0.1263, simple_loss=0.1339, pruned_loss=0.04702, audio_tagging_loss=0.01236, over 3041132.69 frames. ], batch size: 54, lr: 2.50e-02, grad_scale: 64.0
2023-11-18 08:43:18,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=147160.0, ans=0.0
2023-11-18 08:43:53,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=147360.0, ans=0.125
2023-11-18 08:43:56,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0
2023-11-18 08:44:01,689 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.399e+01 9.828e+01 1.108e+02 1.232e+02 2.122e+02, threshold=2.217e+02, percent-clipped=0.0
2023-11-18 08:44:10,753 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10100, loss[loss=0.138, simple_loss=0.1385, pruned_loss=0.05464, audio_tagging_loss=0.01411, over 15914.00 frames. ], tot_loss[loss=0.1255, simple_loss=0.1327, pruned_loss=0.04669, audio_tagging_loss=0.01242, over 3044244.37 frames. ], batch size: 57, lr: 2.50e-02, grad_scale: 32.0
2023-11-18 08:44:17,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0
2023-11-18 08:44:32,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=147560.0, ans=0.0
2023-11-18 08:44:36,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.18 vs. limit=22.5
2023-11-18 08:44:52,693 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 08:45:00,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=147760.0, ans=0.125
2023-11-18 08:45:08,339 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10150, loss[loss=0.1268, simple_loss=0.1459, pruned_loss=0.044, audio_tagging_loss=0.009839, over 15645.00 frames. ], tot_loss[loss=0.126, simple_loss=0.1337, pruned_loss=0.04677, audio_tagging_loss=0.0124, over 3044755.88 frames. ], batch size: 57, lr: 2.50e-02, grad_scale: 32.0
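grad_scale jumps from 32.0 to 64.0 near batch 9900 and is back to 32.0 by batch 10100 above: the growth/backoff pattern of dynamic loss scaling in fp16 training, where the scale doubles after a fixed number of overflow-free steps and halves (with the step skipped) whenever an inf/nan gradient is detected. A generic sketch using torch's GradScaler; how the real training script wires this in is assumed, not copied:

```python
# Dynamic loss scaling: scale grows by growth_factor every growth_interval
# overflow-free steps and backs off by backoff_factor on overflow, matching
# the 32 -> 64 -> 32 pattern of the grad_scale field in the log.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def training_step(model, optimizer, features, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model(features))
    scaler.scale(loss).backward()   # backprop the scaled loss
    scaler.step(optimizer)          # skips the update on overflow
    scaler.update()                 # grows or backs off the scale
    return scaler.get_scale()       # the "grad_scale" shown in the log
```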
2023-11-18 08:45:12,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=147826.66666666666, ans=0.1
2023-11-18 08:45:13,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=147826.66666666666, ans=0.125
2023-11-18 08:45:15,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=147826.66666666666, ans=0.0
2023-11-18 08:45:18,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.49 vs. limit=22.5
2023-11-18 08:45:27,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=147893.33333333334, ans=0.125
2023-11-18 08:45:29,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.12 vs. limit=22.5
2023-11-18 08:45:30,098 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 08:45:32,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=147960.0, ans=0.5
2023-11-18 08:45:45,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=148026.66666666666, ans=0.1
2023-11-18 08:45:55,683 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.369e+01 1.049e+02 1.137e+02 1.279e+02 1.864e+02, threshold=2.275e+02, percent-clipped=0.0
2023-11-18 08:45:56,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=148093.33333333334, ans=0.125
2023-11-18 08:45:58,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=148093.33333333334, ans=0.0
2023-11-18 08:46:04,222 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10200, loss[loss=0.1179, simple_loss=0.1139, pruned_loss=0.04167, audio_tagging_loss=0.0193, over 14023.00 frames. ], tot_loss[loss=0.1265, simple_loss=0.134, pruned_loss=0.04703, audio_tagging_loss=0.01249, over 3053942.75 frames. ], batch size: 53, lr: 2.50e-02, grad_scale: 32.0
2023-11-18 08:46:06,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=148160.0, ans=0.125
2023-11-18 08:46:20,817 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible'].
Number of tokens: 24 2023-11-18 08:46:21,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=148226.66666666666, ans=0.07 2023-11-18 08:46:25,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=148293.33333333334, ans=0.125 2023-11-18 08:46:33,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=148293.33333333334, ans=0.125 2023-11-18 08:46:38,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=148360.0, ans=0.0 2023-11-18 08:46:39,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.81 vs. limit=22.5 2023-11-18 08:46:51,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=148426.66666666666, ans=0.2 2023-11-18 08:46:57,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=17.93 vs. limit=15.0 2023-11-18 08:47:00,838 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10250, loss[loss=0.1303, simple_loss=0.1335, pruned_loss=0.05041, audio_tagging_loss=0.01314, over 16384.00 frames. ], tot_loss[loss=0.1253, simple_loss=0.1325, pruned_loss=0.04634, audio_tagging_loss=0.01267, over 3050224.75 frames. ], batch size: 61, lr: 2.49e-02, grad_scale: 32.0 2023-11-18 08:47:03,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=148493.33333333334, ans=0.125 2023-11-18 08:47:03,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=148493.33333333334, ans=0.2 2023-11-18 08:47:11,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2023-11-18 08:47:15,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=148560.0, ans=0.125 2023-11-18 08:47:26,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.07 vs. limit=6.0 2023-11-18 08:47:46,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.61 vs. limit=15.0 2023-11-18 08:47:48,252 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 1.009e+02 1.120e+02 1.271e+02 1.895e+02, threshold=2.240e+02, percent-clipped=0.0 2023-11-18 08:47:48,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0 2023-11-18 08:47:51,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=148760.0, ans=0.125 2023-11-18 08:47:58,054 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10300, loss[loss=0.1314, simple_loss=0.1375, pruned_loss=0.05243, audio_tagging_loss=0.01019, over 14643.00 frames. ], tot_loss[loss=0.125, simple_loss=0.1325, pruned_loss=0.04615, audio_tagging_loss=0.01263, over 3052517.91 frames. 
], batch size: 54, lr: 2.49e-02, grad_scale: 32.0 2023-11-18 08:47:59,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2023-11-18 08:48:22,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=148960.0, ans=0.125 2023-11-18 08:48:45,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=149093.33333333334, ans=0.2 2023-11-18 08:48:54,298 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10350, loss[loss=0.1292, simple_loss=0.1258, pruned_loss=0.04451, audio_tagging_loss=0.02176, over 16490.00 frames. ], tot_loss[loss=0.1256, simple_loss=0.133, pruned_loss=0.04627, audio_tagging_loss=0.01287, over 3054065.60 frames. ], batch size: 64, lr: 2.49e-02, grad_scale: 32.0 2023-11-18 08:48:59,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=149160.0, ans=0.1 2023-11-18 08:49:07,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=149226.66666666666, ans=0.125 2023-11-18 08:49:13,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2023-11-18 08:49:14,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=149226.66666666666, ans=15.0 2023-11-18 08:49:16,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=149293.33333333334, ans=0.125 2023-11-18 08:49:33,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=15.0 2023-11-18 08:49:41,440 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 9.879e+01 1.106e+02 1.235e+02 1.803e+02, threshold=2.212e+02, percent-clipped=0.0 2023-11-18 08:49:50,046 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10400, loss[loss=0.08678, simple_loss=0.08095, pruned_loss=0.02756, audio_tagging_loss=0.01874, over 13959.00 frames. ], tot_loss[loss=0.1252, simple_loss=0.1323, pruned_loss=0.04596, audio_tagging_loss=0.0131, over 3048237.65 frames. ], batch size: 56, lr: 2.49e-02, grad_scale: 32.0 2023-11-18 08:49:52,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=149493.33333333334, ans=0.0 2023-11-18 08:50:06,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=149560.0, ans=0.125 2023-11-18 08:50:08,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=149560.0, ans=0.2 2023-11-18 08:50:37,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=149760.0, ans=0.1 2023-11-18 08:50:47,473 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10450, loss[loss=0.1235, simple_loss=0.124, pruned_loss=0.05141, audio_tagging_loss=0.01007, over 15559.00 frames. ], tot_loss[loss=0.1247, simple_loss=0.1317, pruned_loss=0.04576, audio_tagging_loss=0.01304, over 3045626.68 frames. 
], batch size: 60, lr: 2.48e-02, grad_scale: 32.0 2023-11-18 08:50:50,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=149826.66666666666, ans=0.025 2023-11-18 08:50:59,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=149893.33333333334, ans=0.1 2023-11-18 08:51:11,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=149960.0, ans=0.125 2023-11-18 08:51:17,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=149960.0, ans=0.2 2023-11-18 08:51:21,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=150026.66666666666, ans=0.95 2023-11-18 08:51:35,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 9.874e+01 1.064e+02 1.233e+02 1.785e+02, threshold=2.128e+02, percent-clipped=0.0 2023-11-18 08:51:44,323 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10500, loss[loss=0.1295, simple_loss=0.1358, pruned_loss=0.04899, audio_tagging_loss=0.01265, over 15387.00 frames. ], tot_loss[loss=0.1242, simple_loss=0.1316, pruned_loss=0.04563, audio_tagging_loss=0.01277, over 3056307.38 frames. ], batch size: 58, lr: 2.48e-02, grad_scale: 32.0 2023-11-18 08:51:47,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=150160.0, ans=0.125 2023-11-18 08:51:47,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=150160.0, ans=0.0 2023-11-18 08:51:48,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=150160.0, ans=0.0 2023-11-18 08:52:26,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=150360.0, ans=0.1 2023-11-18 08:52:31,738 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:52:39,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=150493.33333333334, ans=0.0 2023-11-18 08:52:39,975 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10550, loss[loss=0.1099, simple_loss=0.1215, pruned_loss=0.03872, audio_tagging_loss=0.01046, over 14810.00 frames. ], tot_loss[loss=0.1249, simple_loss=0.1327, pruned_loss=0.04602, audio_tagging_loss=0.01254, over 3056651.78 frames. ], batch size: 55, lr: 2.48e-02, grad_scale: 32.0 2023-11-18 08:52:42,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=150493.33333333334, ans=0.125 2023-11-18 08:52:43,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=150493.33333333334, ans=0.125 2023-11-18 08:52:50,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=150560.0, ans=0.125 2023-11-18 08:53:06,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.93 vs. 
limit=22.5 2023-11-18 08:53:27,566 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 9.723e+01 1.093e+02 1.257e+02 1.576e+02, threshold=2.186e+02, percent-clipped=0.0 2023-11-18 08:53:37,313 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10600, loss[loss=0.09425, simple_loss=0.09942, pruned_loss=0.03248, audio_tagging_loss=0.01206, over 16259.00 frames. ], tot_loss[loss=0.1244, simple_loss=0.1324, pruned_loss=0.04573, audio_tagging_loss=0.01248, over 3055499.50 frames. ], batch size: 64, lr: 2.48e-02, grad_scale: 32.0 2023-11-18 08:54:04,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0 2023-11-18 08:54:10,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=151026.66666666666, ans=0.0 2023-11-18 08:54:22,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2023-11-18 08:54:33,698 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10650, loss[loss=0.1383, simple_loss=0.1508, pruned_loss=0.04998, audio_tagging_loss=0.01298, over 15074.00 frames. ], tot_loss[loss=0.1244, simple_loss=0.1326, pruned_loss=0.04578, audio_tagging_loss=0.01235, over 3049400.84 frames. ], batch size: 56, lr: 2.47e-02, grad_scale: 32.0 2023-11-18 08:54:36,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=151160.0, ans=0.125 2023-11-18 08:54:40,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=151160.0, ans=0.125 2023-11-18 08:54:55,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=151293.33333333334, ans=0.125 2023-11-18 08:55:14,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=151360.0, ans=0.1 2023-11-18 08:55:21,760 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.938e+01 1.018e+02 1.108e+02 1.279e+02 1.939e+02, threshold=2.217e+02, percent-clipped=0.0 2023-11-18 08:55:24,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=151426.66666666666, ans=0.125 2023-11-18 08:55:24,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=151426.66666666666, ans=0.125 2023-11-18 08:55:30,341 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10700, loss[loss=0.1247, simple_loss=0.1204, pruned_loss=0.04557, audio_tagging_loss=0.01894, over 15163.00 frames. ], tot_loss[loss=0.1257, simple_loss=0.1341, pruned_loss=0.04633, audio_tagging_loss=0.01232, over 3050096.57 frames. ], batch size: 58, lr: 2.47e-02, grad_scale: 32.0 2023-11-18 08:56:26,873 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10750, loss[loss=0.1484, simple_loss=0.1749, pruned_loss=0.0529, audio_tagging_loss=0.008055, over 14727.00 frames. ], tot_loss[loss=0.1257, simple_loss=0.134, pruned_loss=0.04641, audio_tagging_loss=0.01226, over 3049681.57 frames. 
], batch size: 52, lr: 2.47e-02, grad_scale: 32.0 2023-11-18 08:56:34,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=151826.66666666666, ans=0.125 2023-11-18 08:56:43,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.27 vs. limit=22.5 2023-11-18 08:56:51,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=151960.0, ans=0.0 2023-11-18 08:57:01,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=12.0 2023-11-18 08:57:07,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=152026.66666666666, ans=0.0 2023-11-18 08:57:11,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=152093.33333333334, ans=0.1 2023-11-18 08:57:13,371 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.41 vs. limit=15.0 2023-11-18 08:57:14,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.096e+01 9.809e+01 1.098e+02 1.227e+02 2.197e+02, threshold=2.197e+02, percent-clipped=0.0 2023-11-18 08:57:17,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=152093.33333333334, ans=0.125 2023-11-18 08:57:21,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=152093.33333333334, ans=0.125 2023-11-18 08:57:22,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=152093.33333333334, ans=0.1 2023-11-18 08:57:24,148 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10800, loss[loss=0.1971, simple_loss=0.2154, pruned_loss=0.07983, audio_tagging_loss=0.009636, over 15405.00 frames. ], tot_loss[loss=0.1255, simple_loss=0.1339, pruned_loss=0.04625, audio_tagging_loss=0.01229, over 3055718.47 frames. ], batch size: 56, lr: 2.47e-02, grad_scale: 32.0 2023-11-18 08:57:25,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=152160.0, ans=0.125 2023-11-18 08:57:38,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2023-11-18 08:58:08,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=152426.66666666666, ans=0.1 2023-11-18 08:58:21,124 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10850, loss[loss=0.09968, simple_loss=0.09851, pruned_loss=0.03083, audio_tagging_loss=0.0196, over 14959.00 frames. ], tot_loss[loss=0.1258, simple_loss=0.1338, pruned_loss=0.04654, audio_tagging_loss=0.01235, over 3046590.61 frames. 
], batch size: 60, lr: 2.46e-02, grad_scale: 32.0 2023-11-18 08:58:29,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=152493.33333333334, ans=0.2 2023-11-18 08:58:39,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=152560.0, ans=0.05 2023-11-18 08:58:48,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=152626.66666666666, ans=0.125 2023-11-18 08:58:59,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2023-11-18 08:59:00,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=152693.33333333334, ans=0.1 2023-11-18 08:59:01,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=15.0 2023-11-18 08:59:08,401 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 1.080e+02 1.224e+02 1.410e+02 3.165e+02, threshold=2.449e+02, percent-clipped=2.0 2023-11-18 08:59:10,637 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:59:17,585 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10900, loss[loss=0.1203, simple_loss=0.1332, pruned_loss=0.04232, audio_tagging_loss=0.01139, over 14158.00 frames. ], tot_loss[loss=0.1257, simple_loss=0.1338, pruned_loss=0.04637, audio_tagging_loss=0.01248, over 3041376.75 frames. ], batch size: 55, lr: 2.46e-02, grad_scale: 32.0 2023-11-18 08:59:29,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=152893.33333333334, ans=0.07 2023-11-18 08:59:34,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=152893.33333333334, ans=0.2 2023-11-18 08:59:37,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=152893.33333333334, ans=0.125 2023-11-18 08:59:52,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=153026.66666666666, ans=0.1 2023-11-18 09:00:02,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2023-11-18 09:00:13,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=153160.0, ans=0.0 2023-11-18 09:00:14,337 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 10950, loss[loss=0.1052, simple_loss=0.1093, pruned_loss=0.03951, audio_tagging_loss=0.01103, over 14823.00 frames. ], tot_loss[loss=0.1249, simple_loss=0.1327, pruned_loss=0.04609, audio_tagging_loss=0.0125, over 3043418.14 frames. 
], batch size: 56, lr: 2.46e-02, grad_scale: 32.0 2023-11-18 09:00:41,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=153293.33333333334, ans=0.125 2023-11-18 09:00:44,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0 2023-11-18 09:00:51,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=153360.0, ans=0.125 2023-11-18 09:00:58,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=153360.0, ans=0.025 2023-11-18 09:01:02,024 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.175e+01 9.744e+01 1.111e+02 1.253e+02 1.675e+02, threshold=2.223e+02, percent-clipped=0.0 2023-11-18 09:01:10,790 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11000, loss[loss=0.096, simple_loss=0.09953, pruned_loss=0.03132, audio_tagging_loss=0.01492, over 13932.00 frames. ], tot_loss[loss=0.1258, simple_loss=0.1338, pruned_loss=0.04637, audio_tagging_loss=0.01249, over 3048088.29 frames. ], batch size: 55, lr: 2.46e-02, grad_scale: 32.0 2023-11-18 09:01:17,740 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:01:22,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2023-11-18 09:01:23,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=153560.0, ans=0.05 2023-11-18 09:01:35,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=153626.66666666666, ans=0.0 2023-11-18 09:01:42,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=153626.66666666666, ans=0.125 2023-11-18 09:02:05,288 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:02:07,782 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11050, loss[loss=0.1419, simple_loss=0.1511, pruned_loss=0.05177, audio_tagging_loss=0.01456, over 14918.00 frames. ], tot_loss[loss=0.1267, simple_loss=0.1346, pruned_loss=0.04677, audio_tagging_loss=0.01259, over 3048693.23 frames. ], batch size: 57, lr: 2.45e-02, grad_scale: 32.0 2023-11-18 09:02:09,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.69 vs. 
limit=15.0 2023-11-18 09:02:16,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=153826.66666666666, ans=0.125 2023-11-18 09:02:39,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=153960.0, ans=0.125 2023-11-18 09:02:40,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=154026.66666666666, ans=0.0 2023-11-18 09:02:48,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=154026.66666666666, ans=0.125 2023-11-18 09:02:50,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=154026.66666666666, ans=0.125 2023-11-18 09:02:55,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.060e+01 9.808e+01 1.104e+02 1.219e+02 2.392e+02, threshold=2.208e+02, percent-clipped=1.0 2023-11-18 09:03:04,800 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11100, loss[loss=0.08263, simple_loss=0.07856, pruned_loss=0.02706, audio_tagging_loss=0.01629, over 14548.00 frames. ], tot_loss[loss=0.1264, simple_loss=0.1344, pruned_loss=0.04645, audio_tagging_loss=0.01276, over 3045063.16 frames. ], batch size: 56, lr: 2.45e-02, grad_scale: 32.0 2023-11-18 09:03:20,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=154226.66666666666, ans=0.2 2023-11-18 09:03:32,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=154293.33333333334, ans=0.125 2023-11-18 09:03:39,735 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.22 vs. limit=22.5 2023-11-18 09:03:47,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=154360.0, ans=0.0 2023-11-18 09:04:00,573 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11150, loss[loss=0.1405, simple_loss=0.1505, pruned_loss=0.05235, audio_tagging_loss=0.01291, over 15650.00 frames. ], tot_loss[loss=0.126, simple_loss=0.1338, pruned_loss=0.04618, audio_tagging_loss=0.01288, over 3043056.10 frames. ], batch size: 56, lr: 2.45e-02, grad_scale: 32.0 2023-11-18 09:04:03,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=154493.33333333334, ans=0.125 2023-11-18 09:04:10,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. limit=15.0 2023-11-18 09:04:30,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.16 vs. 
limit=15.0 2023-11-18 09:04:34,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=154693.33333333334, ans=0.0 2023-11-18 09:04:35,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=154693.33333333334, ans=0.125 2023-11-18 09:04:35,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=154693.33333333334, ans=0.125 2023-11-18 09:04:36,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.90 vs. limit=22.5 2023-11-18 09:04:39,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=154693.33333333334, ans=0.125 2023-11-18 09:04:40,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=154693.33333333334, ans=0.125 2023-11-18 09:04:45,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=154760.0, ans=0.125 2023-11-18 09:04:48,102 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 1.033e+02 1.136e+02 1.301e+02 2.057e+02, threshold=2.273e+02, percent-clipped=0.0 2023-11-18 09:04:57,167 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11200, loss[loss=0.08322, simple_loss=0.09675, pruned_loss=0.02315, audio_tagging_loss=0.01169, over 15339.00 frames. ], tot_loss[loss=0.1249, simple_loss=0.1329, pruned_loss=0.04558, audio_tagging_loss=0.0129, over 3042115.92 frames. ], batch size: 60, lr: 2.45e-02, grad_scale: 32.0 2023-11-18 09:05:01,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=154826.66666666666, ans=0.2 2023-11-18 09:05:01,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=154826.66666666666, ans=0.1 2023-11-18 09:05:32,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=155026.66666666666, ans=0.1 2023-11-18 09:05:38,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=155026.66666666666, ans=0.2 2023-11-18 09:05:53,932 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11250, loss[loss=0.07733, simple_loss=0.07287, pruned_loss=0.0247, audio_tagging_loss=0.01619, over 15316.00 frames. ], tot_loss[loss=0.1233, simple_loss=0.1309, pruned_loss=0.04506, audio_tagging_loss=0.01275, over 3039784.36 frames. 
], batch size: 60, lr: 2.44e-02, grad_scale: 32.0 2023-11-18 09:05:54,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=155160.0, ans=0.125 2023-11-18 09:06:10,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=155226.66666666666, ans=0.125 2023-11-18 09:06:27,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=155360.0, ans=0.0 2023-11-18 09:06:38,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=155426.66666666666, ans=12.0 2023-11-18 09:06:41,312 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.020e+01 9.682e+01 1.104e+02 1.218e+02 1.906e+02, threshold=2.209e+02, percent-clipped=0.0 2023-11-18 09:06:41,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=155426.66666666666, ans=0.1 2023-11-18 09:06:43,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=155426.66666666666, ans=0.125 2023-11-18 09:06:48,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=155426.66666666666, ans=0.125 2023-11-18 09:06:49,991 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11300, loss[loss=0.1228, simple_loss=0.1245, pruned_loss=0.04663, audio_tagging_loss=0.01393, over 15640.00 frames. ], tot_loss[loss=0.1232, simple_loss=0.131, pruned_loss=0.04509, audio_tagging_loss=0.01257, over 3038220.65 frames. ], batch size: 59, lr: 2.44e-02, grad_scale: 32.0 2023-11-18 09:07:13,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.71 vs. limit=22.5 2023-11-18 09:07:35,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=155760.0, ans=0.1 2023-11-18 09:07:38,646 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:07:39,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=155760.0, ans=0.2 2023-11-18 09:07:44,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2023-11-18 09:07:45,915 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11350, loss[loss=0.1001, simple_loss=0.1102, pruned_loss=0.03561, audio_tagging_loss=0.009465, over 14480.00 frames. ], tot_loss[loss=0.1222, simple_loss=0.1299, pruned_loss=0.04475, audio_tagging_loss=0.0125, over 3037046.91 frames. 
], batch size: 54, lr: 2.44e-02, grad_scale: 32.0 2023-11-18 09:07:48,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=155826.66666666666, ans=0.035 2023-11-18 09:07:49,862 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:08:03,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=155893.33333333334, ans=0.125 2023-11-18 09:08:11,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=155960.0, ans=0.125 2023-11-18 09:08:33,858 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 1.029e+02 1.095e+02 1.224e+02 1.585e+02, threshold=2.190e+02, percent-clipped=0.0 2023-11-18 09:08:36,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=156093.33333333334, ans=0.2 2023-11-18 09:08:43,606 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11400, loss[loss=0.09351, simple_loss=0.1034, pruned_loss=0.02991, audio_tagging_loss=0.01188, over 14358.00 frames. ], tot_loss[loss=0.1216, simple_loss=0.1293, pruned_loss=0.04449, audio_tagging_loss=0.01246, over 3029464.68 frames. ], batch size: 54, lr: 2.44e-02, grad_scale: 32.0 2023-11-18 09:08:44,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2023-11-18 09:08:47,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0 2023-11-18 09:09:06,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=156293.33333333334, ans=0.125 2023-11-18 09:09:39,837 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11450, loss[loss=0.1037, simple_loss=0.1061, pruned_loss=0.0363, audio_tagging_loss=0.01435, over 14837.00 frames. ], tot_loss[loss=0.1225, simple_loss=0.1304, pruned_loss=0.0449, audio_tagging_loss=0.01234, over 3033327.70 frames. ], batch size: 57, lr: 2.43e-02, grad_scale: 32.0 2023-11-18 09:10:03,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.07 vs. limit=10.0 2023-11-18 09:10:07,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=156626.66666666666, ans=0.0 2023-11-18 09:10:26,555 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 9.860e+01 1.075e+02 1.215e+02 1.820e+02, threshold=2.151e+02, percent-clipped=0.0 2023-11-18 09:10:28,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=156760.0, ans=0.125 2023-11-18 09:10:35,117 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11500, loss[loss=0.1169, simple_loss=0.119, pruned_loss=0.04572, audio_tagging_loss=0.01173, over 15452.00 frames. ], tot_loss[loss=0.1234, simple_loss=0.1314, pruned_loss=0.04533, audio_tagging_loss=0.01238, over 3032775.32 frames. 
], batch size: 61, lr: 2.43e-02, grad_scale: 32.0 2023-11-18 09:10:37,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=156826.66666666666, ans=0.125 2023-11-18 09:10:38,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=156826.66666666666, ans=0.1 2023-11-18 09:11:04,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2023-11-18 09:11:16,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.66 vs. limit=15.0 2023-11-18 09:11:31,796 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11550, loss[loss=0.1432, simple_loss=0.1674, pruned_loss=0.05402, audio_tagging_loss=0.005531, over 15560.00 frames. ], tot_loss[loss=0.1234, simple_loss=0.1314, pruned_loss=0.04532, audio_tagging_loss=0.01237, over 3039044.33 frames. ], batch size: 58, lr: 2.43e-02, grad_scale: 32.0 2023-11-18 09:11:40,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=157160.0, ans=0.0 2023-11-18 09:11:42,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=157226.66666666666, ans=0.1 2023-11-18 09:11:54,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=157293.33333333334, ans=0.125 2023-11-18 09:12:01,703 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:12:08,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=157360.0, ans=0.125 2023-11-18 09:12:13,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=157360.0, ans=0.2 2023-11-18 09:12:19,806 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.118e+01 1.012e+02 1.136e+02 1.340e+02 1.723e+02, threshold=2.272e+02, percent-clipped=0.0 2023-11-18 09:12:28,351 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11600, loss[loss=0.1171, simple_loss=0.1239, pruned_loss=0.04169, audio_tagging_loss=0.01343, over 16749.00 frames. ], tot_loss[loss=0.123, simple_loss=0.1305, pruned_loss=0.0452, audio_tagging_loss=0.01257, over 3043557.79 frames. ], batch size: 63, lr: 2.43e-02, grad_scale: 32.0 2023-11-18 09:12:49,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=157626.66666666666, ans=0.1 2023-11-18 09:12:49,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. 
limit=15.0 2023-11-18 09:12:50,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=157626.66666666666, ans=0.125 2023-11-18 09:13:04,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=157693.33333333334, ans=0.125 2023-11-18 09:13:11,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2023-11-18 09:13:12,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=157760.0, ans=0.07 2023-11-18 09:13:18,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=157760.0, ans=0.0 2023-11-18 09:13:23,783 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11650, loss[loss=0.1643, simple_loss=0.1823, pruned_loss=0.06069, audio_tagging_loss=0.01244, over 14925.00 frames. ], tot_loss[loss=0.1239, simple_loss=0.1316, pruned_loss=0.0455, audio_tagging_loss=0.0126, over 3049709.34 frames. ], batch size: 56, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:14:02,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=158026.66666666666, ans=0.125 2023-11-18 09:14:08,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=158093.33333333334, ans=0.04949747468305833 2023-11-18 09:14:10,051 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 1.030e+02 1.124e+02 1.249e+02 1.579e+02, threshold=2.249e+02, percent-clipped=0.0 2023-11-18 09:14:19,008 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11700, loss[loss=0.1111, simple_loss=0.1216, pruned_loss=0.03606, audio_tagging_loss=0.01422, over 16223.00 frames. ], tot_loss[loss=0.1227, simple_loss=0.1306, pruned_loss=0.04486, audio_tagging_loss=0.01259, over 3049221.36 frames. ], batch size: 59, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:14:24,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=158160.0, ans=0.0 2023-11-18 09:14:25,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=158160.0, ans=0.0 2023-11-18 09:14:28,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.74 vs. limit=15.0 2023-11-18 09:14:31,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=22.5 2023-11-18 09:14:36,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.30 vs. 
limit=15.0 2023-11-18 09:14:41,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=158293.33333333334, ans=0.0 2023-11-18 09:14:53,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=158360.0, ans=0.125 2023-11-18 09:15:04,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=158426.66666666666, ans=0.125 2023-11-18 09:15:12,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=158426.66666666666, ans=0.125 2023-11-18 09:15:15,190 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11750, loss[loss=0.08699, simple_loss=0.08975, pruned_loss=0.02771, audio_tagging_loss=0.01441, over 14856.00 frames. ], tot_loss[loss=0.123, simple_loss=0.1307, pruned_loss=0.04509, audio_tagging_loss=0.01259, over 3051396.79 frames. ], batch size: 56, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:15:26,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=158560.0, ans=0.125 2023-11-18 09:15:36,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=158626.66666666666, ans=0.125 2023-11-18 09:15:42,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=158626.66666666666, ans=0.02 2023-11-18 09:15:46,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=158626.66666666666, ans=0.125 2023-11-18 09:15:49,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=158693.33333333334, ans=0.2 2023-11-18 09:16:01,614 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 9.907e+01 1.124e+02 1.266e+02 1.981e+02, threshold=2.248e+02, percent-clipped=0.0 2023-11-18 09:16:10,014 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11800, loss[loss=0.102, simple_loss=0.1042, pruned_loss=0.03362, audio_tagging_loss=0.01624, over 14160.00 frames. ], tot_loss[loss=0.1235, simple_loss=0.1311, pruned_loss=0.04524, audio_tagging_loss=0.01273, over 3045844.79 frames. ], batch size: 55, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:16:47,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2023-11-18 09:16:56,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=159093.33333333334, ans=0.125 2023-11-18 09:17:05,604 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11850, loss[loss=0.1121, simple_loss=0.1191, pruned_loss=0.03695, audio_tagging_loss=0.01563, over 15329.00 frames. ], tot_loss[loss=0.1226, simple_loss=0.13, pruned_loss=0.04472, audio_tagging_loss=0.01293, over 3048891.81 frames. 
], batch size: 58, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:17:07,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=159160.0, ans=0.1 2023-11-18 09:17:48,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=159360.0, ans=0.0 2023-11-18 09:17:52,693 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.196e+01 1.014e+02 1.138e+02 1.282e+02 2.288e+02, threshold=2.275e+02, percent-clipped=1.0 2023-11-18 09:17:57,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.80 vs. limit=6.0 2023-11-18 09:18:01,627 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11900, loss[loss=0.09781, simple_loss=0.1136, pruned_loss=0.02754, audio_tagging_loss=0.01347, over 16016.00 frames. ], tot_loss[loss=0.1228, simple_loss=0.1303, pruned_loss=0.04474, audio_tagging_loss=0.01292, over 3046173.70 frames. ], batch size: 59, lr: 2.41e-02, grad_scale: 32.0 2023-11-18 09:18:24,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=159626.66666666666, ans=0.125 2023-11-18 09:18:27,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=159626.66666666666, ans=0.125 2023-11-18 09:18:30,387 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.24 vs. limit=22.5 2023-11-18 09:18:43,145 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=12.0 2023-11-18 09:18:44,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.33 vs. limit=22.5 2023-11-18 09:18:56,790 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 11950, loss[loss=0.09818, simple_loss=0.09867, pruned_loss=0.03175, audio_tagging_loss=0.0171, over 16753.00 frames. ], tot_loss[loss=0.1227, simple_loss=0.13, pruned_loss=0.04462, audio_tagging_loss=0.01303, over 3052064.20 frames. ], batch size: 66, lr: 2.41e-02, grad_scale: 32.0 2023-11-18 09:19:05,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=159826.66666666666, ans=0.125 2023-11-18 09:19:44,164 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.182e+01 9.874e+01 1.073e+02 1.187e+02 1.717e+02, threshold=2.145e+02, percent-clipped=0.0 2023-11-18 09:19:52,773 INFO [train_asr.py:1115] (1/4) Epoch 2, batch 12000, loss[loss=0.1156, simple_loss=0.1219, pruned_loss=0.04282, audio_tagging_loss=0.01177, over 15193.00 frames. ], tot_loss[loss=0.1226, simple_loss=0.1297, pruned_loss=0.04465, audio_tagging_loss=0.01306, over 3054704.27 frames. ], batch size: 56, lr: 2.41e-02, grad_scale: 32.0 2023-11-18 09:19:52,773 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 09:20:26,773 INFO [train_asr.py:1147] (1/4) Epoch 2, validation: loss=0.08437, simple_loss=0.06733, pruned_loss=0.01363, audio_tagging_loss=0.03708, over 4681554.00 frames. 
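The "loss=" field in these entries is consistent with a weighted sum of the logged components. A minimal sketch of that bookkeeping, assuming the usual icefall pruned-transducer convention of a 0.5 weight on the simple loss and a unit weight on the audio-tagging loss; the weights and the function name are inferred from the logged totals, not read from the recipe code:

def combine_losses(simple_loss: float,
                   pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    """Reproduce the 'loss=' field from its logged components."""
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Checked against the Epoch 2 validation entry above:
# 0.5 * 0.06733 + 0.01363 + 0.03708 = 0.084375 ~= the logged loss=0.08437.
assert abs(combine_losses(0.06733, 0.01363, 0.03708) - 0.08437) < 5e-5

The same decomposition holds for the running tot_loss entries, so any of them can be used to sanity-check the scales mid-run.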
2023-11-18 09:20:26,774 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 09:20:32,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.84 vs. limit=10.0 2023-11-18 09:20:37,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=160226.66666666666, ans=0.125 2023-11-18 09:21:26,868 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 0, loss[loss=0.1361, simple_loss=0.1324, pruned_loss=0.04047, audio_tagging_loss=0.02946, over 15779.00 frames. ], tot_loss[loss=0.1361, simple_loss=0.1324, pruned_loss=0.04047, audio_tagging_loss=0.02946, over 15779.00 frames. ], batch size: 56, lr: 2.29e-02, grad_scale: 32.0 2023-11-18 09:21:26,868 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 09:21:58,053 INFO [train_asr.py:1147] (1/4) Epoch 3, validation: loss=0.08217, simple_loss=0.06725, pruned_loss=0.01375, audio_tagging_loss=0.03479, over 4681554.00 frames. 2023-11-18 09:21:58,053 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 09:22:03,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=160300.0, ans=0.2 2023-11-18 09:22:03,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=160300.0, ans=0.125 2023-11-18 09:22:07,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=160300.0, ans=0.0 2023-11-18 09:22:10,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=160366.66666666666, ans=0.1 2023-11-18 09:22:14,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=160366.66666666666, ans=0.04949747468305833 2023-11-18 09:22:20,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=160433.33333333334, ans=0.0 2023-11-18 09:22:30,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=160500.0, ans=0.125 2023-11-18 09:22:45,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=160566.66666666666, ans=0.125 2023-11-18 09:22:53,017 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 50, loss[loss=0.1401, simple_loss=0.1388, pruned_loss=0.04763, audio_tagging_loss=0.02302, over 15854.00 frames. ], tot_loss[loss=0.1321, simple_loss=0.1272, pruned_loss=0.04349, audio_tagging_loss=0.02497, over 680808.98 frames. 
], batch size: 58, lr: 2.29e-02, grad_scale: 32.0 2023-11-18 09:22:57,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=160633.33333333334, ans=0.125 2023-11-18 09:22:58,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=160633.33333333334, ans=0.125 2023-11-18 09:23:16,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.190e+01 1.036e+02 1.137e+02 1.326e+02 1.917e+02, threshold=2.275e+02, percent-clipped=0.0 2023-11-18 09:23:42,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.56 vs. limit=15.0 2023-11-18 09:23:47,690 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 100, loss[loss=0.08152, simple_loss=0.06313, pruned_loss=0.02121, audio_tagging_loss=0.02875, over 14089.00 frames. ], tot_loss[loss=0.129, simple_loss=0.1263, pruned_loss=0.04198, audio_tagging_loss=0.02385, over 1207494.53 frames. ], batch size: 55, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:23:51,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=160966.66666666666, ans=0.125 2023-11-18 09:23:59,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=161033.33333333334, ans=0.0 2023-11-18 09:24:21,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=161166.66666666666, ans=0.0 2023-11-18 09:24:29,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=12.0 2023-11-18 09:24:33,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2023-11-18 09:24:36,456 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:24:43,641 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 150, loss[loss=0.1172, simple_loss=0.1242, pruned_loss=0.03796, audio_tagging_loss=0.01712, over 14325.00 frames. ], tot_loss[loss=0.1272, simple_loss=0.1275, pruned_loss=0.04236, audio_tagging_loss=0.02105, over 1616545.32 frames. ], batch size: 54, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:24:50,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=161300.0, ans=10.0 2023-11-18 09:25:06,615 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 1.007e+02 1.136e+02 1.298e+02 1.875e+02, threshold=2.273e+02, percent-clipped=0.0 2023-11-18 09:25:08,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=161433.33333333334, ans=0.2 2023-11-18 09:25:29,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=161566.66666666666, ans=0.0 2023-11-18 09:25:39,259 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 200, loss[loss=0.1393, simple_loss=0.1466, pruned_loss=0.05442, audio_tagging_loss=0.01157, over 14475.00 frames. ], tot_loss[loss=0.1273, simple_loss=0.1299, pruned_loss=0.0437, audio_tagging_loss=0.01859, over 1925121.80 frames. 
], batch size: 55, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:25:58,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=161700.0, ans=0.125 2023-11-18 09:26:07,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=12.0 2023-11-18 09:26:07,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2023-11-18 09:26:08,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=161766.66666666666, ans=0.0 2023-11-18 09:26:33,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0 2023-11-18 09:26:34,567 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 250, loss[loss=0.1316, simple_loss=0.1444, pruned_loss=0.05118, audio_tagging_loss=0.008238, over 15012.00 frames. ], tot_loss[loss=0.1262, simple_loss=0.1304, pruned_loss=0.04431, audio_tagging_loss=0.01672, over 2176433.14 frames. ], batch size: 58, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:26:45,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.17 vs. limit=6.0 2023-11-18 09:26:46,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=162033.33333333334, ans=0.125 2023-11-18 09:26:57,827 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 1.002e+02 1.144e+02 1.310e+02 1.731e+02, threshold=2.288e+02, percent-clipped=0.0 2023-11-18 09:27:03,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=162100.0, ans=0.125 2023-11-18 09:27:07,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=162166.66666666666, ans=0.2 2023-11-18 09:27:10,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=162166.66666666666, ans=0.0 2023-11-18 09:27:18,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=162233.33333333334, ans=0.0 2023-11-18 09:27:27,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=162233.33333333334, ans=0.2 2023-11-18 09:27:28,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=162233.33333333334, ans=0.035 2023-11-18 09:27:28,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=162233.33333333334, ans=0.0 2023-11-18 09:27:30,574 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 300, loss[loss=0.1073, simple_loss=0.1106, pruned_loss=0.03908, audio_tagging_loss=0.01292, over 15545.00 frames. ], tot_loss[loss=0.1256, simple_loss=0.1313, pruned_loss=0.04463, audio_tagging_loss=0.01537, over 2373515.30 frames. 
], batch size: 59, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:27:44,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=162366.66666666666, ans=0.1 2023-11-18 09:27:46,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.04 vs. limit=6.0 2023-11-18 09:27:54,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=162433.33333333334, ans=0.125 2023-11-18 09:27:58,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=162433.33333333334, ans=0.125 2023-11-18 09:27:59,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=162433.33333333334, ans=0.125 2023-11-18 09:28:22,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=162566.66666666666, ans=0.2 2023-11-18 09:28:25,387 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 350, loss[loss=0.1394, simple_loss=0.1425, pruned_loss=0.05805, audio_tagging_loss=0.01013, over 15825.00 frames. ], tot_loss[loss=0.1252, simple_loss=0.1321, pruned_loss=0.04474, audio_tagging_loss=0.01436, over 2524631.91 frames. ], batch size: 57, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:28:26,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=162633.33333333334, ans=0.1 2023-11-18 09:28:50,147 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 9.861e+01 1.085e+02 1.214e+02 1.858e+02, threshold=2.170e+02, percent-clipped=0.0 2023-11-18 09:28:50,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=162766.66666666666, ans=0.125 2023-11-18 09:29:21,424 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 400, loss[loss=0.1403, simple_loss=0.1586, pruned_loss=0.05152, audio_tagging_loss=0.009488, over 15788.00 frames. ], tot_loss[loss=0.1242, simple_loss=0.1322, pruned_loss=0.04443, audio_tagging_loss=0.01373, over 2647122.63 frames. ], batch size: 56, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:29:34,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=163033.33333333334, ans=0.09899494936611666 2023-11-18 09:29:43,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=163100.0, ans=0.125 2023-11-18 09:29:52,224 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0 2023-11-18 09:30:07,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.68 vs. limit=22.5 2023-11-18 09:30:18,235 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 450, loss[loss=0.1299, simple_loss=0.1461, pruned_loss=0.04673, audio_tagging_loss=0.0101, over 15642.00 frames. ], tot_loss[loss=0.1235, simple_loss=0.1317, pruned_loss=0.04434, audio_tagging_loss=0.01332, over 2732967.62 frames. 
], batch size: 57, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:30:21,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=163300.0, ans=0.125 2023-11-18 09:30:38,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=163433.33333333334, ans=0.125 2023-11-18 09:30:40,028 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.316e+01 9.836e+01 1.125e+02 1.262e+02 2.640e+02, threshold=2.251e+02, percent-clipped=1.0 2023-11-18 09:30:43,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0 2023-11-18 09:30:44,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=163433.33333333334, ans=0.125 2023-11-18 09:30:46,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=163433.33333333334, ans=0.0 2023-11-18 09:31:11,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=15.0 2023-11-18 09:31:12,858 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 500, loss[loss=0.1264, simple_loss=0.1335, pruned_loss=0.04637, audio_tagging_loss=0.01324, over 15069.00 frames. ], tot_loss[loss=0.1226, simple_loss=0.1308, pruned_loss=0.04412, audio_tagging_loss=0.01313, over 2803354.28 frames. ], batch size: 56, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:31:25,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.45 vs. limit=15.0 2023-11-18 09:31:27,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=163700.0, ans=0.1 2023-11-18 09:31:32,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=163700.0, ans=0.2 2023-11-18 09:32:07,425 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 550, loss[loss=0.102, simple_loss=0.1009, pruned_loss=0.03525, audio_tagging_loss=0.01628, over 14954.00 frames. ], tot_loss[loss=0.1209, simple_loss=0.1288, pruned_loss=0.04342, audio_tagging_loss=0.01305, over 2859294.05 frames. ], batch size: 57, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:32:12,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=163966.66666666666, ans=0.07 2023-11-18 09:32:27,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=164033.33333333334, ans=0.2 2023-11-18 09:32:29,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=12.0 2023-11-18 09:32:31,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.566e+01 1.089e+02 1.252e+02 1.679e+02, threshold=2.177e+02, percent-clipped=0.0 2023-11-18 09:32:45,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164166.66666666666, ans=0.1 2023-11-18 09:32:46,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.92 vs. 
limit=6.0 2023-11-18 09:32:53,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=22.5 2023-11-18 09:32:55,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=164233.33333333334, ans=0.07 2023-11-18 09:32:55,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=164233.33333333334, ans=0.125 2023-11-18 09:33:03,671 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 600, loss[loss=0.09847, simple_loss=0.1048, pruned_loss=0.03274, audio_tagging_loss=0.01334, over 14324.00 frames. ], tot_loss[loss=0.1207, simple_loss=0.1287, pruned_loss=0.04342, audio_tagging_loss=0.01295, over 2902239.58 frames. ], batch size: 55, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:33:05,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=164300.0, ans=0.2 2023-11-18 09:33:13,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2023-11-18 09:33:43,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=164500.0, ans=0.2 2023-11-18 09:33:46,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=164566.66666666666, ans=0.125 2023-11-18 09:33:51,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=164566.66666666666, ans=0.2 2023-11-18 09:33:57,686 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 650, loss[loss=0.09987, simple_loss=0.1022, pruned_loss=0.03422, audio_tagging_loss=0.01455, over 14981.00 frames. ], tot_loss[loss=0.1215, simple_loss=0.1298, pruned_loss=0.04365, audio_tagging_loss=0.01291, over 2929765.48 frames. ], batch size: 55, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:34:01,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164633.33333333334, ans=0.1 2023-11-18 09:34:04,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=164633.33333333334, ans=0.0 2023-11-18 09:34:12,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=164700.0, ans=0.125 2023-11-18 09:34:20,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 9.683e+01 1.100e+02 1.220e+02 1.764e+02, threshold=2.199e+02, percent-clipped=0.0 2023-11-18 09:34:27,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=164766.66666666666, ans=0.2 2023-11-18 09:34:30,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=164833.33333333334, ans=0.125 2023-11-18 09:34:34,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=22.5 2023-11-18 09:34:52,270 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 700, loss[loss=0.1425, simple_loss=0.1562, pruned_loss=0.05085, audio_tagging_loss=0.01357, over 16065.00 frames. 
], tot_loss[loss=0.1213, simple_loss=0.1298, pruned_loss=0.04359, audio_tagging_loss=0.01276, over 2956765.11 frames. ], batch size: 58, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:35:07,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=165033.33333333334, ans=0.125 2023-11-18 09:35:30,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=165166.66666666666, ans=0.0 2023-11-18 09:35:34,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=165166.66666666666, ans=0.0 2023-11-18 09:35:37,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=165233.33333333334, ans=0.125 2023-11-18 09:35:39,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=165233.33333333334, ans=0.0 2023-11-18 09:35:48,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=165300.0, ans=0.125 2023-11-18 09:35:49,129 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 750, loss[loss=0.1261, simple_loss=0.1399, pruned_loss=0.04355, audio_tagging_loss=0.01261, over 15632.00 frames. ], tot_loss[loss=0.1221, simple_loss=0.131, pruned_loss=0.04391, audio_tagging_loss=0.01271, over 2981371.88 frames. ], batch size: 58, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:35:51,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=165300.0, ans=0.125 2023-11-18 09:36:05,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=165366.66666666666, ans=0.2 2023-11-18 09:36:11,467 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.593e+01 1.007e+02 1.126e+02 1.277e+02 1.870e+02, threshold=2.252e+02, percent-clipped=0.0 2023-11-18 09:36:16,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=165433.33333333334, ans=0.0 2023-11-18 09:36:44,358 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 800, loss[loss=0.07926, simple_loss=0.09376, pruned_loss=0.01968, audio_tagging_loss=0.01271, over 15407.00 frames. ], tot_loss[loss=0.1219, simple_loss=0.1305, pruned_loss=0.0439, audio_tagging_loss=0.01276, over 2996342.88 frames. ], batch size: 60, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:36:53,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=165633.33333333334, ans=0.125 2023-11-18 09:37:39,013 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 850, loss[loss=0.1431, simple_loss=0.16, pruned_loss=0.04937, audio_tagging_loss=0.01368, over 15376.00 frames. ], tot_loss[loss=0.1211, simple_loss=0.1296, pruned_loss=0.04337, audio_tagging_loss=0.0129, over 3006714.36 frames. 
], batch size: 55, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:37:41,294 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:37:55,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=166033.33333333334, ans=0.125 2023-11-18 09:37:59,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=166033.33333333334, ans=0.125 2023-11-18 09:38:02,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=12.0 2023-11-18 09:38:03,430 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.295e+01 1.046e+02 1.125e+02 1.279e+02 2.412e+02, threshold=2.250e+02, percent-clipped=1.0 2023-11-18 09:38:05,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=12.0 2023-11-18 09:38:18,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=166166.66666666666, ans=0.07 2023-11-18 09:38:19,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=166166.66666666666, ans=0.0 2023-11-18 09:38:21,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=166166.66666666666, ans=0.125 2023-11-18 09:38:32,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=166233.33333333334, ans=0.2 2023-11-18 09:38:35,535 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 900, loss[loss=0.1608, simple_loss=0.1783, pruned_loss=0.05971, audio_tagging_loss=0.01199, over 13972.00 frames. ], tot_loss[loss=0.1219, simple_loss=0.1307, pruned_loss=0.04372, audio_tagging_loss=0.01283, over 3025538.44 frames. ], batch size: 55, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:38:44,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=166300.0, ans=0.125 2023-11-18 09:39:00,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=166433.33333333334, ans=0.125 2023-11-18 09:39:00,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=15.0 2023-11-18 09:39:04,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.87 vs. limit=10.0 2023-11-18 09:39:08,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2023-11-18 09:39:12,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=166500.0, ans=0.0 2023-11-18 09:39:13,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=166500.0, ans=0.0 2023-11-18 09:39:31,299 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 950, loss[loss=0.1266, simple_loss=0.1376, pruned_loss=0.04686, audio_tagging_loss=0.01093, over 15202.00 frames. 
], tot_loss[loss=0.1216, simple_loss=0.1308, pruned_loss=0.04355, audio_tagging_loss=0.01267, over 3027473.17 frames. ], batch size: 56, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:39:33,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=166633.33333333334, ans=0.0 2023-11-18 09:39:54,131 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 9.509e+01 1.090e+02 1.237e+02 1.820e+02, threshold=2.179e+02, percent-clipped=0.0 2023-11-18 09:39:59,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=166766.66666666666, ans=0.125 2023-11-18 09:40:04,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=166833.33333333334, ans=0.0 2023-11-18 09:40:10,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=166833.33333333334, ans=10.0 2023-11-18 09:40:26,282 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1000, loss[loss=0.1176, simple_loss=0.1311, pruned_loss=0.04107, audio_tagging_loss=0.01094, over 16010.00 frames. ], tot_loss[loss=0.1219, simple_loss=0.1308, pruned_loss=0.04403, audio_tagging_loss=0.01249, over 3026753.49 frames. ], batch size: 61, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:40:44,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=15.0 2023-11-18 09:40:50,182 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:40:54,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=167100.0, ans=0.125 2023-11-18 09:41:22,340 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1050, loss[loss=0.06746, simple_loss=0.05975, pruned_loss=0.02213, audio_tagging_loss=0.01545, over 15022.00 frames. ], tot_loss[loss=0.1218, simple_loss=0.1306, pruned_loss=0.04411, audio_tagging_loss=0.01242, over 3028527.25 frames. ], batch size: 59, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:41:32,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.09 vs. limit=6.0 2023-11-18 09:41:33,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.65 vs. 
limit=12.0 2023-11-18 09:41:40,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=167366.66666666666, ans=0.125 2023-11-18 09:41:45,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.835e+01 9.727e+01 1.056e+02 1.215e+02 1.619e+02, threshold=2.112e+02, percent-clipped=0.0 2023-11-18 09:41:45,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=167433.33333333334, ans=0.2 2023-11-18 09:41:45,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=167433.33333333334, ans=0.125 2023-11-18 09:42:02,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=167500.0, ans=0.2 2023-11-18 09:42:05,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167566.66666666666, ans=0.1 2023-11-18 09:42:18,376 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1100, loss[loss=0.08795, simple_loss=0.08818, pruned_loss=0.02858, audio_tagging_loss=0.01528, over 16060.00 frames. ], tot_loss[loss=0.1222, simple_loss=0.1314, pruned_loss=0.04412, audio_tagging_loss=0.01236, over 3031206.07 frames. ], batch size: 63, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:42:21,529 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:42:22,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.71 vs. limit=15.0 2023-11-18 09:42:33,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=167700.0, ans=0.2 2023-11-18 09:42:43,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=167766.66666666666, ans=0.125 2023-11-18 09:42:44,731 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.285e-01 2023-11-18 09:42:52,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=167833.33333333334, ans=0.025 2023-11-18 09:43:01,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=167900.0, ans=0.125 2023-11-18 09:43:13,561 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1150, loss[loss=0.08959, simple_loss=0.1005, pruned_loss=0.02898, audio_tagging_loss=0.01036, over 14302.00 frames. ], tot_loss[loss=0.1204, simple_loss=0.1292, pruned_loss=0.04344, audio_tagging_loss=0.0124, over 3029466.87 frames. 
], batch size: 53, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:43:20,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=167966.66666666666, ans=0.125 2023-11-18 09:43:37,785 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 9.862e+01 1.112e+02 1.270e+02 2.649e+02, threshold=2.225e+02, percent-clipped=1.0 2023-11-18 09:43:42,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=168100.0, ans=0.09899494936611666 2023-11-18 09:43:50,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=168166.66666666666, ans=0.125 2023-11-18 09:44:00,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=168233.33333333334, ans=0.0 2023-11-18 09:44:08,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=168300.0, ans=0.1 2023-11-18 09:44:09,518 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1200, loss[loss=0.09101, simple_loss=0.1028, pruned_loss=0.02888, audio_tagging_loss=0.01073, over 14303.00 frames. ], tot_loss[loss=0.1202, simple_loss=0.1292, pruned_loss=0.04332, audio_tagging_loss=0.01227, over 3029807.40 frames. ], batch size: 55, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:44:23,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=168366.66666666666, ans=0.1 2023-11-18 09:44:32,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=168433.33333333334, ans=0.2 2023-11-18 09:44:52,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=22.5 2023-11-18 09:44:57,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=168566.66666666666, ans=0.125 2023-11-18 09:45:05,695 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1250, loss[loss=0.1005, simple_loss=0.108, pruned_loss=0.03021, audio_tagging_loss=0.01624, over 15529.00 frames. ], tot_loss[loss=0.1197, simple_loss=0.1286, pruned_loss=0.04313, audio_tagging_loss=0.01224, over 3033003.19 frames. ], batch size: 59, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:45:07,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2023-11-18 09:45:23,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.70 vs. limit=10.0 2023-11-18 09:45:26,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.79 vs. 
limit=15.0 2023-11-18 09:45:28,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.825e+01 1.002e+02 1.131e+02 1.253e+02 1.979e+02, threshold=2.263e+02, percent-clipped=0.0 2023-11-18 09:45:43,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=168833.33333333334, ans=0.1 2023-11-18 09:45:44,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=168833.33333333334, ans=0.125 2023-11-18 09:46:00,854 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1300, loss[loss=0.1251, simple_loss=0.1449, pruned_loss=0.0425, audio_tagging_loss=0.01016, over 16187.00 frames. ], tot_loss[loss=0.1185, simple_loss=0.1274, pruned_loss=0.04259, audio_tagging_loss=0.01217, over 3036994.14 frames. ], batch size: 59, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:46:06,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=168966.66666666666, ans=0.125 2023-11-18 09:46:06,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=168966.66666666666, ans=0.1 2023-11-18 09:46:55,924 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1350, loss[loss=0.1499, simple_loss=0.1579, pruned_loss=0.05969, audio_tagging_loss=0.01123, over 15683.00 frames. ], tot_loss[loss=0.1197, simple_loss=0.1286, pruned_loss=0.04324, audio_tagging_loss=0.01219, over 3039265.12 frames. ], batch size: 57, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:47:02,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=169300.0, ans=0.125 2023-11-18 09:47:17,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=169366.66666666666, ans=0.125 2023-11-18 09:47:18,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=169433.33333333334, ans=0.1 2023-11-18 09:47:19,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=169433.33333333334, ans=0.125 2023-11-18 09:47:19,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.798e+01 9.482e+01 1.049e+02 1.147e+02 1.889e+02, threshold=2.098e+02, percent-clipped=0.0 2023-11-18 09:47:25,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.62 vs. limit=15.0 2023-11-18 09:47:27,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=169433.33333333334, ans=0.1 2023-11-18 09:47:37,377 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 09:47:44,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=169566.66666666666, ans=0.0 2023-11-18 09:47:50,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=169566.66666666666, ans=0.125 2023-11-18 09:47:52,633 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1400, loss[loss=0.1512, simple_loss=0.1588, pruned_loss=0.05507, audio_tagging_loss=0.01667, over 14553.00 frames. ], tot_loss[loss=0.1212, simple_loss=0.1302, pruned_loss=0.04385, audio_tagging_loss=0.01219, over 3041186.93 frames. ], batch size: 55, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:48:06,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=169700.0, ans=0.0 2023-11-18 09:48:21,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2023-11-18 09:48:47,402 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1450, loss[loss=0.09091, simple_loss=0.09405, pruned_loss=0.03136, audio_tagging_loss=0.01252, over 14954.00 frames. ], tot_loss[loss=0.1206, simple_loss=0.1297, pruned_loss=0.04341, audio_tagging_loss=0.01231, over 3042708.48 frames. ], batch size: 58, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:48:51,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=169966.66666666666, ans=0.125 2023-11-18 09:48:52,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=169966.66666666666, ans=0.125 2023-11-18 09:48:53,486 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:49:11,546 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.160e+01 9.587e+01 1.090e+02 1.197e+02 1.611e+02, threshold=2.181e+02, percent-clipped=0.0 2023-11-18 09:49:13,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=170100.0, ans=0.0 2023-11-18 09:49:36,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=170233.33333333334, ans=0.2 2023-11-18 09:49:42,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=15.0 2023-11-18 09:49:42,817 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1500, loss[loss=0.1381, simple_loss=0.1473, pruned_loss=0.05037, audio_tagging_loss=0.01413, over 14495.00 frames. ], tot_loss[loss=0.1203, simple_loss=0.1296, pruned_loss=0.04308, audio_tagging_loss=0.01241, over 3043238.18 frames. ], batch size: 58, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:49:50,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.52 vs. limit=10.0 2023-11-18 09:50:02,022 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.68 vs. 
limit=22.5 2023-11-18 09:50:04,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=170433.33333333334, ans=0.2 2023-11-18 09:50:05,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=170433.33333333334, ans=0.1 2023-11-18 09:50:24,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=170500.0, ans=0.0 2023-11-18 09:50:27,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=170566.66666666666, ans=0.1 2023-11-18 09:50:32,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=170566.66666666666, ans=0.0 2023-11-18 09:50:39,363 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1550, loss[loss=0.08257, simple_loss=0.07808, pruned_loss=0.02754, audio_tagging_loss=0.01599, over 16549.00 frames. ], tot_loss[loss=0.1204, simple_loss=0.1295, pruned_loss=0.0432, audio_tagging_loss=0.01245, over 3039796.67 frames. ], batch size: 65, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:50:59,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=170766.66666666666, ans=0.125 2023-11-18 09:50:59,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=170766.66666666666, ans=0.125 2023-11-18 09:51:02,091 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 1.007e+02 1.092e+02 1.205e+02 1.689e+02, threshold=2.183e+02, percent-clipped=0.0 2023-11-18 09:51:09,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=170766.66666666666, ans=0.125 2023-11-18 09:51:21,677 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:51:24,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=170900.0, ans=0.125 2023-11-18 09:51:32,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=170900.0, ans=0.125 2023-11-18 09:51:34,220 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1600, loss[loss=0.1209, simple_loss=0.1336, pruned_loss=0.04257, audio_tagging_loss=0.01148, over 16569.00 frames. ], tot_loss[loss=0.1195, simple_loss=0.1283, pruned_loss=0.04278, audio_tagging_loss=0.01262, over 3042842.89 frames. ], batch size: 62, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:51:58,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=171100.0, ans=0.125 2023-11-18 09:52:29,521 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1650, loss[loss=0.1515, simple_loss=0.1667, pruned_loss=0.05557, audio_tagging_loss=0.01261, over 14833.00 frames. ], tot_loss[loss=0.1198, simple_loss=0.1285, pruned_loss=0.04284, audio_tagging_loss=0.01275, over 3045875.92 frames. 
], batch size: 56, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:52:30,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=171300.0, ans=0.0 2023-11-18 09:52:31,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=171300.0, ans=0.125 2023-11-18 09:52:50,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=171366.66666666666, ans=22.5 2023-11-18 09:52:53,342 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.843e+01 9.689e+01 1.063e+02 1.242e+02 1.763e+02, threshold=2.126e+02, percent-clipped=0.0 2023-11-18 09:52:53,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=171433.33333333334, ans=0.2 2023-11-18 09:52:57,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.98 vs. limit=15.0 2023-11-18 09:52:57,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.53 vs. limit=22.5 2023-11-18 09:53:03,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=171500.0, ans=0.125 2023-11-18 09:53:08,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.14 vs. limit=22.5 2023-11-18 09:53:14,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=171566.66666666666, ans=0.0 2023-11-18 09:53:18,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=171566.66666666666, ans=0.0 2023-11-18 09:53:26,187 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1700, loss[loss=0.1261, simple_loss=0.1351, pruned_loss=0.04749, audio_tagging_loss=0.01105, over 16231.00 frames. ], tot_loss[loss=0.12, simple_loss=0.1286, pruned_loss=0.04304, audio_tagging_loss=0.0127, over 3047753.22 frames. ], batch size: 61, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:53:29,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2023-11-18 09:53:32,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=171633.33333333334, ans=0.125 2023-11-18 09:53:35,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=171700.0, ans=0.0 2023-11-18 09:53:38,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.61 vs. limit=22.5 2023-11-18 09:53:44,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.56 vs. limit=10.0 2023-11-18 09:53:48,423 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:53:51,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.17 vs. 
limit=15.0 2023-11-18 09:54:17,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=171900.0, ans=0.125 2023-11-18 09:54:18,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-11-18 09:54:20,928 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1750, loss[loss=0.1096, simple_loss=0.1218, pruned_loss=0.03546, audio_tagging_loss=0.0132, over 14760.00 frames. ], tot_loss[loss=0.1199, simple_loss=0.1288, pruned_loss=0.04295, audio_tagging_loss=0.0126, over 3047984.49 frames. ], batch size: 54, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:54:28,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=12.0 2023-11-18 09:54:44,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 9.833e+01 1.116e+02 1.265e+02 1.757e+02, threshold=2.232e+02, percent-clipped=0.0 2023-11-18 09:54:46,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=172100.0, ans=0.125 2023-11-18 09:55:03,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=172166.66666666666, ans=0.0 2023-11-18 09:55:16,037 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1800, loss[loss=0.1274, simple_loss=0.1414, pruned_loss=0.04597, audio_tagging_loss=0.01074, over 15942.00 frames. ], tot_loss[loss=0.1208, simple_loss=0.1297, pruned_loss=0.0436, audio_tagging_loss=0.01231, over 3047159.01 frames. ], batch size: 60, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:55:20,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=172300.0, ans=0.125 2023-11-18 09:55:24,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2023-11-18 09:55:39,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=172433.33333333334, ans=0.0 2023-11-18 09:56:12,366 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1850, loss[loss=0.1077, simple_loss=0.1116, pruned_loss=0.03918, audio_tagging_loss=0.01268, over 15619.00 frames. ], tot_loss[loss=0.1204, simple_loss=0.1297, pruned_loss=0.04339, audio_tagging_loss=0.01215, over 3049700.75 frames. 
], batch size: 60, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:56:29,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=172700.0, ans=0.1 2023-11-18 09:56:31,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=172700.0, ans=0.05 2023-11-18 09:56:34,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.476e+01 9.349e+01 1.016e+02 1.150e+02 1.872e+02, threshold=2.031e+02, percent-clipped=0.0 2023-11-18 09:56:51,169 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.410e+00 2023-11-18 09:56:55,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=172900.0, ans=0.2 2023-11-18 09:56:58,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172900.0, ans=0.1 2023-11-18 09:56:59,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=172900.0, ans=0.0 2023-11-18 09:57:04,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=172900.0, ans=0.125 2023-11-18 09:57:07,223 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1900, loss[loss=0.1206, simple_loss=0.1327, pruned_loss=0.04476, audio_tagging_loss=0.009499, over 16024.00 frames. ], tot_loss[loss=0.1204, simple_loss=0.1297, pruned_loss=0.04339, audio_tagging_loss=0.01214, over 3050077.36 frames. ], batch size: 58, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:57:19,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=173033.33333333334, ans=0.2 2023-11-18 09:57:25,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2023-11-18 09:57:25,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.62 vs. limit=22.5 2023-11-18 09:57:42,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=173166.66666666666, ans=0.025 2023-11-18 09:57:46,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=173166.66666666666, ans=0.125 2023-11-18 09:57:51,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=173233.33333333334, ans=0.1 2023-11-18 09:57:58,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=173233.33333333334, ans=0.2 2023-11-18 09:58:02,651 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 1950, loss[loss=0.1185, simple_loss=0.1242, pruned_loss=0.04259, audio_tagging_loss=0.01382, over 15152.00 frames. ], tot_loss[loss=0.1199, simple_loss=0.1289, pruned_loss=0.04319, audio_tagging_loss=0.01224, over 3049268.26 frames. 
], batch size: 56, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:58:08,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=173300.0, ans=0.2 2023-11-18 09:58:22,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=173366.66666666666, ans=0.0 2023-11-18 09:58:26,728 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 9.559e+01 1.056e+02 1.197e+02 1.715e+02, threshold=2.112e+02, percent-clipped=0.0 2023-11-18 09:58:26,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=173433.33333333334, ans=0.125 2023-11-18 09:58:49,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2023-11-18 09:58:58,537 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2000, loss[loss=0.1228, simple_loss=0.1378, pruned_loss=0.04491, audio_tagging_loss=0.009009, over 15243.00 frames. ], tot_loss[loss=0.1188, simple_loss=0.1276, pruned_loss=0.04267, audio_tagging_loss=0.01235, over 3040491.33 frames. ], batch size: 58, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:59:10,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.30 vs. limit=22.5 2023-11-18 09:59:14,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.25 vs. limit=15.0 2023-11-18 09:59:18,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.92 vs. limit=22.5 2023-11-18 09:59:44,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=173900.0, ans=0.125 2023-11-18 09:59:48,965 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:59:48,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=173900.0, ans=0.125 2023-11-18 09:59:53,991 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2050, loss[loss=0.09568, simple_loss=0.09469, pruned_loss=0.03112, audio_tagging_loss=0.01721, over 15369.00 frames. ], tot_loss[loss=0.1189, simple_loss=0.1274, pruned_loss=0.04285, audio_tagging_loss=0.01236, over 3033292.77 frames. ], batch size: 60, lr: 2.20e-02, grad_scale: 64.0 2023-11-18 10:00:09,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=174033.33333333334, ans=0.04949747468305833 2023-11-18 10:00:15,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=174100.0, ans=0.125 2023-11-18 10:00:16,276 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.215e+01 1.049e+02 1.194e+02 1.365e+02 2.043e+02, threshold=2.387e+02, percent-clipped=0.0 2023-11-18 10:00:48,869 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2100, loss[loss=0.1051, simple_loss=0.1063, pruned_loss=0.03766, audio_tagging_loss=0.01428, over 14015.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1275, pruned_loss=0.04238, audio_tagging_loss=0.01229, over 3031256.81 frames. 
], batch size: 54, lr: 2.20e-02, grad_scale: 128.0 2023-11-18 10:01:01,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=174366.66666666666, ans=0.2 2023-11-18 10:01:15,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.05 vs. limit=15.0 2023-11-18 10:01:18,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=174433.33333333334, ans=0.125 2023-11-18 10:01:27,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=174500.0, ans=0.125 2023-11-18 10:01:30,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=174500.0, ans=10.0 2023-11-18 10:01:32,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=174566.66666666666, ans=0.125 2023-11-18 10:01:44,282 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2150, loss[loss=0.1218, simple_loss=0.1347, pruned_loss=0.04288, audio_tagging_loss=0.01156, over 15074.00 frames. ], tot_loss[loss=0.1174, simple_loss=0.1264, pruned_loss=0.04188, audio_tagging_loss=0.01232, over 3040176.31 frames. ], batch size: 56, lr: 2.20e-02, grad_scale: 128.0 2023-11-18 10:01:45,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.71 vs. limit=15.0 2023-11-18 10:02:08,229 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.984e+01 9.849e+01 1.118e+02 1.250e+02 1.648e+02, threshold=2.236e+02, percent-clipped=0.0 2023-11-18 10:02:13,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=174766.66666666666, ans=0.2 2023-11-18 10:02:18,835 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:02:32,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.79 vs. limit=15.0 2023-11-18 10:02:41,109 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2200, loss[loss=0.1302, simple_loss=0.1393, pruned_loss=0.04652, audio_tagging_loss=0.01399, over 15219.00 frames. ], tot_loss[loss=0.1186, simple_loss=0.1278, pruned_loss=0.04242, audio_tagging_loss=0.0123, over 3043660.22 frames. 
], batch size: 56, lr: 2.20e-02, grad_scale: 128.0 2023-11-18 10:02:44,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=174966.66666666666, ans=0.07 2023-11-18 10:02:49,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=174966.66666666666, ans=0.125 2023-11-18 10:03:16,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=175166.66666666666, ans=0.1 2023-11-18 10:03:22,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=175166.66666666666, ans=0.125 2023-11-18 10:03:22,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=175166.66666666666, ans=0.0 2023-11-18 10:03:27,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=175233.33333333334, ans=0.2 2023-11-18 10:03:36,346 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2250, loss[loss=0.118, simple_loss=0.1341, pruned_loss=0.04026, audio_tagging_loss=0.01065, over 15059.00 frames. ], tot_loss[loss=0.1197, simple_loss=0.1295, pruned_loss=0.04277, audio_tagging_loss=0.01223, over 3046238.37 frames. ], batch size: 56, lr: 2.20e-02, grad_scale: 64.0 2023-11-18 10:03:51,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.52 vs. limit=15.0 2023-11-18 10:04:00,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=175433.33333333334, ans=0.0 2023-11-18 10:04:01,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 9.758e+01 1.067e+02 1.178e+02 1.415e+02, threshold=2.133e+02, percent-clipped=0.0 2023-11-18 10:04:32,099 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2300, loss[loss=0.1207, simple_loss=0.1273, pruned_loss=0.04576, audio_tagging_loss=0.0113, over 15662.00 frames. ], tot_loss[loss=0.1194, simple_loss=0.1288, pruned_loss=0.04266, audio_tagging_loss=0.01233, over 3042799.29 frames. ], batch size: 60, lr: 2.19e-02, grad_scale: 64.0 2023-11-18 10:04:34,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=175633.33333333334, ans=0.125 2023-11-18 10:04:35,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175633.33333333334, ans=0.1 2023-11-18 10:04:43,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=175700.0, ans=0.125 2023-11-18 10:05:15,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175900.0, ans=0.1 2023-11-18 10:05:17,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=175900.0, ans=0.1 2023-11-18 10:05:22,649 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:05:24,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.99 vs. limit=22.5 2023-11-18 10:05:27,915 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2350, loss[loss=0.1145, simple_loss=0.1162, pruned_loss=0.04209, audio_tagging_loss=0.01433, over 14084.00 frames. ], tot_loss[loss=0.1208, simple_loss=0.1301, pruned_loss=0.0433, audio_tagging_loss=0.01247, over 3041153.08 frames. ], batch size: 54, lr: 2.19e-02, grad_scale: 64.0 2023-11-18 10:05:32,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=175966.66666666666, ans=0.125 2023-11-18 10:05:37,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=175966.66666666666, ans=0.125 2023-11-18 10:05:51,954 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.821e+01 9.805e+01 1.113e+02 1.261e+02 1.707e+02, threshold=2.226e+02, percent-clipped=0.0 2023-11-18 10:06:12,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=176233.33333333334, ans=0.0 2023-11-18 10:06:23,641 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2400, loss[loss=0.1377, simple_loss=0.1416, pruned_loss=0.05293, audio_tagging_loss=0.01395, over 15557.00 frames. ], tot_loss[loss=0.1206, simple_loss=0.1297, pruned_loss=0.04319, audio_tagging_loss=0.01259, over 3040104.71 frames. ], batch size: 60, lr: 2.19e-02, grad_scale: 32.0 2023-11-18 10:06:23,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=176300.0, ans=0.0 2023-11-18 10:06:24,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=176300.0, ans=0.2 2023-11-18 10:06:50,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.73 vs. limit=6.0 2023-11-18 10:06:53,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=176433.33333333334, ans=0.125 2023-11-18 10:07:19,258 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2450, loss[loss=0.1129, simple_loss=0.122, pruned_loss=0.03997, audio_tagging_loss=0.0119, over 15143.00 frames. ], tot_loss[loss=0.1212, simple_loss=0.1306, pruned_loss=0.04326, audio_tagging_loss=0.01265, over 3043093.10 frames. ], batch size: 58, lr: 2.19e-02, grad_scale: 32.0 2023-11-18 10:07:45,105 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 1.013e+02 1.126e+02 1.298e+02 2.274e+02, threshold=2.253e+02, percent-clipped=1.0 2023-11-18 10:07:54,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=176833.33333333334, ans=0.125 2023-11-18 10:08:03,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.62 vs. 
limit=22.5 2023-11-18 10:08:15,425 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2500, loss[loss=0.1117, simple_loss=0.1168, pruned_loss=0.04229, audio_tagging_loss=0.01096, over 14071.00 frames. ], tot_loss[loss=0.1212, simple_loss=0.1304, pruned_loss=0.04329, audio_tagging_loss=0.01269, over 3040704.13 frames. ], batch size: 53, lr: 2.19e-02, grad_scale: 32.0 2023-11-18 10:08:16,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=176966.66666666666, ans=0.125 2023-11-18 10:08:28,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=177033.33333333334, ans=0.125 2023-11-18 10:08:54,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.90 vs. limit=15.0 2023-11-18 10:09:06,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=177233.33333333334, ans=0.95 2023-11-18 10:09:06,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=177233.33333333334, ans=0.5 2023-11-18 10:09:10,976 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2550, loss[loss=0.1378, simple_loss=0.1461, pruned_loss=0.05391, audio_tagging_loss=0.01081, over 14938.00 frames. ], tot_loss[loss=0.1211, simple_loss=0.1305, pruned_loss=0.04339, audio_tagging_loss=0.01247, over 3042070.85 frames. ], batch size: 56, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:09:18,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=177300.0, ans=0.125 2023-11-18 10:09:36,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=177433.33333333334, ans=0.125 2023-11-18 10:09:37,077 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 9.703e+01 1.094e+02 1.267e+02 1.679e+02, threshold=2.187e+02, percent-clipped=0.0 2023-11-18 10:09:43,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=177500.0, ans=0.125 2023-11-18 10:09:43,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=177500.0, ans=0.1 2023-11-18 10:09:44,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=177500.0, ans=0.125 2023-11-18 10:09:49,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=177500.0, ans=0.0 2023-11-18 10:10:03,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=15.0 2023-11-18 10:10:06,248 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2600, loss[loss=0.09593, simple_loss=0.09754, pruned_loss=0.03223, audio_tagging_loss=0.01494, over 14225.00 frames. ], tot_loss[loss=0.1203, simple_loss=0.1295, pruned_loss=0.04311, audio_tagging_loss=0.0124, over 3036664.28 frames. 
], batch size: 55, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:10:09,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=177633.33333333334, ans=0.125 2023-11-18 10:10:36,771 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.560e-03 2023-11-18 10:10:41,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=177833.33333333334, ans=0.125 2023-11-18 10:10:45,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.91 vs. limit=22.5 2023-11-18 10:10:52,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=177900.0, ans=0.125 2023-11-18 10:11:03,107 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2650, loss[loss=0.1028, simple_loss=0.1175, pruned_loss=0.03251, audio_tagging_loss=0.01156, over 14557.00 frames. ], tot_loss[loss=0.1209, simple_loss=0.1303, pruned_loss=0.04358, audio_tagging_loss=0.0122, over 3043064.82 frames. ], batch size: 56, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:11:15,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=178033.33333333334, ans=0.125 2023-11-18 10:11:20,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=178033.33333333334, ans=0.0 2023-11-18 10:11:27,750 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 9.789e+01 1.066e+02 1.192e+02 1.496e+02, threshold=2.133e+02, percent-clipped=0.0 2023-11-18 10:11:30,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=178100.0, ans=0.125 2023-11-18 10:11:44,800 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2023-11-18 10:11:48,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=178233.33333333334, ans=0.125 2023-11-18 10:11:57,882 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2700, loss[loss=0.1211, simple_loss=0.1317, pruned_loss=0.04184, audio_tagging_loss=0.0134, over 15449.00 frames. ], tot_loss[loss=0.1215, simple_loss=0.131, pruned_loss=0.04387, audio_tagging_loss=0.01214, over 3036851.47 frames. ], batch size: 57, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:11:58,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=178300.0, ans=0.1 2023-11-18 10:12:12,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=178366.66666666666, ans=0.0 2023-11-18 10:12:53,267 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2750, loss[loss=0.127, simple_loss=0.1339, pruned_loss=0.04619, audio_tagging_loss=0.01384, over 14400.00 frames. ], tot_loss[loss=0.1199, simple_loss=0.1294, pruned_loss=0.04304, audio_tagging_loss=0.01218, over 3043730.85 frames. 
], batch size: 56, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:13:05,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=178700.0, ans=0.125 2023-11-18 10:13:19,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 1.008e+02 1.122e+02 1.241e+02 2.001e+02, threshold=2.244e+02, percent-clipped=0.0 2023-11-18 10:13:41,761 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:13:50,103 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2800, loss[loss=0.1128, simple_loss=0.12, pruned_loss=0.0384, audio_tagging_loss=0.01438, over 15348.00 frames. ], tot_loss[loss=0.1199, simple_loss=0.1292, pruned_loss=0.04306, audio_tagging_loss=0.01219, over 3040398.70 frames. ], batch size: 57, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:13:50,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=178966.66666666666, ans=0.1 2023-11-18 10:14:13,293 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:14:28,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.11 vs. limit=15.0 2023-11-18 10:14:44,619 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2850, loss[loss=0.1149, simple_loss=0.1107, pruned_loss=0.04775, audio_tagging_loss=0.01185, over 15289.00 frames. ], tot_loss[loss=0.1194, simple_loss=0.1287, pruned_loss=0.04286, audio_tagging_loss=0.01216, over 3041594.88 frames. ], batch size: 59, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:14:45,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=12.0 2023-11-18 10:14:52,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=179300.0, ans=0.1 2023-11-18 10:15:03,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=179366.66666666666, ans=0.125 2023-11-18 10:15:10,730 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 9.812e+01 1.069e+02 1.186e+02 1.678e+02, threshold=2.137e+02, percent-clipped=0.0 2023-11-18 10:15:13,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=179433.33333333334, ans=0.5 2023-11-18 10:15:35,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=179566.66666666666, ans=0.125 2023-11-18 10:15:35,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=179566.66666666666, ans=0.1 2023-11-18 10:15:39,681 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2900, loss[loss=0.1109, simple_loss=0.118, pruned_loss=0.03772, audio_tagging_loss=0.01416, over 15361.00 frames. 
], tot_loss[loss=0.1199, simple_loss=0.1295, pruned_loss=0.04298, audio_tagging_loss=0.01219, over 3049838.29 frames. ], batch size: 56, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:15:46,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=179633.33333333334, ans=0.125 2023-11-18 10:15:46,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.52 vs. limit=22.5 2023-11-18 10:15:48,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.67 vs. limit=22.5 2023-11-18 10:15:51,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=179700.0, ans=0.0 2023-11-18 10:15:55,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=179700.0, ans=0.035 2023-11-18 10:16:08,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179766.66666666666, ans=0.1 2023-11-18 10:16:35,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179966.66666666666, ans=0.1 2023-11-18 10:16:36,724 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 2950, loss[loss=0.08538, simple_loss=0.08606, pruned_loss=0.02879, audio_tagging_loss=0.01356, over 14976.00 frames. ], tot_loss[loss=0.1209, simple_loss=0.1302, pruned_loss=0.04351, audio_tagging_loss=0.01231, over 3047458.39 frames. ], batch size: 59, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:17:01,086 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.805e+01 9.762e+01 1.073e+02 1.254e+02 1.837e+02, threshold=2.146e+02, percent-clipped=0.0 2023-11-18 10:17:32,029 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3000, loss[loss=0.1019, simple_loss=0.1175, pruned_loss=0.03221, audio_tagging_loss=0.01096, over 15103.00 frames. ], tot_loss[loss=0.1213, simple_loss=0.1307, pruned_loss=0.04374, audio_tagging_loss=0.01221, over 3042639.71 frames. ], batch size: 55, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:17:32,030 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 10:17:46,033 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.4669, 6.3677, 6.3430, 6.1691], device='cuda:1') 2023-11-18 10:17:56,473 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2568, 4.9989, 4.9666, 5.1390], device='cuda:1') 2023-11-18 10:17:57,173 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9647, 5.8962, 5.9293, 5.5753], device='cuda:1') 2023-11-18 10:17:57,388 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1977, 2.4148, 5.2548, 2.3170], device='cuda:1') 2023-11-18 10:18:05,507 INFO [train_asr.py:1147] (1/4) Epoch 3, validation: loss=0.08163, simple_loss=0.06585, pruned_loss=0.01265, audio_tagging_loss=0.03605, over 4681554.00 frames. 
2023-11-18 10:18:05,507 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 10:18:07,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=180300.0, ans=0.0 2023-11-18 10:18:25,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=180366.66666666666, ans=0.125 2023-11-18 10:18:31,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.89 vs. limit=22.5 2023-11-18 10:18:33,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=180433.33333333334, ans=0.0 2023-11-18 10:18:48,128 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.56 vs. limit=22.5 2023-11-18 10:19:01,887 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3050, loss[loss=0.1445, simple_loss=0.1528, pruned_loss=0.05655, audio_tagging_loss=0.01151, over 15751.00 frames. ], tot_loss[loss=0.1214, simple_loss=0.1312, pruned_loss=0.04362, audio_tagging_loss=0.01216, over 3039703.62 frames. ], batch size: 56, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:19:11,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=180700.0, ans=0.125 2023-11-18 10:19:16,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=180700.0, ans=0.125 2023-11-18 10:19:26,064 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 9.626e+01 1.059e+02 1.215e+02 1.726e+02, threshold=2.118e+02, percent-clipped=0.0 2023-11-18 10:19:32,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.75 vs. limit=5.0 2023-11-18 10:19:33,995 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:19:35,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=180833.33333333334, ans=0.0 2023-11-18 10:19:46,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=180900.0, ans=0.0 2023-11-18 10:19:49,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.33 vs. limit=22.5 2023-11-18 10:19:53,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=180900.0, ans=0.0 2023-11-18 10:19:56,864 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3100, loss[loss=0.07701, simple_loss=0.07944, pruned_loss=0.02279, audio_tagging_loss=0.0145, over 14970.00 frames. ], tot_loss[loss=0.1217, simple_loss=0.1313, pruned_loss=0.04372, audio_tagging_loss=0.01231, over 3044039.02 frames. 
], batch size: 57, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:20:35,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.35 vs. limit=12.0 2023-11-18 10:20:39,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=181166.66666666666, ans=0.125 2023-11-18 10:20:42,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2023-11-18 10:20:43,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0 2023-11-18 10:20:43,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=181233.33333333334, ans=0.125 2023-11-18 10:20:51,956 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3150, loss[loss=0.08935, simple_loss=0.09239, pruned_loss=0.029, audio_tagging_loss=0.01417, over 15405.00 frames. ], tot_loss[loss=0.1222, simple_loss=0.1318, pruned_loss=0.04385, audio_tagging_loss=0.01239, over 3042578.95 frames. ], batch size: 59, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:21:08,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=181366.66666666666, ans=0.1 2023-11-18 10:21:10,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=181366.66666666666, ans=0.0 2023-11-18 10:21:16,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.11 vs. limit=10.0 2023-11-18 10:21:18,119 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.858e+01 9.872e+01 1.154e+02 1.398e+02 2.452e+02, threshold=2.308e+02, percent-clipped=3.0 2023-11-18 10:21:27,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=181500.0, ans=0.125 2023-11-18 10:21:29,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181500.0, ans=0.1 2023-11-18 10:21:31,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181500.0, ans=0.1 2023-11-18 10:21:32,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=181500.0, ans=0.125 2023-11-18 10:21:48,265 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3200, loss[loss=0.1121, simple_loss=0.1173, pruned_loss=0.03958, audio_tagging_loss=0.01385, over 14246.00 frames. ], tot_loss[loss=0.1216, simple_loss=0.1312, pruned_loss=0.04354, audio_tagging_loss=0.01245, over 3035481.61 frames. ], batch size: 54, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:22:13,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=181766.66666666666, ans=0.125 2023-11-18 10:22:15,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.77 vs. 
limit=15.0 2023-11-18 10:22:42,989 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3250, loss[loss=0.1085, simple_loss=0.1095, pruned_loss=0.0426, audio_tagging_loss=0.01117, over 14809.00 frames. ], tot_loss[loss=0.12, simple_loss=0.1292, pruned_loss=0.04276, audio_tagging_loss=0.01269, over 3036204.78 frames. ], batch size: 55, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:22:44,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2023-11-18 10:22:46,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=181966.66666666666, ans=0.125 2023-11-18 10:23:03,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=182100.0, ans=0.07 2023-11-18 10:23:05,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.36 vs. limit=10.0 2023-11-18 10:23:05,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=182100.0, ans=10.0 2023-11-18 10:23:07,865 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 9.562e+01 1.039e+02 1.190e+02 1.635e+02, threshold=2.078e+02, percent-clipped=0.0 2023-11-18 10:23:32,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=182233.33333333334, ans=0.0 2023-11-18 10:23:33,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=182233.33333333334, ans=0.0 2023-11-18 10:23:34,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=182233.33333333334, ans=0.1 2023-11-18 10:23:37,518 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3300, loss[loss=0.09175, simple_loss=0.102, pruned_loss=0.02881, audio_tagging_loss=0.01193, over 14427.00 frames. ], tot_loss[loss=0.1194, simple_loss=0.1279, pruned_loss=0.04254, audio_tagging_loss=0.01285, over 3037960.98 frames. ], batch size: 56, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:23:40,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=182300.0, ans=0.0 2023-11-18 10:23:45,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=182300.0, ans=0.125 2023-11-18 10:24:00,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=182433.33333333334, ans=0.1 2023-11-18 10:24:19,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=182500.0, ans=0.0 2023-11-18 10:24:26,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=182566.66666666666, ans=0.0 2023-11-18 10:24:29,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=182566.66666666666, ans=0.125 2023-11-18 10:24:33,224 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3350, loss[loss=0.08776, simple_loss=0.1029, pruned_loss=0.02497, audio_tagging_loss=0.01135, over 14793.00 frames. 
], tot_loss[loss=0.1192, simple_loss=0.1282, pruned_loss=0.04251, audio_tagging_loss=0.01262, over 3041461.89 frames. ], batch size: 55, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:24:58,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 9.499e+01 1.070e+02 1.220e+02 2.186e+02, threshold=2.139e+02, percent-clipped=1.0 2023-11-18 10:25:09,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.43 vs. limit=15.0 2023-11-18 10:25:13,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=182833.33333333334, ans=0.0 2023-11-18 10:25:13,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=182833.33333333334, ans=0.125 2023-11-18 10:25:19,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=182900.0, ans=0.1 2023-11-18 10:25:29,560 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3400, loss[loss=0.1247, simple_loss=0.1309, pruned_loss=0.04419, audio_tagging_loss=0.01502, over 15620.00 frames. ], tot_loss[loss=0.1193, simple_loss=0.1284, pruned_loss=0.04249, audio_tagging_loss=0.01257, over 3048771.29 frames. ], batch size: 60, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:25:29,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=182966.66666666666, ans=0.125 2023-11-18 10:25:44,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=183033.33333333334, ans=0.0 2023-11-18 10:25:50,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=183100.0, ans=0.2 2023-11-18 10:25:53,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=183100.0, ans=0.2 2023-11-18 10:26:00,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.41 vs. limit=22.5 2023-11-18 10:26:09,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=183166.66666666666, ans=0.05 2023-11-18 10:26:19,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=183233.33333333334, ans=0.125 2023-11-18 10:26:24,425 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3450, loss[loss=0.09862, simple_loss=0.0943, pruned_loss=0.03224, audio_tagging_loss=0.01922, over 15595.00 frames. ], tot_loss[loss=0.1191, simple_loss=0.1285, pruned_loss=0.04227, audio_tagging_loss=0.01258, over 3049067.56 frames. 
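The loss fields in the train_asr.py records combine as a fixed weighted sum: every record here satisfies loss ≈ 0.5 * simple_loss + pruned_loss + audio_tagging_loss (for the batch 3450 tot_loss just above: 0.5 * 0.1285 + 0.04227 + 0.01258 ≈ 0.1191). Below is a minimal sketch of that bookkeeping, assuming this inferred weighting; the function and argument names are illustrative, not the actual train_asr.py code.

    # Sketch: combine the per-batch losses the way the logged numbers suggest.
    # The 0.5 weight on simple_loss is inferred from the records, not quoted
    # from the source; tot_loss is the same expression averaged over batches.
    import torch

    def combine_losses(
        simple_loss: torch.Tensor,
        pruned_loss: torch.Tensor,
        audio_tagging_loss: torch.Tensor,
        simple_loss_weight: float = 0.5,  # inferred from the log
    ) -> torch.Tensor:
        return simple_loss_weight * simple_loss + pruned_loss + audio_tagging_loss

    # Check against the batch 3450 tot_loss record above:
    loss = combine_losses(
        torch.tensor(0.1285), torch.tensor(0.04227), torch.tensor(0.01258)
    )
    print(f"{loss.item():.4f}")  # 0.1191, matching the logged loss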
], batch size: 57, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:26:35,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=183366.66666666666, ans=0.1 2023-11-18 10:26:48,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=183433.33333333334, ans=0.125 2023-11-18 10:26:50,541 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 9.538e+01 1.062e+02 1.197e+02 2.158e+02, threshold=2.124e+02, percent-clipped=1.0 2023-11-18 10:26:53,979 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:26:56,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.84 vs. limit=15.0 2023-11-18 10:26:57,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=183500.0, ans=0.125 2023-11-18 10:27:10,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=183566.66666666666, ans=0.2 2023-11-18 10:27:20,066 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3500, loss[loss=0.1292, simple_loss=0.1466, pruned_loss=0.04897, audio_tagging_loss=0.006931, over 15720.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1294, pruned_loss=0.04252, audio_tagging_loss=0.01235, over 3042129.28 frames. ], batch size: 56, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:27:30,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=183700.0, ans=0.0 2023-11-18 10:27:35,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=183700.0, ans=0.1 2023-11-18 10:27:36,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0 2023-11-18 10:27:48,623 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:27:53,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=183833.33333333334, ans=0.1 2023-11-18 10:28:03,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=183900.0, ans=0.035 2023-11-18 10:28:13,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=183900.0, ans=0.125 2023-11-18 10:28:15,712 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3550, loss[loss=0.1329, simple_loss=0.1499, pruned_loss=0.04606, audio_tagging_loss=0.01192, over 14061.00 frames. ], tot_loss[loss=0.1188, simple_loss=0.1289, pruned_loss=0.04208, audio_tagging_loss=0.01226, over 3045576.95 frames. 
], batch size: 54, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:28:24,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=183966.66666666666, ans=22.5 2023-11-18 10:28:37,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=184100.0, ans=0.0 2023-11-18 10:28:41,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.814e+01 9.739e+01 1.081e+02 1.236e+02 3.784e+02, threshold=2.163e+02, percent-clipped=1.0 2023-11-18 10:28:51,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=184166.66666666666, ans=0.125 2023-11-18 10:28:52,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=184166.66666666666, ans=0.0 2023-11-18 10:29:07,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=184233.33333333334, ans=0.125 2023-11-18 10:29:11,451 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3600, loss[loss=0.1016, simple_loss=0.1141, pruned_loss=0.03286, audio_tagging_loss=0.01172, over 15190.00 frames. ], tot_loss[loss=0.1181, simple_loss=0.1279, pruned_loss=0.0419, audio_tagging_loss=0.01221, over 3050040.09 frames. ], batch size: 58, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:29:16,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=184300.0, ans=0.0 2023-11-18 10:29:22,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=184366.66666666666, ans=0.125 2023-11-18 10:29:49,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=184500.0, ans=0.04949747468305833 2023-11-18 10:29:52,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=184500.0, ans=10.0 2023-11-18 10:30:00,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=184566.66666666666, ans=0.2 2023-11-18 10:30:06,995 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3650, loss[loss=0.1262, simple_loss=0.1459, pruned_loss=0.04121, audio_tagging_loss=0.01209, over 14874.00 frames. ], tot_loss[loss=0.119, simple_loss=0.129, pruned_loss=0.04229, audio_tagging_loss=0.01218, over 3046213.74 frames. 
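The optim.py:476 records log five quantiles (min, 25%, median, 75%, max) of recently observed gradient norms, and in each record the threshold equals Clipping_scale times the logged median (in the record above, 2.0 * 1.081e+02 ≈ 2.163e+02); percent-clipped appears to be the share of recent batches whose norm exceeded that threshold. A hedged sketch of the mechanism follows; the window length and all names are assumptions rather than the actual optim.py implementation.

    # Sketch of median-based gradient clipping consistent with the log lines.
    # The window length and every name here are assumptions, not optim.py code.
    from collections import deque
    import torch

    class GradNormClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)   # recent global grad norms
            self.num_seen = 0
            self.num_clipped = 0

        def step(self, params) -> float:
            grads = [p.grad.norm() for p in params if p.grad is not None]
            norm = torch.norm(torch.stack(grads)).item()
            self.norms.append(norm)
            q = torch.quantile(
                torch.tensor(list(self.norms)),
                torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),  # the 5 logged values
            )
            threshold = self.clipping_scale * q[2].item()   # scale * median
            self.num_seen += 1
            if norm > threshold:
                self.num_clipped += 1
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            # percent-clipped in the log would be 100 * num_clipped / num_seen
            return threshold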
], batch size: 54, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:30:07,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=184633.33333333334, ans=0.025 2023-11-18 10:30:13,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=184633.33333333334, ans=0.125 2023-11-18 10:30:13,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=184633.33333333334, ans=0.0 2023-11-18 10:30:16,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=184633.33333333334, ans=0.2 2023-11-18 10:30:21,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=184700.0, ans=0.125 2023-11-18 10:30:24,299 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0 2023-11-18 10:30:26,523 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.74 vs. limit=22.5 2023-11-18 10:30:31,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=184766.66666666666, ans=0.125 2023-11-18 10:30:32,824 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.231e+01 1.003e+02 1.085e+02 1.205e+02 1.999e+02, threshold=2.169e+02, percent-clipped=0.0 2023-11-18 10:30:33,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=184766.66666666666, ans=0.07 2023-11-18 10:31:02,922 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3700, loss[loss=0.09956, simple_loss=0.1038, pruned_loss=0.03479, audio_tagging_loss=0.01287, over 15319.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1295, pruned_loss=0.04261, audio_tagging_loss=0.01228, over 3045567.85 frames. ], batch size: 58, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:31:42,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=185166.66666666666, ans=0.125 2023-11-18 10:31:46,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=185233.33333333334, ans=0.125 2023-11-18 10:31:58,477 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3750, loss[loss=0.08678, simple_loss=0.0858, pruned_loss=0.02905, audio_tagging_loss=0.01483, over 16170.00 frames. ], tot_loss[loss=0.1202, simple_loss=0.1301, pruned_loss=0.04298, audio_tagging_loss=0.01219, over 3053098.04 frames. 
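The very frequent scaling.py:213 lines report the current value (ans) of a ScheduledFloat: a hyperparameter such as a balancer probability, dropout rate, or skip rate whose value is a function of the global batch_count rather than a constant. A minimal sketch of a piecewise-linear schedule of that kind is below; the breakpoints are invented for illustration, and the real schedules live in icefall's scaling.py.

    # Illustrative piecewise-linear schedule keyed on the global batch count.
    # The breakpoints below are made up; only the mechanism mirrors the log.
    from bisect import bisect_right

    class ScheduledFloatSketch:
        def __init__(self, *points: tuple[float, float]):
            # points: (batch_count, value) pairs sorted by batch_count
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

    # e.g. a skip rate decaying from 0.2 to 0.0 over the first 4000 batches,
    # which has long since reached its final value at batch_count ~181k:
    skip_rate = ScheduledFloatSketch((0.0, 0.2), (4000.0, 0.0))
    print(skip_rate(181500.0))  # 0.0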
], batch size: 63, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:32:05,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=185300.0, ans=0.125 2023-11-18 10:32:09,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=185366.66666666666, ans=0.125 2023-11-18 10:32:23,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=185433.33333333334, ans=0.1 2023-11-18 10:32:24,708 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.955e+01 9.670e+01 1.084e+02 1.200e+02 2.427e+02, threshold=2.168e+02, percent-clipped=1.0 2023-11-18 10:32:37,371 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:32:50,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=185566.66666666666, ans=0.125 2023-11-18 10:32:54,169 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3800, loss[loss=0.1965, simple_loss=0.2173, pruned_loss=0.07525, audio_tagging_loss=0.01258, over 15357.00 frames. ], tot_loss[loss=0.1227, simple_loss=0.1331, pruned_loss=0.04408, audio_tagging_loss=0.01209, over 3053909.88 frames. ], batch size: 56, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:33:02,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=22.5 2023-11-18 10:33:07,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=185700.0, ans=0.125 2023-11-18 10:33:20,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=185766.66666666666, ans=0.1 2023-11-18 10:33:21,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0 2023-11-18 10:33:36,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=185833.33333333334, ans=0.125 2023-11-18 10:33:36,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0 2023-11-18 10:33:42,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=185900.0, ans=0.125 2023-11-18 10:33:48,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=185900.0, ans=0.0 2023-11-18 10:33:50,171 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3850, loss[loss=0.1542, simple_loss=0.1599, pruned_loss=0.06288, audio_tagging_loss=0.01142, over 14671.00 frames. ], tot_loss[loss=0.1231, simple_loss=0.1333, pruned_loss=0.04426, audio_tagging_loss=0.01218, over 3054845.91 frames. 
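Each WARNING above drops a one-second AudioSet clip: its 100 feature frames shrink to 23 after the encoder's subsampling, fewer than the 24 BPE tokens of the dummy transcript, and a transducer loss cannot align fewer encoder frames than output tokens. A sketch of the implied filter; the subsampling arithmetic below reproduces the logged 100 -> 23, though the exact predicate in train_asr.py may differ.

    # Illustrative filter matching the WARNING lines: drop cuts whose encoder
    # output would be shorter than the token sequence. Names are assumptions.
    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # ((100 - 7) // 2 + 1) // 2 == 23, reproducing the logged counts
        frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
        return frames_after_subsampling >= num_tokens

    print(keep_cut(100, 24))  # False -> excluded, as in the WARNING above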
], batch size: 53, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:33:57,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=185966.66666666666, ans=0.2 2023-11-18 10:34:15,529 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 1.009e+02 1.117e+02 1.269e+02 1.869e+02, threshold=2.233e+02, percent-clipped=0.0 2023-11-18 10:34:21,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=186100.0, ans=0.125 2023-11-18 10:34:45,165 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3900, loss[loss=0.09542, simple_loss=0.1087, pruned_loss=0.0287, audio_tagging_loss=0.01238, over 14387.00 frames. ], tot_loss[loss=0.1227, simple_loss=0.1327, pruned_loss=0.04411, audio_tagging_loss=0.01224, over 3041957.33 frames. ], batch size: 54, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:34:45,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=186300.0, ans=0.1 2023-11-18 10:35:08,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2023-11-18 10:35:23,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.03 vs. limit=15.0 2023-11-18 10:35:25,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.68 vs. limit=10.0 2023-11-18 10:35:40,985 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 3950, loss[loss=0.08423, simple_loss=0.07752, pruned_loss=0.0301, audio_tagging_loss=0.01537, over 15374.00 frames. ], tot_loss[loss=0.1214, simple_loss=0.1312, pruned_loss=0.04341, audio_tagging_loss=0.01242, over 3040555.25 frames. ], batch size: 63, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:35:55,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=186700.0, ans=0.0 2023-11-18 10:35:57,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=186700.0, ans=0.0 2023-11-18 10:35:58,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=186700.0, ans=0.125 2023-11-18 10:36:00,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=186700.0, ans=0.125 2023-11-18 10:36:08,764 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 9.703e+01 1.079e+02 1.244e+02 1.846e+02, threshold=2.158e+02, percent-clipped=0.0 2023-11-18 10:36:11,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=186766.66666666666, ans=0.125 2023-11-18 10:36:23,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=186833.33333333334, ans=0.125 2023-11-18 10:36:28,040 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.05 vs. 
limit=22.5 2023-11-18 10:36:39,212 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4000, loss[loss=0.09953, simple_loss=0.1027, pruned_loss=0.03347, audio_tagging_loss=0.01469, over 14631.00 frames. ], tot_loss[loss=0.1216, simple_loss=0.1315, pruned_loss=0.04338, audio_tagging_loss=0.01247, over 3044038.06 frames. ], batch size: 57, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:36:50,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=187033.33333333334, ans=10.0 2023-11-18 10:37:17,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=187166.66666666666, ans=0.0 2023-11-18 10:37:19,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=187166.66666666666, ans=0.0 2023-11-18 10:37:22,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.83 vs. limit=10.0 2023-11-18 10:37:34,004 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4050, loss[loss=0.1527, simple_loss=0.1534, pruned_loss=0.06449, audio_tagging_loss=0.0115, over 15159.00 frames. ], tot_loss[loss=0.1222, simple_loss=0.1322, pruned_loss=0.04367, audio_tagging_loss=0.01245, over 3045034.05 frames. ], batch size: 56, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:37:37,182 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:37:51,188 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:38:00,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 9.492e+01 1.077e+02 1.184e+02 1.546e+02, threshold=2.154e+02, percent-clipped=0.0 2023-11-18 10:38:06,940 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.931e+00 2023-11-18 10:38:11,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187500.0, ans=0.1 2023-11-18 10:38:30,032 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4100, loss[loss=0.1081, simple_loss=0.1078, pruned_loss=0.04002, audio_tagging_loss=0.01419, over 14647.00 frames. ], tot_loss[loss=0.1207, simple_loss=0.1306, pruned_loss=0.04299, audio_tagging_loss=0.01241, over 3043518.62 frames. ], batch size: 55, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:38:30,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.15 vs. limit=10.0 2023-11-18 10:38:44,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=187700.0, ans=0.0 2023-11-18 10:39:26,027 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4150, loss[loss=0.08896, simple_loss=0.09253, pruned_loss=0.02828, audio_tagging_loss=0.01441, over 15588.00 frames. ], tot_loss[loss=0.1199, simple_loss=0.13, pruned_loss=0.04265, audio_tagging_loss=0.01226, over 3041491.19 frames. 
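The scaling.py:1022 records compare a per-module whitening metric against a limit: the Whiten modules in icefall's scaling.py push intermediate activations toward a white (decorrelated, equal-variance) covariance and report when the measured metric stands out against the limit. The exact metric is defined in scaling.py; the sketch below substitutes a simple stand-in (largest eigenvalue over mean eigenvalue of the feature covariance) purely to illustrate what such a diagnostic measures.

    # Stand-in whiteness diagnostic; NOT the metric scaling.py computes.
    import torch

    def whiteness_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations for one module
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        # 1.0 for perfectly white features; grows with anisotropy
        return (eigs.max() / eigs.mean().clamp(min=1e-20)).item()

    feats = torch.randn(1000, 384) @ torch.randn(384, 384)  # correlated features
    m = whiteness_metric(feats)
    limit = 15.0
    if m > limit:  # the real module applies a corrective gradient instead
        print(f"metric={m:.2f} vs. limit={limit}")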
], batch size: 59, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:39:31,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=187966.66666666666, ans=0.125 2023-11-18 10:39:50,381 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.052e+01 9.719e+01 1.039e+02 1.185e+02 1.497e+02, threshold=2.077e+02, percent-clipped=0.0 2023-11-18 10:39:56,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=188100.0, ans=0.125 2023-11-18 10:39:57,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=15.0 2023-11-18 10:39:59,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=188166.66666666666, ans=0.1 2023-11-18 10:40:03,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=188166.66666666666, ans=0.0 2023-11-18 10:40:07,374 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:40:21,052 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4200, loss[loss=0.08947, simple_loss=0.08897, pruned_loss=0.02829, audio_tagging_loss=0.0167, over 15660.00 frames. ], tot_loss[loss=0.1191, simple_loss=0.1291, pruned_loss=0.04243, audio_tagging_loss=0.01211, over 3042134.55 frames. ], batch size: 60, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:40:24,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=188300.0, ans=0.1 2023-11-18 10:40:36,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=188366.66666666666, ans=0.0 2023-11-18 10:40:54,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=15.0 2023-11-18 10:40:58,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=188500.0, ans=0.2 2023-11-18 10:41:03,793 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:41:05,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=188566.66666666666, ans=0.0 2023-11-18 10:41:15,115 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4250, loss[loss=0.132, simple_loss=0.1493, pruned_loss=0.04499, audio_tagging_loss=0.01231, over 15019.00 frames. ], tot_loss[loss=0.1198, simple_loss=0.1302, pruned_loss=0.04275, audio_tagging_loss=0.01199, over 3039828.16 frames. 
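The balancer fields that keep appearing in the ScheduledFloat lines (balancer1.prob, min_positive, max_abs, min_abs) belong to icefall's Balancer modules: identity transforms that, in the backward pass, nudge gradients so that per-channel statistics (fraction of positive activations, RMS magnitude) stay within configured bounds, applied on a given batch with the scheduled probability prob. The sketch below mirrors that idea with a deliberately simplified correction rule; thresholds, epsilon, and the rule itself are assumptions, not scaling.py's implementation.

    # Stand-in for the Balancer idea: identity in forward; in backward, add a
    # small gradient term pushing per-channel statistics back into range.
    import torch

    class BalancerSketch(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, min_positive=0.05, max_abs=10.0, eps=1e-4):
            ctx.save_for_backward(x)
            ctx.min_positive, ctx.max_abs, ctx.eps = min_positive, max_abs, eps
            return x

        @staticmethod
        def backward(ctx, grad_output):
            (x,) = ctx.saved_tensors
            frac_pos = (x > 0).float().mean(dim=0)   # per-channel statistics
            rms = x.pow(2).mean(dim=0).sqrt()
            # Push channels with too few positive values upward, and shrink
            # channels whose RMS exceeds max_abs; both act via the gradient.
            push_up = (frac_pos < ctx.min_positive).float()
            shrink = (rms > ctx.max_abs).float() * torch.sign(x)
            correction = ctx.eps * (shrink - push_up)
            return grad_output + correction, None, None, None

    x = torch.randn(64, 256, requires_grad=True) - 2.0  # mostly negative
    y = BalancerSketch.apply(x)
    y.sum().backward()  # x.grad now carries the balancing correction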
], batch size: 55, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:41:16,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=188633.33333333334, ans=0.0 2023-11-18 10:41:24,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=188633.33333333334, ans=0.125 2023-11-18 10:41:38,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=188766.66666666666, ans=0.125 2023-11-18 10:41:41,655 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.045e+01 9.813e+01 1.062e+02 1.234e+02 2.396e+02, threshold=2.125e+02, percent-clipped=1.0 2023-11-18 10:42:12,236 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4300, loss[loss=0.13, simple_loss=0.1432, pruned_loss=0.04938, audio_tagging_loss=0.008976, over 16058.00 frames. ], tot_loss[loss=0.1204, simple_loss=0.131, pruned_loss=0.04315, audio_tagging_loss=0.01176, over 3052137.31 frames. ], batch size: 57, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:42:15,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=188966.66666666666, ans=0.125 2023-11-18 10:42:20,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=188966.66666666666, ans=0.2 2023-11-18 10:42:27,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=189033.33333333334, ans=0.0 2023-11-18 10:42:29,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=189033.33333333334, ans=0.125 2023-11-18 10:43:04,228 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:43:07,146 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4350, loss[loss=0.1408, simple_loss=0.1638, pruned_loss=0.0504, audio_tagging_loss=0.008512, over 15279.00 frames. ], tot_loss[loss=0.1207, simple_loss=0.1315, pruned_loss=0.04319, audio_tagging_loss=0.01182, over 3051417.89 frames. ], batch size: 56, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:43:15,987 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:43:22,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=189366.66666666666, ans=0.0 2023-11-18 10:43:26,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=189366.66666666666, ans=0.125 2023-11-18 10:43:33,110 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 9.769e+01 1.106e+02 1.188e+02 1.814e+02, threshold=2.212e+02, percent-clipped=0.0 2023-11-18 10:43:40,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=189500.0, ans=0.125 2023-11-18 10:43:57,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=189566.66666666666, ans=0.125 2023-11-18 10:43:58,364 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. 
limit=15.0 2023-11-18 10:44:01,989 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4400, loss[loss=0.1108, simple_loss=0.127, pruned_loss=0.03739, audio_tagging_loss=0.009879, over 15066.00 frames. ], tot_loss[loss=0.1208, simple_loss=0.1313, pruned_loss=0.04329, audio_tagging_loss=0.01183, over 3046400.87 frames. ], batch size: 55, lr: 2.12e-02, grad_scale: 64.0 2023-11-18 10:44:16,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=189700.0, ans=0.125 2023-11-18 10:44:24,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=189766.66666666666, ans=0.125 2023-11-18 10:44:29,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 2023-11-18 10:44:58,516 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4450, loss[loss=0.1171, simple_loss=0.1238, pruned_loss=0.04471, audio_tagging_loss=0.01048, over 14906.00 frames. ], tot_loss[loss=0.1216, simple_loss=0.1322, pruned_loss=0.04377, audio_tagging_loss=0.01179, over 3053868.45 frames. ], batch size: 56, lr: 2.12e-02, grad_scale: 64.0 2023-11-18 10:45:03,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=15.0 2023-11-18 10:45:05,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=189966.66666666666, ans=0.125 2023-11-18 10:45:14,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=190033.33333333334, ans=0.125 2023-11-18 10:45:19,338 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.10 vs. limit=22.5 2023-11-18 10:45:21,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=190100.0, ans=0.1 2023-11-18 10:45:24,127 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 9.777e+01 1.062e+02 1.165e+02 1.734e+02, threshold=2.124e+02, percent-clipped=0.0 2023-11-18 10:45:27,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.64 vs. 
limit=6.0 2023-11-18 10:45:33,368 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.739e-03 2023-11-18 10:45:38,094 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:45:46,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=190233.33333333334, ans=0.125 2023-11-18 10:45:49,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=190233.33333333334, ans=0.1 2023-11-18 10:45:49,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=190233.33333333334, ans=0.0 2023-11-18 10:45:49,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=190233.33333333334, ans=0.1 2023-11-18 10:45:51,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=190233.33333333334, ans=0.125 2023-11-18 10:45:53,623 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4500, loss[loss=0.1417, simple_loss=0.1601, pruned_loss=0.04867, audio_tagging_loss=0.01297, over 16637.00 frames. ], tot_loss[loss=0.1208, simple_loss=0.1316, pruned_loss=0.0432, audio_tagging_loss=0.01178, over 3055857.78 frames. ], batch size: 61, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:46:08,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=190366.66666666666, ans=0.07 2023-11-18 10:46:28,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=190500.0, ans=0.0 2023-11-18 10:46:42,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=190566.66666666666, ans=0.0 2023-11-18 10:46:48,215 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4550, loss[loss=0.08041, simple_loss=0.07702, pruned_loss=0.02001, audio_tagging_loss=0.02189, over 15229.00 frames. ], tot_loss[loss=0.1202, simple_loss=0.1307, pruned_loss=0.04287, audio_tagging_loss=0.01197, over 3053360.06 frames. ], batch size: 59, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:47:15,824 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 9.411e+01 1.047e+02 1.182e+02 1.787e+02, threshold=2.094e+02, percent-clipped=0.0 2023-11-18 10:47:26,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=190833.33333333334, ans=10.0 2023-11-18 10:47:30,620 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:47:44,407 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4600, loss[loss=0.09883, simple_loss=0.1026, pruned_loss=0.0353, audio_tagging_loss=0.01221, over 13985.00 frames. ], tot_loss[loss=0.1198, simple_loss=0.1301, pruned_loss=0.04267, audio_tagging_loss=0.01208, over 3058102.48 frames. 
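grad_scale in the batch records is the dynamic loss-scaling factor used for fp16 training: it doubled from 32.0 to 64.0 by batch 4400 after a long run of overflow-free steps, and is back to 32.0 at batch 4500 above, consistent with standard dynamic loss scaling that halves on overflow. PyTorch's GradScaler implements exactly this doubling/halving policy; in the sketch below the model, optimizer, and growth settings are placeholders.

    # Standard PyTorch dynamic loss scaling; hyperparameters are placeholders.
    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler(init_scale=32.0, growth_factor=2.0,
                        backoff_factor=0.5, growth_interval=2000)

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with autocast():
            loss = model(batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)     # skipped if inf/nan gradients were found
        scaler.update()            # doubles the scale after enough clean steps,
                                   # halves it immediately on overflow
        return scaler.get_scale()  # the value logged as grad_scale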
], batch size: 55, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:47:44,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.50 vs. limit=22.5 2023-11-18 10:47:51,287 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.49 vs. limit=15.0 2023-11-18 10:47:56,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=191033.33333333334, ans=0.0 2023-11-18 10:47:57,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=191033.33333333334, ans=0.125 2023-11-18 10:48:05,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=191100.0, ans=0.1 2023-11-18 10:48:16,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=191166.66666666666, ans=0.125 2023-11-18 10:48:20,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=191166.66666666666, ans=10.0 2023-11-18 10:48:32,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=191233.33333333334, ans=0.125 2023-11-18 10:48:34,504 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.73 vs. limit=22.5 2023-11-18 10:48:40,140 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4650, loss[loss=0.1236, simple_loss=0.1316, pruned_loss=0.04714, audio_tagging_loss=0.01062, over 15530.00 frames. ], tot_loss[loss=0.1187, simple_loss=0.1286, pruned_loss=0.04202, audio_tagging_loss=0.01236, over 3060456.09 frames. ], batch size: 59, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:48:55,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2023-11-18 10:49:06,099 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 9.958e+01 1.111e+02 1.228e+02 2.300e+02, threshold=2.222e+02, percent-clipped=1.0 2023-11-18 10:49:13,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=191500.0, ans=0.0 2023-11-18 10:49:34,890 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4700, loss[loss=0.09273, simple_loss=0.09821, pruned_loss=0.02836, audio_tagging_loss=0.01527, over 15664.00 frames. ], tot_loss[loss=0.1194, simple_loss=0.1295, pruned_loss=0.0422, audio_tagging_loss=0.01241, over 3059563.93 frames. ], batch size: 61, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:49:44,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=191633.33333333334, ans=0.0 2023-11-18 10:49:45,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.84 vs. 
limit=22.5 2023-11-18 10:50:03,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=191766.66666666666, ans=0.0 2023-11-18 10:50:03,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=191766.66666666666, ans=0.2 2023-11-18 10:50:15,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=191833.33333333334, ans=0.125 2023-11-18 10:50:22,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=191900.0, ans=0.1 2023-11-18 10:50:23,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2023-11-18 10:50:30,213 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4750, loss[loss=0.1332, simple_loss=0.1536, pruned_loss=0.04726, audio_tagging_loss=0.009145, over 15723.00 frames. ], tot_loss[loss=0.1192, simple_loss=0.1293, pruned_loss=0.04219, audio_tagging_loss=0.01234, over 3055860.27 frames. ], batch size: 56, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:50:40,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=191966.66666666666, ans=0.125 2023-11-18 10:50:41,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.08 vs. limit=15.0 2023-11-18 10:50:49,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=192033.33333333334, ans=15.0 2023-11-18 10:50:57,122 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 9.880e+01 1.110e+02 1.323e+02 1.950e+02, threshold=2.220e+02, percent-clipped=0.0 2023-11-18 10:51:04,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=192166.66666666666, ans=0.0 2023-11-18 10:51:26,446 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4800, loss[loss=0.09425, simple_loss=0.09195, pruned_loss=0.03114, audio_tagging_loss=0.01713, over 15429.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1283, pruned_loss=0.04183, audio_tagging_loss=0.01243, over 3054639.06 frames. ], batch size: 62, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:51:27,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.22 vs. limit=22.5 2023-11-18 10:51:29,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=192300.0, ans=0.125 2023-11-18 10:51:38,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=192366.66666666666, ans=0.0 2023-11-18 10:51:38,574 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. 
limit=15.0 2023-11-18 10:51:57,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=192433.33333333334, ans=0.2 2023-11-18 10:52:15,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=21.19 vs. limit=15.0 2023-11-18 10:52:17,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=192566.66666666666, ans=0.125 2023-11-18 10:52:20,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=192633.33333333334, ans=0.025 2023-11-18 10:52:20,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=192633.33333333334, ans=0.125 2023-11-18 10:52:21,043 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4850, loss[loss=0.1306, simple_loss=0.1338, pruned_loss=0.05116, audio_tagging_loss=0.01256, over 14378.00 frames. ], tot_loss[loss=0.1192, simple_loss=0.129, pruned_loss=0.04219, audio_tagging_loss=0.0125, over 3046364.72 frames. ], batch size: 56, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:52:21,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=192633.33333333334, ans=0.125 2023-11-18 10:52:21,769 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.58 vs. limit=22.5 2023-11-18 10:52:23,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=192633.33333333334, ans=0.0 2023-11-18 10:52:23,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.00 vs. limit=22.5 2023-11-18 10:52:42,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=192766.66666666666, ans=0.125 2023-11-18 10:52:47,762 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.311e+01 9.557e+01 1.060e+02 1.196e+02 2.281e+02, threshold=2.120e+02, percent-clipped=1.0 2023-11-18 10:53:15,991 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4900, loss[loss=0.1485, simple_loss=0.1585, pruned_loss=0.05896, audio_tagging_loss=0.01028, over 15042.00 frames. ], tot_loss[loss=0.1187, simple_loss=0.1287, pruned_loss=0.042, audio_tagging_loss=0.01237, over 3050073.49 frames. ], batch size: 58, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:53:16,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=192966.66666666666, ans=0.2 2023-11-18 10:53:19,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.97 vs. 
limit=15.0 2023-11-18 10:53:39,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=193100.0, ans=0.1 2023-11-18 10:53:42,760 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:53:42,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=193100.0, ans=0.125 2023-11-18 10:53:43,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=193100.0, ans=0.125 2023-11-18 10:53:49,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=193166.66666666666, ans=0.0 2023-11-18 10:53:54,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=193166.66666666666, ans=0.0 2023-11-18 10:53:58,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=193166.66666666666, ans=10.0 2023-11-18 10:54:11,445 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 4950, loss[loss=0.096, simple_loss=0.1024, pruned_loss=0.03398, audio_tagging_loss=0.01082, over 14385.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1286, pruned_loss=0.04187, audio_tagging_loss=0.01224, over 3051415.42 frames. ], batch size: 56, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:54:12,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=193300.0, ans=0.125 2023-11-18 10:54:35,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=193433.33333333334, ans=0.125 2023-11-18 10:54:37,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 9.494e+01 1.131e+02 1.249e+02 1.755e+02, threshold=2.261e+02, percent-clipped=0.0 2023-11-18 10:54:50,651 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.237e-01 2023-11-18 10:54:52,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0 2023-11-18 10:54:55,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2023-11-18 10:54:59,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=193566.66666666666, ans=0.125 2023-11-18 10:55:05,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.07 vs. limit=22.5 2023-11-18 10:55:06,963 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5000, loss[loss=0.1279, simple_loss=0.1441, pruned_loss=0.04158, audio_tagging_loss=0.01433, over 15816.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1287, pruned_loss=0.04194, audio_tagging_loss=0.0121, over 3054686.57 frames. 
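The scaling.py:1118 WithLoss records track an auxiliary penalty attached to a module's attention weights: loss-sum is the penalty accumulated since the last report, so 0.000e+00 means the regularizer is currently inactive for that module, while nonzero values (e.g. the 6.237e-01 on encoders.3.layers.1 earlier) show it contributing. A common way to attach such a penalty without altering a module's output is an autograd function that is the identity in forward and routes a unit gradient into the penalty term in backward; the sketch below mirrors that idea only and is not claimed to be scaling.py's implementation.

    # Sketch: attach an auxiliary loss to the graph while passing x through
    # unchanged. Inspired by the WithLoss log lines; details are assumptions.
    import torch

    class WithLossSketch(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor):
            ctx.save_for_backward(aux_loss)
            return x  # forward output is untouched

        @staticmethod
        def backward(ctx, grad_output: torch.Tensor):
            # d(main_loss + aux_loss)/d(aux_loss) == 1, so inject a unit grad
            (aux_loss,) = ctx.saved_tensors
            return grad_output, torch.ones_like(aux_loss)

    x = torch.randn(4, 8, requires_grad=True)
    attn = torch.softmax(x, dim=-1)
    penalty = (attn ** 2).sum()        # stand-in attention penalty
    y = WithLossSketch.apply(attn, penalty)
    y.sum().backward()                 # penalty now contributes to x.grad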
], batch size: 57, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:55:16,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=193700.0, ans=0.125 2023-11-18 10:55:19,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=193700.0, ans=0.0 2023-11-18 10:56:02,084 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5050, loss[loss=0.09634, simple_loss=0.1022, pruned_loss=0.03146, audio_tagging_loss=0.01377, over 14944.00 frames. ], tot_loss[loss=0.1174, simple_loss=0.1277, pruned_loss=0.04159, audio_tagging_loss=0.01198, over 3051582.16 frames. ], batch size: 57, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:56:08,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=193966.66666666666, ans=0.125 2023-11-18 10:56:12,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=194033.33333333334, ans=0.0 2023-11-18 10:56:21,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=194033.33333333334, ans=0.125 2023-11-18 10:56:28,903 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 1.006e+02 1.111e+02 1.230e+02 2.145e+02, threshold=2.223e+02, percent-clipped=0.0 2023-11-18 10:56:46,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=194233.33333333334, ans=0.1 2023-11-18 10:56:57,806 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5100, loss[loss=0.08913, simple_loss=0.09287, pruned_loss=0.02728, audio_tagging_loss=0.01542, over 15519.00 frames. ], tot_loss[loss=0.1173, simple_loss=0.1276, pruned_loss=0.04144, audio_tagging_loss=0.01202, over 3046484.61 frames. ], batch size: 60, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:56:57,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=194300.0, ans=0.125 2023-11-18 10:57:15,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=194366.66666666666, ans=0.0 2023-11-18 10:57:18,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.24 vs. limit=22.5 2023-11-18 10:57:19,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=194433.33333333334, ans=0.0 2023-11-18 10:57:41,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=194566.66666666666, ans=0.04949747468305833 2023-11-18 10:57:42,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=194566.66666666666, ans=0.1 2023-11-18 10:57:52,348 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5150, loss[loss=0.1194, simple_loss=0.1297, pruned_loss=0.04382, audio_tagging_loss=0.01077, over 15001.00 frames. ], tot_loss[loss=0.1166, simple_loss=0.1268, pruned_loss=0.04102, audio_tagging_loss=0.01211, over 3045639.43 frames. 
], batch size: 57, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:58:20,059 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.587e+01 9.533e+01 1.047e+02 1.145e+02 1.744e+02, threshold=2.094e+02, percent-clipped=0.0 2023-11-18 10:58:48,374 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5200, loss[loss=0.1099, simple_loss=0.1199, pruned_loss=0.03762, audio_tagging_loss=0.01229, over 16457.00 frames. ], tot_loss[loss=0.1176, simple_loss=0.1283, pruned_loss=0.0415, audio_tagging_loss=0.01194, over 3045949.45 frames. ], batch size: 63, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:59:10,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2023-11-18 10:59:15,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=195100.0, ans=0.125 2023-11-18 10:59:28,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=195166.66666666666, ans=0.125 2023-11-18 10:59:31,615 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:59:44,056 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5250, loss[loss=0.1395, simple_loss=0.1599, pruned_loss=0.04988, audio_tagging_loss=0.009658, over 14792.00 frames. ], tot_loss[loss=0.118, simple_loss=0.1289, pruned_loss=0.0417, audio_tagging_loss=0.0119, over 3046232.44 frames. ], batch size: 53, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 11:00:09,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=195433.33333333334, ans=0.125 2023-11-18 11:00:09,747 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.048e+01 9.706e+01 1.086e+02 1.165e+02 1.723e+02, threshold=2.171e+02, percent-clipped=0.0 2023-11-18 11:00:13,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=195433.33333333334, ans=0.125 2023-11-18 11:00:16,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=195500.0, ans=0.125 2023-11-18 11:00:25,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=195500.0, ans=0.0 2023-11-18 11:00:29,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=195566.66666666666, ans=0.125 2023-11-18 11:00:38,722 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5300, loss[loss=0.1146, simple_loss=0.1286, pruned_loss=0.03833, audio_tagging_loss=0.01196, over 15649.00 frames. ], tot_loss[loss=0.1187, simple_loss=0.1296, pruned_loss=0.04203, audio_tagging_loss=0.01192, over 3046151.12 frames. 
], batch size: 59, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 11:00:38,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=195633.33333333334, ans=0.0 2023-11-18 11:00:49,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=195700.0, ans=0.2 2023-11-18 11:00:51,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=195700.0, ans=0.125 2023-11-18 11:00:58,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=195700.0, ans=0.02 2023-11-18 11:01:07,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.54 vs. limit=15.0 2023-11-18 11:01:24,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2023-11-18 11:01:26,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=195900.0, ans=0.05 2023-11-18 11:01:27,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=195900.0, ans=0.1 2023-11-18 11:01:32,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.55 vs. limit=22.5 2023-11-18 11:01:33,816 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5350, loss[loss=0.09513, simple_loss=0.1002, pruned_loss=0.02991, audio_tagging_loss=0.01511, over 15531.00 frames. ], tot_loss[loss=0.119, simple_loss=0.1298, pruned_loss=0.04212, audio_tagging_loss=0.01195, over 3045719.44 frames. ], batch size: 58, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:01:36,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=195966.66666666666, ans=0.0 2023-11-18 11:01:41,283 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.74 vs. limit=15.0 2023-11-18 11:02:00,850 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 9.740e+01 1.103e+02 1.236e+02 1.942e+02, threshold=2.206e+02, percent-clipped=0.0 2023-11-18 11:02:08,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=196166.66666666666, ans=0.125 2023-11-18 11:02:10,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=196166.66666666666, ans=0.0 2023-11-18 11:02:11,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=196166.66666666666, ans=0.2 2023-11-18 11:02:15,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=196166.66666666666, ans=0.5 2023-11-18 11:02:30,251 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5400, loss[loss=0.1342, simple_loss=0.1465, pruned_loss=0.0496, audio_tagging_loss=0.01134, over 15015.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.129, pruned_loss=0.04189, audio_tagging_loss=0.01203, over 3047058.82 frames. 
], batch size: 58, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:02:35,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.03 vs. limit=10.0 2023-11-18 11:02:54,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=196433.33333333334, ans=0.125 2023-11-18 11:03:24,824 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5450, loss[loss=0.1218, simple_loss=0.1258, pruned_loss=0.04495, audio_tagging_loss=0.01396, over 15217.00 frames. ], tot_loss[loss=0.1186, simple_loss=0.1288, pruned_loss=0.04201, audio_tagging_loss=0.01217, over 3039208.48 frames. ], batch size: 59, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:03:28,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=12.0 2023-11-18 11:03:38,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=196700.0, ans=0.2 2023-11-18 11:03:43,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=196700.0, ans=0.125 2023-11-18 11:03:51,182 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 9.475e+01 1.043e+02 1.232e+02 1.692e+02, threshold=2.085e+02, percent-clipped=0.0 2023-11-18 11:03:55,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=196766.66666666666, ans=0.1 2023-11-18 11:04:19,144 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5500, loss[loss=0.09267, simple_loss=0.09022, pruned_loss=0.03581, audio_tagging_loss=0.01174, over 14153.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.1269, pruned_loss=0.04132, audio_tagging_loss=0.01215, over 3038136.30 frames. ], batch size: 56, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:04:27,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2023-11-18 11:04:27,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196966.66666666666, ans=0.1 2023-11-18 11:04:32,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=197033.33333333334, ans=0.125 2023-11-18 11:05:03,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=197233.33333333334, ans=0.2 2023-11-18 11:05:13,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=197233.33333333334, ans=0.125 2023-11-18 11:05:15,098 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5550, loss[loss=0.1113, simple_loss=0.1134, pruned_loss=0.03925, audio_tagging_loss=0.01535, over 15323.00 frames. ], tot_loss[loss=0.1167, simple_loss=0.1268, pruned_loss=0.04115, audio_tagging_loss=0.01216, over 3035887.49 frames. 
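Annotation: the scaling.py:213 lines print ScheduledFloat hyperparameters (dropout probabilities, skip rates, balancer probs) resolved at the current batch_count. A minimal sketch of such a batch-count-indexed schedule, assuming piecewise-linear interpolation between breakpoints; the breakpoints below are invented for illustration:

# Minimal sketch of a value scheduled on batch_count, in the spirit of the
# ScheduledFloat entries above (skip rates that have decayed to 0.0 by
# batch_count ~2e5). Piecewise-linear between (batch_count, value) points.
class ScheduledFloat:
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# e.g. a skip-rate decaying from 0.5 to 0.0 over the first 20k batches:
skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
print(skip_rate.value_at(195633.0))  # -> 0.0, as logged at this stage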
], batch size: 59, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:05:15,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=197300.0, ans=0.0 2023-11-18 11:05:22,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197300.0, ans=0.1 2023-11-18 11:05:29,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=197366.66666666666, ans=0.125 2023-11-18 11:05:38,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=197433.33333333334, ans=0.125 2023-11-18 11:05:41,238 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.765e+01 9.422e+01 1.021e+02 1.116e+02 1.524e+02, threshold=2.042e+02, percent-clipped=0.0 2023-11-18 11:05:56,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=197500.0, ans=0.125 2023-11-18 11:06:06,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=197566.66666666666, ans=0.1 2023-11-18 11:06:09,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=197633.33333333334, ans=0.2 2023-11-18 11:06:10,809 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5600, loss[loss=0.1415, simple_loss=0.144, pruned_loss=0.05759, audio_tagging_loss=0.01194, over 14855.00 frames. ], tot_loss[loss=0.1167, simple_loss=0.1266, pruned_loss=0.04102, audio_tagging_loss=0.01235, over 3040446.86 frames. ], batch size: 56, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:06:38,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=197766.66666666666, ans=0.0 2023-11-18 11:06:39,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=197766.66666666666, ans=0.0 2023-11-18 11:06:42,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2023-11-18 11:06:49,902 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:06:54,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=197900.0, ans=0.125 2023-11-18 11:07:05,550 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5650, loss[loss=0.07919, simple_loss=0.08094, pruned_loss=0.02487, audio_tagging_loss=0.01386, over 14004.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1263, pruned_loss=0.04076, audio_tagging_loss=0.01238, over 3043679.71 frames. ], batch size: 54, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:07:08,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.95 vs. 
limit=10.0 2023-11-18 11:07:19,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=198033.33333333334, ans=0.0 2023-11-18 11:07:23,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=198033.33333333334, ans=0.125 2023-11-18 11:07:25,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=198033.33333333334, ans=0.2 2023-11-18 11:07:32,365 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.586e+01 9.472e+01 1.054e+02 1.179e+02 1.784e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 11:07:44,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198166.66666666666, ans=0.1 2023-11-18 11:07:47,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2023-11-18 11:08:01,316 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5700, loss[loss=0.1431, simple_loss=0.1544, pruned_loss=0.05603, audio_tagging_loss=0.009812, over 14420.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1259, pruned_loss=0.04084, audio_tagging_loss=0.01247, over 3036453.87 frames. ], batch size: 54, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:08:23,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=198433.33333333334, ans=0.125 2023-11-18 11:08:40,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=198500.0, ans=0.0 2023-11-18 11:08:51,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=198566.66666666666, ans=0.125 2023-11-18 11:08:52,433 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:08:56,304 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5750, loss[loss=0.1157, simple_loss=0.1339, pruned_loss=0.04025, audio_tagging_loss=0.008489, over 13457.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1262, pruned_loss=0.04074, audio_tagging_loss=0.01227, over 3043156.05 frames. ], batch size: 53, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:09:02,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=198633.33333333334, ans=0.0 2023-11-18 11:09:08,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198700.0, ans=0.1 2023-11-18 11:09:14,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=198700.0, ans=0.125 2023-11-18 11:09:16,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.54 vs. limit=22.5 2023-11-18 11:09:18,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.84 vs. 
limit=10.0 2023-11-18 11:09:22,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.260e+01 9.937e+01 1.145e+02 1.295e+02 2.386e+02, threshold=2.290e+02, percent-clipped=2.0 2023-11-18 11:09:28,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198766.66666666666, ans=0.1 2023-11-18 11:09:33,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=198833.33333333334, ans=0.125 2023-11-18 11:09:39,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=198900.0, ans=0.125 2023-11-18 11:09:41,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=12.0 2023-11-18 11:09:43,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=198900.0, ans=0.2 2023-11-18 11:09:45,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=198900.0, ans=0.2 2023-11-18 11:09:46,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.39 vs. limit=10.0 2023-11-18 11:09:50,858 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5800, loss[loss=0.1382, simple_loss=0.1562, pruned_loss=0.04901, audio_tagging_loss=0.01112, over 14061.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1266, pruned_loss=0.04095, audio_tagging_loss=0.0121, over 3042089.28 frames. ], batch size: 53, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:10:19,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=199100.0, ans=0.125 2023-11-18 11:10:32,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=199166.66666666666, ans=0.5 2023-11-18 11:10:35,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=199233.33333333334, ans=0.125 2023-11-18 11:10:45,923 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5850, loss[loss=0.1137, simple_loss=0.1221, pruned_loss=0.03871, audio_tagging_loss=0.01393, over 14383.00 frames. ], tot_loss[loss=0.1174, simple_loss=0.1277, pruned_loss=0.04149, audio_tagging_loss=0.01204, over 3036970.42 frames. 
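Annotation: the scaling.py:1022 Whitening lines compare a per-module statistic against a limit, with the penalty only relevant once metric exceeds limit. One plausible formulation of such a metric, assumed here for illustration and not taken from scaling.py: d * trace(C^2) / trace(C)^2 over the activation covariance C, which equals 1 for a perfectly white (isotropic) covariance and grows as the spectrum becomes more anisotropic:

# A plausible whitening metric (an assumption, not lifted from scaling.py):
# for d x d covariance C, d * trace(C @ C) / trace(C)**2 is exactly 1 when
# C is a multiple of the identity and grows with anisotropy, matching the
# "metric=... vs. limit=..." pattern in the log lines above.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0, keepdim=True)
    c = (x.T @ x) / x.shape[0]          # covariance estimate
    d = c.shape[0]
    return (d * torch.trace(c @ c) / torch.trace(c) ** 2).item()

torch.manual_seed(0)
white = torch.randn(1000, 256)
skewed = white * torch.linspace(0.1, 10.0, 256)  # strongly anisotropic
print(whitening_metric(white))   # close to 1: under a typical limit
print(whitening_metric(skewed))  # large: the regime where a penalty kicks in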
], batch size: 56, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:10:47,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=199300.0, ans=0.125 2023-11-18 11:10:54,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=199300.0, ans=0.125 2023-11-18 11:10:54,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=199300.0, ans=0.0 2023-11-18 11:11:06,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=199366.66666666666, ans=0.0 2023-11-18 11:11:12,884 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 9.939e+01 1.135e+02 1.295e+02 1.954e+02, threshold=2.270e+02, percent-clipped=0.0 2023-11-18 11:11:32,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=199566.66666666666, ans=0.2 2023-11-18 11:11:42,586 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5900, loss[loss=0.08696, simple_loss=0.07838, pruned_loss=0.0312, audio_tagging_loss=0.01658, over 14269.00 frames. ], tot_loss[loss=0.1176, simple_loss=0.1275, pruned_loss=0.04163, audio_tagging_loss=0.01217, over 3038802.07 frames. ], batch size: 56, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:11:46,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=199633.33333333334, ans=0.1 2023-11-18 11:11:59,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=199700.0, ans=0.2 2023-11-18 11:11:59,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=199700.0, ans=0.125 2023-11-18 11:12:09,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=199766.66666666666, ans=0.1 2023-11-18 11:12:13,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=199766.66666666666, ans=0.07 2023-11-18 11:12:25,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=199900.0, ans=0.0 2023-11-18 11:12:26,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=199900.0, ans=0.1 2023-11-18 11:12:32,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=199900.0, ans=0.125 2023-11-18 11:12:33,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.82 vs. limit=10.0 2023-11-18 11:12:36,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=199966.66666666666, ans=0.125 2023-11-18 11:12:37,063 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 5950, loss[loss=0.1681, simple_loss=0.196, pruned_loss=0.06219, audio_tagging_loss=0.007844, over 16748.00 frames. ], tot_loss[loss=0.1167, simple_loss=0.127, pruned_loss=0.04118, audio_tagging_loss=0.01208, over 3041012.88 frames. 
], batch size: 59, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:12:44,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=199966.66666666666, ans=0.0 2023-11-18 11:12:50,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=200033.33333333334, ans=0.0 2023-11-18 11:13:04,192 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 9.491e+01 1.040e+02 1.180e+02 1.802e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-18 11:13:30,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=200233.33333333334, ans=0.1 2023-11-18 11:13:32,553 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6000, loss[loss=0.1351, simple_loss=0.1489, pruned_loss=0.04979, audio_tagging_loss=0.01089, over 15209.00 frames. ], tot_loss[loss=0.1176, simple_loss=0.128, pruned_loss=0.04155, audio_tagging_loss=0.01205, over 3042962.89 frames. ], batch size: 55, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:13:32,554 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 11:13:46,900 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.9171, 0.7760, 2.3571, 2.2862, 2.5686, 1.9546, 2.5128, 2.3757], device='cuda:1') 2023-11-18 11:14:05,624 INFO [train_asr.py:1147] (1/4) Epoch 3, validation: loss=0.08054, simple_loss=0.06533, pruned_loss=0.01225, audio_tagging_loss=0.03562, over 4681554.00 frames. 2023-11-18 11:14:05,624 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 11:14:20,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=200366.66666666666, ans=0.0 2023-11-18 11:14:27,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=200433.33333333334, ans=0.05 2023-11-18 11:14:45,158 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:14:52,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.26 vs. limit=12.0 2023-11-18 11:15:00,402 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6050, loss[loss=0.09842, simple_loss=0.1071, pruned_loss=0.03334, audio_tagging_loss=0.01154, over 15604.00 frames. ], tot_loss[loss=0.117, simple_loss=0.1276, pruned_loss=0.04129, audio_tagging_loss=0.01192, over 3048100.32 frames. ], batch size: 57, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:15:15,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=200700.0, ans=0.125 2023-11-18 11:15:15,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.26 vs. 
limit=15.0 2023-11-18 11:15:24,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=200766.66666666666, ans=0.125 2023-11-18 11:15:27,197 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 9.505e+01 1.054e+02 1.196e+02 1.657e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 11:15:32,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=200766.66666666666, ans=0.0 2023-11-18 11:15:33,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=200833.33333333334, ans=0.125 2023-11-18 11:15:47,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=200900.0, ans=0.125 2023-11-18 11:15:56,273 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6100, loss[loss=0.1097, simple_loss=0.1169, pruned_loss=0.03929, audio_tagging_loss=0.01198, over 14485.00 frames. ], tot_loss[loss=0.117, simple_loss=0.1274, pruned_loss=0.04136, audio_tagging_loss=0.01198, over 3057115.01 frames. ], batch size: 57, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:16:09,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=201033.33333333334, ans=0.0 2023-11-18 11:16:14,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=201033.33333333334, ans=0.125 2023-11-18 11:16:27,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=201100.0, ans=0.2 2023-11-18 11:16:40,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201233.33333333334, ans=0.1 2023-11-18 11:16:44,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=201233.33333333334, ans=0.125 2023-11-18 11:16:51,859 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6150, loss[loss=0.09656, simple_loss=0.1029, pruned_loss=0.03137, audio_tagging_loss=0.01376, over 15580.00 frames. ], tot_loss[loss=0.1183, simple_loss=0.1288, pruned_loss=0.04184, audio_tagging_loss=0.01204, over 3055824.48 frames. 
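Annotation: at batch 6000 above the trainer pauses training batches, runs a full validation pass, and reports peak GPU memory. A sketch of that flow; `model` and `valid_loader` are placeholders with an assumed signature, while torch.cuda.max_memory_allocated is the real API behind the "Maximum memory allocated" line:

# Sketch of the periodic validation + memory report seen at batch 6000
# ("Computing validation loss" ... "Maximum memory allocated so far is
# 25225MB"). Placeholder model/loader; not the exact train_asr.py code.
import logging
import torch

def maybe_validate(model, valid_loader, batch_idx, valid_interval=3000):
    if batch_idx % valid_interval != 0:
        return
    logging.info("Computing validation loss")
    model.eval()
    tot, n = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = model(batch)  # assumed placeholder signature
            tot += loss.item() * num_frames
            n += num_frames
    logging.info(f"validation: loss={tot / n:.4g}, over {n} frames.")
    mem_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    logging.info(f"Maximum memory allocated so far is {mem_mb}MB")
    model.train()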
], batch size: 59, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:17:03,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=201366.66666666666, ans=0.2 2023-11-18 11:17:05,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=201366.66666666666, ans=0.0 2023-11-18 11:17:06,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=201366.66666666666, ans=0.125 2023-11-18 11:17:15,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=201433.33333333334, ans=0.125 2023-11-18 11:17:18,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.824e+01 1.100e+02 1.227e+02 1.879e+02, threshold=2.200e+02, percent-clipped=0.0 2023-11-18 11:17:37,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=201566.66666666666, ans=0.1 2023-11-18 11:17:47,687 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6200, loss[loss=0.1474, simple_loss=0.1664, pruned_loss=0.05431, audio_tagging_loss=0.00989, over 15318.00 frames. ], tot_loss[loss=0.1181, simple_loss=0.1282, pruned_loss=0.04191, audio_tagging_loss=0.0121, over 3058111.40 frames. ], batch size: 55, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:17:51,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-11-18 11:17:57,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=201633.33333333334, ans=0.0 2023-11-18 11:17:58,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.81 vs. limit=12.0 2023-11-18 11:18:03,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=201700.0, ans=0.1 2023-11-18 11:18:06,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=201700.0, ans=0.125 2023-11-18 11:18:22,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0 2023-11-18 11:18:26,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=201833.33333333334, ans=0.2 2023-11-18 11:18:43,323 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6250, loss[loss=0.1439, simple_loss=0.1632, pruned_loss=0.0555, audio_tagging_loss=0.006831, over 15735.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1256, pruned_loss=0.04075, audio_tagging_loss=0.01235, over 3044488.88 frames. 
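Annotation: the WARNING lines in this log drop 1-second AudioSet cuts whose placeholder transcript has more BPE tokens (24) than encoder frames after subsampling (23), presumably because the transducer loss needs at least one frame per emitted token. Illustrative filter logic, not the exact train_asr.py code; the subsampling formula is a guess chosen only to reproduce the logged 100 -> 23:

# Sketch of the filter behind the "Exclude cut" WARNINGs above. The frame
# formula (num_frames - 7) // 4 is an assumption that maps 100 -> 23 as in
# the log; the real front end may differ.
import logging

def keep_cut(num_frames: int, tokens: list, subsampling_factor: int = 4) -> bool:
    frames_after = (num_frames - 7) // subsampling_factor
    if frames_after < len(tokens):
        logging.warning(
            f"Exclude cut from training. Number of frames (before "
            f"subsampling): {num_frames}. Number of frames (after "
            f"subsampling): {frames_after}. Number of tokens: {len(tokens)}"
        )
        return False
    return True

print(keep_cut(100, ["tok"] * 24))  # -> False: 23 frames < 24 tokens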
], batch size: 57, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:18:52,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=201966.66666666666, ans=0.0 2023-11-18 11:19:02,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=202033.33333333334, ans=0.0 2023-11-18 11:19:02,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=202033.33333333334, ans=0.125 2023-11-18 11:19:02,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2023-11-18 11:19:03,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202033.33333333334, ans=0.1 2023-11-18 11:19:04,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=202100.0, ans=0.125 2023-11-18 11:19:10,111 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 9.428e+01 1.017e+02 1.154e+02 1.739e+02, threshold=2.034e+02, percent-clipped=0.0 2023-11-18 11:19:29,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=202233.33333333334, ans=0.1 2023-11-18 11:19:34,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.88 vs. limit=22.5 2023-11-18 11:19:37,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=202233.33333333334, ans=0.125 2023-11-18 11:19:39,067 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6300, loss[loss=0.1321, simple_loss=0.1533, pruned_loss=0.04595, audio_tagging_loss=0.009514, over 14749.00 frames. ], tot_loss[loss=0.1165, simple_loss=0.1264, pruned_loss=0.04091, audio_tagging_loss=0.01236, over 3041544.64 frames. 
], batch size: 55, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:19:51,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=202366.66666666666, ans=0.125 2023-11-18 11:19:53,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202366.66666666666, ans=0.1 2023-11-18 11:20:14,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=202500.0, ans=0.125 2023-11-18 11:20:25,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=202566.66666666666, ans=0.0 2023-11-18 11:20:29,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=202566.66666666666, ans=0.125 2023-11-18 11:20:32,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=202566.66666666666, ans=0.125 2023-11-18 11:20:33,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=202633.33333333334, ans=0.125 2023-11-18 11:20:34,518 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6350, loss[loss=0.1219, simple_loss=0.147, pruned_loss=0.04065, audio_tagging_loss=0.007781, over 15655.00 frames. ], tot_loss[loss=0.1168, simple_loss=0.1269, pruned_loss=0.04102, audio_tagging_loss=0.01235, over 3039785.78 frames. ], batch size: 57, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:20:40,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202633.33333333334, ans=0.1 2023-11-18 11:20:41,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=202633.33333333334, ans=0.125 2023-11-18 11:20:42,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=202633.33333333334, ans=0.125 2023-11-18 11:20:49,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=202700.0, ans=0.125 2023-11-18 11:21:01,566 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 9.831e+01 1.084e+02 1.220e+02 1.699e+02, threshold=2.169e+02, percent-clipped=0.0 2023-11-18 11:21:01,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=202766.66666666666, ans=0.05 2023-11-18 11:21:04,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=202766.66666666666, ans=0.0 2023-11-18 11:21:05,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=202766.66666666666, ans=0.125 2023-11-18 11:21:10,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202833.33333333334, ans=0.1 2023-11-18 11:21:11,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=202833.33333333334, ans=0.0 2023-11-18 11:21:12,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.42 vs. 
limit=12.0 2023-11-18 11:21:14,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=202833.33333333334, ans=0.2 2023-11-18 11:21:29,927 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6400, loss[loss=0.1264, simple_loss=0.1311, pruned_loss=0.0478, audio_tagging_loss=0.01306, over 14309.00 frames. ], tot_loss[loss=0.1174, simple_loss=0.1275, pruned_loss=0.04115, audio_tagging_loss=0.01246, over 3039687.60 frames. ], batch size: 56, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:21:31,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=202966.66666666666, ans=0.0 2023-11-18 11:21:56,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=203100.0, ans=0.125 2023-11-18 11:22:25,996 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6450, loss[loss=0.1502, simple_loss=0.151, pruned_loss=0.06269, audio_tagging_loss=0.01205, over 14295.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1261, pruned_loss=0.04073, audio_tagging_loss=0.01264, over 3037282.47 frames. ], batch size: 54, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:22:31,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=203300.0, ans=10.0 2023-11-18 11:22:41,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=203366.66666666666, ans=0.0 2023-11-18 11:22:52,367 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 9.763e+01 1.082e+02 1.171e+02 1.453e+02, threshold=2.164e+02, percent-clipped=0.0 2023-11-18 11:23:08,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=203500.0, ans=0.95 2023-11-18 11:23:10,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=203566.66666666666, ans=0.125 2023-11-18 11:23:21,038 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6500, loss[loss=0.1526, simple_loss=0.1694, pruned_loss=0.05645, audio_tagging_loss=0.01143, over 16919.00 frames. ], tot_loss[loss=0.1175, simple_loss=0.1277, pruned_loss=0.04118, audio_tagging_loss=0.0125, over 3039371.56 frames. ], batch size: 60, lr: 2.05e-02, grad_scale: 64.0 2023-11-18 11:23:21,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=203633.33333333334, ans=0.125 2023-11-18 11:23:26,459 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:23:32,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=203700.0, ans=0.07 2023-11-18 11:23:32,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.83 vs. 
limit=12.0 2023-11-18 11:23:34,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=203700.0, ans=0.125 2023-11-18 11:23:47,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=203766.66666666666, ans=0.125 2023-11-18 11:23:56,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.96 vs. limit=10.0 2023-11-18 11:23:58,206 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:23:58,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=203833.33333333334, ans=0.0 2023-11-18 11:24:17,220 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6550, loss[loss=0.1075, simple_loss=0.1114, pruned_loss=0.03584, audio_tagging_loss=0.01598, over 16595.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.1271, pruned_loss=0.04106, audio_tagging_loss=0.01231, over 3040477.84 frames. ], batch size: 65, lr: 2.05e-02, grad_scale: 64.0 2023-11-18 11:24:17,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=203966.66666666666, ans=0.0 2023-11-18 11:24:32,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=204033.33333333334, ans=0.1 2023-11-18 11:24:34,713 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:24:35,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.70 vs. limit=15.0 2023-11-18 11:24:43,887 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.972e+01 9.621e+01 1.067e+02 1.227e+02 1.729e+02, threshold=2.134e+02, percent-clipped=0.0 2023-11-18 11:24:45,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=204100.0, ans=0.125 2023-11-18 11:25:07,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=204233.33333333334, ans=0.125 2023-11-18 11:25:09,775 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:25:11,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=12.0 2023-11-18 11:25:12,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=204300.0, ans=0.0 2023-11-18 11:25:13,365 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6600, loss[loss=0.1078, simple_loss=0.1136, pruned_loss=0.03442, audio_tagging_loss=0.01658, over 14063.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1261, pruned_loss=0.04072, audio_tagging_loss=0.01233, over 3040935.91 frames. ], batch size: 53, lr: 2.04e-02, grad_scale: 64.0 2023-11-18 11:25:17,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.03 vs. 
limit=22.5 2023-11-18 11:25:19,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=204300.0, ans=0.2 2023-11-18 11:25:27,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=204366.66666666666, ans=0.125 2023-11-18 11:25:30,751 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2023-11-18 11:25:32,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2023-11-18 11:25:35,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=204433.33333333334, ans=0.125 2023-11-18 11:25:40,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=204433.33333333334, ans=0.125 2023-11-18 11:25:58,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.77 vs. limit=15.0 2023-11-18 11:26:08,405 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6650, loss[loss=0.1061, simple_loss=0.1069, pruned_loss=0.03473, audio_tagging_loss=0.01791, over 16191.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1254, pruned_loss=0.04034, audio_tagging_loss=0.01229, over 3049260.51 frames. ], batch size: 61, lr: 2.04e-02, grad_scale: 64.0 2023-11-18 11:26:20,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=204700.0, ans=0.2 2023-11-18 11:26:21,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=204700.0, ans=0.0 2023-11-18 11:26:26,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.45 vs. 
limit=15.0 2023-11-18 11:26:35,312 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 9.475e+01 1.025e+02 1.163e+02 1.869e+02, threshold=2.050e+02, percent-clipped=0.0 2023-11-18 11:26:35,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=204766.66666666666, ans=0.1 2023-11-18 11:26:39,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=204766.66666666666, ans=0.0 2023-11-18 11:26:43,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=204833.33333333334, ans=0.1 2023-11-18 11:26:45,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=204833.33333333334, ans=0.125 2023-11-18 11:26:47,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=204833.33333333334, ans=0.1 2023-11-18 11:26:47,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=204833.33333333334, ans=0.125 2023-11-18 11:27:03,167 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6700, loss[loss=0.1067, simple_loss=0.113, pruned_loss=0.03792, audio_tagging_loss=0.01224, over 14183.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1263, pruned_loss=0.0407, audio_tagging_loss=0.01219, over 3051886.59 frames. ], batch size: 55, lr: 2.04e-02, grad_scale: 64.0 2023-11-18 11:27:19,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=205033.33333333334, ans=0.2 2023-11-18 11:27:35,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.12 vs. limit=15.0 2023-11-18 11:27:49,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=205233.33333333334, ans=0.1 2023-11-18 11:27:59,109 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6750, loss[loss=0.09735, simple_loss=0.108, pruned_loss=0.03014, audio_tagging_loss=0.01318, over 15294.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1256, pruned_loss=0.04023, audio_tagging_loss=0.01216, over 3042853.88 frames. 
], batch size: 57, lr: 2.04e-02, grad_scale: 64.0 2023-11-18 11:28:05,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=205300.0, ans=0.125 2023-11-18 11:28:06,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=205300.0, ans=0.0 2023-11-18 11:28:09,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=205366.66666666666, ans=0.1 2023-11-18 11:28:18,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=205366.66666666666, ans=0.125 2023-11-18 11:28:18,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=205366.66666666666, ans=0.125 2023-11-18 11:28:24,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=205433.33333333334, ans=0.95 2023-11-18 11:28:25,175 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 9.647e+01 1.086e+02 1.295e+02 2.076e+02, threshold=2.172e+02, percent-clipped=1.0 2023-11-18 11:28:31,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=205500.0, ans=0.1 2023-11-18 11:28:35,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=205500.0, ans=0.0 2023-11-18 11:28:46,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=205566.66666666666, ans=0.0 2023-11-18 11:28:47,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=205566.66666666666, ans=0.125 2023-11-18 11:28:49,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=205566.66666666666, ans=0.125 2023-11-18 11:28:53,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=205633.33333333334, ans=0.0 2023-11-18 11:28:54,655 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6800, loss[loss=0.1315, simple_loss=0.1503, pruned_loss=0.04868, audio_tagging_loss=0.00763, over 15379.00 frames. ], tot_loss[loss=0.1149, simple_loss=0.1255, pruned_loss=0.04014, audio_tagging_loss=0.01199, over 3047324.38 frames. ], batch size: 57, lr: 2.04e-02, grad_scale: 32.0 2023-11-18 11:29:01,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.34 vs. limit=10.0 2023-11-18 11:29:06,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.69 vs. 
limit=10.0 2023-11-18 11:29:11,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=205700.0, ans=0.0 2023-11-18 11:29:24,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=205766.66666666666, ans=0.125 2023-11-18 11:29:27,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=205833.33333333334, ans=0.1 2023-11-18 11:29:35,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=205833.33333333334, ans=0.125 2023-11-18 11:29:44,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=205900.0, ans=0.0 2023-11-18 11:29:49,785 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6850, loss[loss=0.127, simple_loss=0.1407, pruned_loss=0.0454, audio_tagging_loss=0.01127, over 15559.00 frames. ], tot_loss[loss=0.1157, simple_loss=0.1262, pruned_loss=0.04058, audio_tagging_loss=0.01198, over 3053534.78 frames. ], batch size: 57, lr: 2.04e-02, grad_scale: 32.0 2023-11-18 11:30:17,590 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 9.332e+01 1.055e+02 1.143e+02 1.752e+02, threshold=2.109e+02, percent-clipped=0.0 2023-11-18 11:30:17,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=206100.0, ans=0.125 2023-11-18 11:30:22,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=206166.66666666666, ans=0.0 2023-11-18 11:30:25,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=206166.66666666666, ans=0.1 2023-11-18 11:30:45,648 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6900, loss[loss=0.1151, simple_loss=0.131, pruned_loss=0.03982, audio_tagging_loss=0.009749, over 14938.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1263, pruned_loss=0.04046, audio_tagging_loss=0.01196, over 3055417.24 frames. ], batch size: 56, lr: 2.04e-02, grad_scale: 32.0 2023-11-18 11:30:53,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=206300.0, ans=0.0 2023-11-18 11:30:57,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=15.0 2023-11-18 11:31:03,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=8.0 2023-11-18 11:31:11,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206433.33333333334, ans=0.1 2023-11-18 11:31:16,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=206433.33333333334, ans=0.0 2023-11-18 11:31:25,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=206500.0, ans=0.125 2023-11-18 11:31:27,817 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:31:30,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=206566.66666666666, ans=0.0 2023-11-18 11:31:32,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=206566.66666666666, ans=0.0 2023-11-18 11:31:35,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=206566.66666666666, ans=0.125 2023-11-18 11:31:40,908 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 6950, loss[loss=0.1097, simple_loss=0.1237, pruned_loss=0.0361, audio_tagging_loss=0.01172, over 14633.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1271, pruned_loss=0.04049, audio_tagging_loss=0.01199, over 3050819.15 frames. ], batch size: 53, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:32:08,297 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 9.386e+01 1.046e+02 1.149e+02 1.697e+02, threshold=2.092e+02, percent-clipped=0.0 2023-11-18 11:32:12,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=206766.66666666666, ans=0.125 2023-11-18 11:32:25,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.57 vs. limit=10.0 2023-11-18 11:32:35,653 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7000, loss[loss=0.1407, simple_loss=0.1561, pruned_loss=0.04975, audio_tagging_loss=0.01294, over 15853.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.127, pruned_loss=0.04064, audio_tagging_loss=0.01203, over 3054483.83 frames. ], batch size: 57, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:32:46,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2023-11-18 11:32:49,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=207033.33333333334, ans=0.125 2023-11-18 11:32:53,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=207033.33333333334, ans=0.1 2023-11-18 11:32:57,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=207100.0, ans=0.125 2023-11-18 11:33:11,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.35 vs. limit=15.0 2023-11-18 11:33:14,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.22 vs. 
limit=10.0 2023-11-18 11:33:26,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=207233.33333333334, ans=0.1 2023-11-18 11:33:31,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=207300.0, ans=0.0 2023-11-18 11:33:31,927 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7050, loss[loss=0.1266, simple_loss=0.126, pruned_loss=0.04952, audio_tagging_loss=0.01412, over 14872.00 frames. ], tot_loss[loss=0.1175, simple_loss=0.1281, pruned_loss=0.04137, audio_tagging_loss=0.01206, over 3058091.63 frames. ], batch size: 56, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:33:44,347 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:33:57,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=207433.33333333334, ans=0.125 2023-11-18 11:33:58,909 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.932e+01 9.545e+01 1.049e+02 1.197e+02 1.734e+02, threshold=2.097e+02, percent-clipped=0.0 2023-11-18 11:34:10,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=207500.0, ans=0.1 2023-11-18 11:34:27,344 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7100, loss[loss=0.1012, simple_loss=0.1155, pruned_loss=0.03347, audio_tagging_loss=0.00994, over 15424.00 frames. ], tot_loss[loss=0.1179, simple_loss=0.1289, pruned_loss=0.04137, audio_tagging_loss=0.01209, over 3054260.95 frames. ], batch size: 56, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:34:52,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=207766.66666666666, ans=0.125 2023-11-18 11:35:22,610 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7150, loss[loss=0.1364, simple_loss=0.1519, pruned_loss=0.04978, audio_tagging_loss=0.01065, over 16118.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.1276, pruned_loss=0.04089, audio_tagging_loss=0.01218, over 3052755.77 frames. ], batch size: 60, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:35:34,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=208033.33333333334, ans=0.125 2023-11-18 11:35:49,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=208100.0, ans=0.1 2023-11-18 11:35:51,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.617e+01 1.079e+02 1.252e+02 1.872e+02, threshold=2.157e+02, percent-clipped=0.0 2023-11-18 11:36:16,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=208233.33333333334, ans=0.125 2023-11-18 11:36:19,129 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7200, loss[loss=0.1086, simple_loss=0.1211, pruned_loss=0.0368, audio_tagging_loss=0.01127, over 16018.00 frames. ], tot_loss[loss=0.1181, simple_loss=0.1288, pruned_loss=0.04144, audio_tagging_loss=0.01227, over 3056990.77 frames. 
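Annotation: grad_scale in these records moves 32.0 -> 64.0 (around batch 6500) and back to 32.0 (by batch 6800), the signature of dynamic fp16 loss scaling: the scale doubles after a run of overflow-free steps and halves on overflow. torch.cuda.amp.GradScaler implements exactly this policy; the constructor arguments below are illustrative, not necessarily this trainer's settings:

# Dynamic loss scaling that reproduces the 32 -> 64 -> 32 grad_scale
# movement seen above, using PyTorch's stock GradScaler.
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,      # matches the scale logged here
    growth_factor=2.0,    # 32 -> 64 after growth_interval clean steps
    backoff_factor=0.5,   # 64 -> 32 on the first overflowing step
    growth_interval=2000,
)

# Typical step, with hypothetical model/optimizer/loss_fn/batch:
# with torch.cuda.amp.autocast():
#     loss = loss_fn(model(batch))
# scaler.scale(loss).backward()
# scaler.step(optimizer)
# scaler.update()
# print(scaler.get_scale())  # -> the grad_scale value seen in the log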
], batch size: 59, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:36:27,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=208300.0, ans=0.1 2023-11-18 11:36:38,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=15.0 2023-11-18 11:36:39,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=12.0 2023-11-18 11:36:48,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.20 vs. limit=22.5 2023-11-18 11:37:14,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=208633.33333333334, ans=0.125 2023-11-18 11:37:15,134 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7250, loss[loss=0.1045, simple_loss=0.1242, pruned_loss=0.03189, audio_tagging_loss=0.0105, over 15341.00 frames. ], tot_loss[loss=0.1173, simple_loss=0.1275, pruned_loss=0.04108, audio_tagging_loss=0.01248, over 3046493.97 frames. ], batch size: 57, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:37:41,783 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 9.494e+01 1.040e+02 1.201e+02 1.952e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-18 11:37:44,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2023-11-18 11:37:58,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=208900.0, ans=0.0 2023-11-18 11:37:59,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.86 vs. limit=6.0 2023-11-18 11:38:03,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=208900.0, ans=0.125 2023-11-18 11:38:09,859 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7300, loss[loss=0.117, simple_loss=0.1318, pruned_loss=0.0386, audio_tagging_loss=0.0125, over 14605.00 frames. ], tot_loss[loss=0.115, simple_loss=0.1251, pruned_loss=0.04004, audio_tagging_loss=0.01236, over 3037987.02 frames. ], batch size: 52, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:38:45,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=209166.66666666666, ans=0.035 2023-11-18 11:38:46,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=15.0 2023-11-18 11:38:53,817 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2023-11-18 11:38:59,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.04 vs. limit=22.5 2023-11-18 11:39:04,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. 
2023-11-18 11:39:05,493 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7350, loss[loss=0.09374, simple_loss=0.1015, pruned_loss=0.03162, audio_tagging_loss=0.01138, over 15606.00 frames. ], tot_loss[loss=0.1147, simple_loss=0.125, pruned_loss=0.04006, audio_tagging_loss=0.01209, over 3042714.99 frames. ], batch size: 57, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:39:21,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=209366.66666666666, ans=0.0 2023-11-18 11:39:28,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=209433.33333333334, ans=0.0 2023-11-18 11:39:33,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 9.639e+01 1.055e+02 1.233e+02 1.941e+02, threshold=2.110e+02, percent-clipped=0.0 2023-11-18 11:39:36,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=209433.33333333334, ans=0.125 2023-11-18 11:39:37,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=209433.33333333334, ans=0.2 2023-11-18 11:39:51,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=209566.66666666666, ans=0.1 2023-11-18 11:40:01,551 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7400, loss[loss=0.08405, simple_loss=0.08243, pruned_loss=0.0252, audio_tagging_loss=0.01764, over 14918.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1263, pruned_loss=0.04042, audio_tagging_loss=0.01201, over 3040217.25 frames. ], batch size: 59, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:40:02,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=209633.33333333334, ans=0.1 2023-11-18 11:40:06,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=209633.33333333334, ans=0.125 2023-11-18 11:40:40,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=209833.33333333334, ans=0.0 2023-11-18 11:40:43,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.82 vs. limit=22.5 2023-11-18 11:40:45,932 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:40:55,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=209966.66666666666, ans=0.1 2023-11-18 11:40:56,854 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7450, loss[loss=0.1312, simple_loss=0.1421, pruned_loss=0.04815, audio_tagging_loss=0.01196, over 16015.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1273, pruned_loss=0.04088, audio_tagging_loss=0.01181, over 3042486.11 frames. ], batch size: 59, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:41:09,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.14 vs.
limit=10.0 2023-11-18 11:41:24,895 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.797e+01 9.734e+01 1.062e+02 1.217e+02 1.649e+02, threshold=2.124e+02, percent-clipped=0.0 2023-11-18 11:41:52,360 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7500, loss[loss=0.1395, simple_loss=0.1552, pruned_loss=0.05356, audio_tagging_loss=0.008304, over 14675.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.1269, pruned_loss=0.04091, audio_tagging_loss=0.0118, over 3040099.62 frames. ], batch size: 54, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:41:53,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=210300.0, ans=0.125 2023-11-18 11:41:56,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2023-11-18 11:41:59,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=210300.0, ans=0.0 2023-11-18 11:42:04,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=210366.66666666666, ans=0.0 2023-11-18 11:42:04,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=210366.66666666666, ans=0.2 2023-11-18 11:42:10,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=210366.66666666666, ans=0.125 2023-11-18 11:42:30,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.03 vs. limit=6.0 2023-11-18 11:42:40,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=210566.66666666666, ans=0.07 2023-11-18 11:42:48,180 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7550, loss[loss=0.1352, simple_loss=0.1438, pruned_loss=0.05137, audio_tagging_loss=0.01189, over 15194.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.1273, pruned_loss=0.04079, audio_tagging_loss=0.01177, over 3044886.42 frames. ], batch size: 56, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:42:50,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.10 vs. limit=15.0 2023-11-18 11:42:56,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=210633.33333333334, ans=0.125 2023-11-18 11:43:15,905 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 1.017e+02 1.103e+02 1.286e+02 2.062e+02, threshold=2.206e+02, percent-clipped=0.0 2023-11-18 11:43:18,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=210766.66666666666, ans=0.125 2023-11-18 11:43:19,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=210766.66666666666, ans=0.1 2023-11-18 11:43:43,331 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7600, loss[loss=0.1073, simple_loss=0.1305, pruned_loss=0.03157, audio_tagging_loss=0.01046, over 15449.00 frames. ], tot_loss[loss=0.115, simple_loss=0.1258, pruned_loss=0.04023, audio_tagging_loss=0.01186, over 3049426.22 frames. 
], batch size: 55, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:43:43,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=210966.66666666666, ans=0.125 2023-11-18 11:43:50,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=210966.66666666666, ans=0.5 2023-11-18 11:43:55,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=211033.33333333334, ans=0.0 2023-11-18 11:43:56,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.25 vs. limit=15.0 2023-11-18 11:44:10,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=211100.0, ans=0.025 2023-11-18 11:44:22,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=211166.66666666666, ans=0.125 2023-11-18 11:44:24,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=211166.66666666666, ans=0.5 2023-11-18 11:44:39,637 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7650, loss[loss=0.09663, simple_loss=0.1001, pruned_loss=0.03209, audio_tagging_loss=0.01451, over 15249.00 frames. ], tot_loss[loss=0.1154, simple_loss=0.1261, pruned_loss=0.04046, audio_tagging_loss=0.01192, over 3043287.18 frames. ], batch size: 58, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:44:48,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=12.0 2023-11-18 11:44:49,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.59 vs. limit=15.0 2023-11-18 11:44:49,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=211366.66666666666, ans=10.0 2023-11-18 11:45:06,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=211433.33333333334, ans=0.125 2023-11-18 11:45:07,184 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.141e+01 9.868e+01 1.071e+02 1.213e+02 1.962e+02, threshold=2.142e+02, percent-clipped=0.0 2023-11-18 11:45:31,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=211566.66666666666, ans=0.125 2023-11-18 11:45:35,751 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7700, loss[loss=0.07483, simple_loss=0.07585, pruned_loss=0.02278, audio_tagging_loss=0.01413, over 14252.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1259, pruned_loss=0.04029, audio_tagging_loss=0.01198, over 3039486.49 frames. 
], batch size: 55, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:45:39,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=211633.33333333334, ans=0.2 2023-11-18 11:45:45,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=211700.0, ans=0.125 2023-11-18 11:45:50,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=211700.0, ans=0.125 2023-11-18 11:46:07,069 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2023-11-18 11:46:24,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=211900.0, ans=0.125 2023-11-18 11:46:30,611 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7750, loss[loss=0.1404, simple_loss=0.148, pruned_loss=0.05739, audio_tagging_loss=0.009002, over 14777.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1266, pruned_loss=0.04034, audio_tagging_loss=0.01196, over 3039878.47 frames. ], batch size: 56, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:46:35,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=211966.66666666666, ans=0.1 2023-11-18 11:46:35,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=211966.66666666666, ans=0.125 2023-11-18 11:46:42,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2023-11-18 11:46:43,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=212033.33333333334, ans=0.125 2023-11-18 11:46:50,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=15.0 2023-11-18 11:46:57,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=212100.0, ans=0.07 2023-11-18 11:46:58,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=212100.0, ans=0.1 2023-11-18 11:46:59,434 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 9.456e+01 1.068e+02 1.204e+02 1.685e+02, threshold=2.136e+02, percent-clipped=0.0 2023-11-18 11:47:26,653 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7800, loss[loss=0.141, simple_loss=0.1449, pruned_loss=0.05429, audio_tagging_loss=0.01422, over 15956.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.127, pruned_loss=0.04043, audio_tagging_loss=0.01199, over 3039963.50 frames. 
], batch size: 61, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:47:38,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=212366.66666666666, ans=0.0 2023-11-18 11:47:48,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=212433.33333333334, ans=0.05 2023-11-18 11:47:59,742 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:48:17,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=212566.66666666666, ans=0.0 2023-11-18 11:48:21,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=212633.33333333334, ans=0.125 2023-11-18 11:48:21,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.68 vs. limit=10.0 2023-11-18 11:48:22,974 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7850, loss[loss=0.1226, simple_loss=0.1345, pruned_loss=0.04332, audio_tagging_loss=0.01202, over 14451.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1273, pruned_loss=0.0407, audio_tagging_loss=0.01203, over 3044306.94 frames. ], batch size: 54, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:48:28,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=212633.33333333334, ans=0.0 2023-11-18 11:48:30,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=212633.33333333334, ans=0.09899494936611666 2023-11-18 11:48:38,151 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.72 vs. limit=6.0 2023-11-18 11:48:40,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=212700.0, ans=0.0 2023-11-18 11:48:40,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=212700.0, ans=0.2 2023-11-18 11:48:49,653 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.949e+01 1.011e+02 1.139e+02 1.309e+02 2.076e+02, threshold=2.278e+02, percent-clipped=0.0 2023-11-18 11:48:54,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=212833.33333333334, ans=0.125 2023-11-18 11:48:54,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=212833.33333333334, ans=0.1 2023-11-18 11:49:00,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=212833.33333333334, ans=0.2 2023-11-18 11:49:01,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=212833.33333333334, ans=0.125 2023-11-18 11:49:08,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=212900.0, ans=0.0 2023-11-18 11:49:08,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.88 vs. 
limit=15.0 2023-11-18 11:49:17,815 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7900, loss[loss=0.101, simple_loss=0.1, pruned_loss=0.03628, audio_tagging_loss=0.01472, over 13905.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1266, pruned_loss=0.04066, audio_tagging_loss=0.01214, over 3044579.06 frames. ], batch size: 53, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:49:19,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=212966.66666666666, ans=0.125 2023-11-18 11:49:30,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=213033.33333333334, ans=0.0 2023-11-18 11:49:35,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=213033.33333333334, ans=0.04949747468305833 2023-11-18 11:50:11,966 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 7950, loss[loss=0.1174, simple_loss=0.126, pruned_loss=0.04104, audio_tagging_loss=0.01334, over 15461.00 frames. ], tot_loss[loss=0.115, simple_loss=0.1251, pruned_loss=0.04018, audio_tagging_loss=0.01223, over 3049059.18 frames. ], batch size: 58, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:50:23,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=213300.0, ans=0.2 2023-11-18 11:50:28,564 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:50:42,605 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 9.499e+01 1.075e+02 1.220e+02 1.746e+02, threshold=2.150e+02, percent-clipped=0.0 2023-11-18 11:50:53,678 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.52 vs. limit=12.0 2023-11-18 11:51:11,123 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8000, loss[loss=0.1177, simple_loss=0.1216, pruned_loss=0.04356, audio_tagging_loss=0.01336, over 15769.00 frames. ], tot_loss[loss=0.1148, simple_loss=0.1248, pruned_loss=0.04006, audio_tagging_loss=0.01232, over 3040642.83 frames. ], batch size: 60, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:51:19,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=213633.33333333334, ans=0.125 2023-11-18 11:51:32,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.04 vs. limit=15.0
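The WARNING above drops an AudioSet clip from the ASR branch: the one-second cut has 100 feature frames, only 23 survive the convolutional subsampling front-end, and its dummy transcript tokenizes to 24 BPE tokens, so there would be fewer encoder frames than output labels and the transducer loss would be undefined. A plausible version of the filter; the subsampling formula below reproduces the logged 100 -> 23 but is an assumption, as are the helper names:

    def frames_after_subsampling(num_frames: int) -> int:
        # One common conv front-end length formula; reproduces the logged 100 -> 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least one encoder frame per output token.
        return frames_after_subsampling(num_frames) >= num_tokens

    keep_cut(100, 24)   # -> False: 23 frames < 24 tokens, so the cut is excluded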
2023-11-18 11:51:35,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=213766.66666666666, ans=0.2 2023-11-18 11:52:03,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=213900.0, ans=0.125 2023-11-18 11:52:03,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=213900.0, ans=0.1 2023-11-18 11:52:05,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=213966.66666666666, ans=0.1 2023-11-18 11:52:05,881 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8050, loss[loss=0.1367, simple_loss=0.1575, pruned_loss=0.04715, audio_tagging_loss=0.01075, over 16078.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1265, pruned_loss=0.04056, audio_tagging_loss=0.0123, over 3043407.93 frames. ], batch size: 56, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:52:09,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=213966.66666666666, ans=0.2 2023-11-18 11:52:23,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=214033.33333333334, ans=0.125 2023-11-18 11:52:33,824 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.767e+01 9.594e+01 1.075e+02 1.227e+02 1.823e+02, threshold=2.150e+02, percent-clipped=0.0 2023-11-18 11:52:48,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=214166.66666666666, ans=0.125 2023-11-18 11:52:52,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=214233.33333333334, ans=0.125 2023-11-18 11:53:00,860 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8100, loss[loss=0.1029, simple_loss=0.1178, pruned_loss=0.03216, audio_tagging_loss=0.01182, over 15823.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1255, pruned_loss=0.0402, audio_tagging_loss=0.01227, over 3047309.40 frames. ], batch size: 60, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:53:06,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=214300.0, ans=0.2 2023-11-18 11:53:13,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214366.66666666666, ans=0.1 2023-11-18 11:53:19,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=214366.66666666666, ans=15.0 2023-11-18 11:53:47,356 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2023-11-18 11:53:49,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=214566.66666666666, ans=0.0 2023-11-18 11:53:49,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=15.0 2023-11-18 11:53:56,997 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8150, loss[loss=0.1106, simple_loss=0.1288, pruned_loss=0.03516, audio_tagging_loss=0.01104, over 15306.00 frames.
], tot_loss[loss=0.1166, simple_loss=0.1272, pruned_loss=0.041, audio_tagging_loss=0.01203, over 3044740.71 frames. ], batch size: 57, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:54:15,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=214700.0, ans=0.0 2023-11-18 11:54:18,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=214766.66666666666, ans=0.125 2023-11-18 11:54:24,398 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 9.659e+01 1.081e+02 1.221e+02 1.815e+02, threshold=2.163e+02, percent-clipped=0.0 2023-11-18 11:54:26,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214766.66666666666, ans=0.1 2023-11-18 11:54:53,070 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8200, loss[loss=0.08559, simple_loss=0.1006, pruned_loss=0.02506, audio_tagging_loss=0.01025, over 16405.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.1277, pruned_loss=0.04117, audio_tagging_loss=0.01186, over 3048514.46 frames. ], batch size: 61, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:54:53,093 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:54:57,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214966.66666666666, ans=0.1 2023-11-18 11:55:00,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=214966.66666666666, ans=0.125 2023-11-18 11:55:18,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=215100.0, ans=0.125 2023-11-18 11:55:47,999 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8250, loss[loss=0.1478, simple_loss=0.1665, pruned_loss=0.05587, audio_tagging_loss=0.008622, over 14538.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1269, pruned_loss=0.04071, audio_tagging_loss=0.01187, over 3047643.69 frames. ], batch size: 54, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 11:55:56,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.79 vs. limit=22.5 2023-11-18 11:56:05,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=215366.66666666666, ans=10.0 2023-11-18 11:56:12,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=215433.33333333334, ans=0.0 2023-11-18 11:56:16,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 9.340e+01 1.070e+02 1.193e+02 1.705e+02, threshold=2.140e+02, percent-clipped=0.0 2023-11-18 11:56:20,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.10 vs. 
limit=15.0 2023-11-18 11:56:22,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215500.0, ans=0.1 2023-11-18 11:56:25,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=215500.0, ans=0.0 2023-11-18 11:56:27,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=215500.0, ans=0.0 2023-11-18 11:56:28,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=12.0 2023-11-18 11:56:43,853 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8300, loss[loss=0.1113, simple_loss=0.1312, pruned_loss=0.03611, audio_tagging_loss=0.009561, over 15778.00 frames. ], tot_loss[loss=0.1155, simple_loss=0.1266, pruned_loss=0.04037, audio_tagging_loss=0.01185, over 3045504.26 frames. ], batch size: 59, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 11:56:52,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=215633.33333333334, ans=0.125 2023-11-18 11:57:09,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=215766.66666666666, ans=0.0 2023-11-18 11:57:36,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=215900.0, ans=0.125 2023-11-18 11:57:36,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2023-11-18 11:57:39,799 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8350, loss[loss=0.09269, simple_loss=0.0974, pruned_loss=0.02963, audio_tagging_loss=0.01436, over 14712.00 frames. ], tot_loss[loss=0.115, simple_loss=0.1257, pruned_loss=0.04019, audio_tagging_loss=0.01194, over 3040757.15 frames. ], batch size: 55, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 11:57:40,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2023-11-18 11:57:46,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215966.66666666666, ans=0.1 2023-11-18 11:58:07,653 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.952e+01 1.113e+02 1.251e+02 3.254e+02, threshold=2.227e+02, percent-clipped=1.0 2023-11-18 11:58:13,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=216166.66666666666, ans=0.0 2023-11-18 11:58:35,200 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8400, loss[loss=0.1071, simple_loss=0.115, pruned_loss=0.03899, audio_tagging_loss=0.01058, over 15402.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1251, pruned_loss=0.03983, audio_tagging_loss=0.01197, over 3039018.73 frames. 
], batch size: 57, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 11:58:40,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=216300.0, ans=0.05 2023-11-18 11:58:48,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=216366.66666666666, ans=0.125 2023-11-18 11:58:55,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=216366.66666666666, ans=0.125 2023-11-18 11:59:11,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=216500.0, ans=0.2 2023-11-18 11:59:17,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=216500.0, ans=0.125 2023-11-18 11:59:24,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0 2023-11-18 11:59:30,809 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8450, loss[loss=0.1235, simple_loss=0.1403, pruned_loss=0.04231, audio_tagging_loss=0.01108, over 16115.00 frames. ], tot_loss[loss=0.1142, simple_loss=0.1247, pruned_loss=0.03982, audio_tagging_loss=0.01206, over 3036849.78 frames. ], batch size: 57, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 11:59:35,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=216633.33333333334, ans=0.125 2023-11-18 11:59:37,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.38 vs. limit=22.5 2023-11-18 11:59:45,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=216700.0, ans=0.1 2023-11-18 11:59:45,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.33 vs. limit=15.0 2023-11-18 11:59:58,325 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 9.450e+01 1.064e+02 1.181e+02 2.171e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-18 12:00:06,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=216833.33333333334, ans=0.125 2023-11-18 12:00:09,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=216833.33333333334, ans=0.1 2023-11-18 12:00:22,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2023-11-18 12:00:26,225 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8500, loss[loss=0.127, simple_loss=0.1457, pruned_loss=0.04413, audio_tagging_loss=0.009983, over 14633.00 frames. ], tot_loss[loss=0.1148, simple_loss=0.1254, pruned_loss=0.04007, audio_tagging_loss=0.012, over 3041843.25 frames. 
], batch size: 54, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 12:00:34,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=216966.66666666666, ans=0.0 2023-11-18 12:00:47,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2023-11-18 12:00:49,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=217100.0, ans=0.1 2023-11-18 12:01:01,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=217166.66666666666, ans=0.0 2023-11-18 12:01:06,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=217166.66666666666, ans=0.035 2023-11-18 12:01:08,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217166.66666666666, ans=0.1 2023-11-18 12:01:17,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=217233.33333333334, ans=0.09899494936611666 2023-11-18 12:01:19,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=217233.33333333334, ans=0.125 2023-11-18 12:01:21,544 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8550, loss[loss=0.1227, simple_loss=0.1359, pruned_loss=0.04243, audio_tagging_loss=0.01232, over 14283.00 frames. ], tot_loss[loss=0.114, simple_loss=0.1244, pruned_loss=0.03976, audio_tagging_loss=0.01204, over 3040497.11 frames. ], batch size: 54, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 12:01:31,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0 2023-11-18 12:01:43,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=217433.33333333334, ans=0.125 2023-11-18 12:01:48,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=217433.33333333334, ans=0.125 2023-11-18 12:01:49,702 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 1.000e+02 1.079e+02 1.274e+02 1.597e+02, threshold=2.158e+02, percent-clipped=0.0 2023-11-18 12:01:59,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.74 vs. limit=22.5 2023-11-18 12:02:13,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.93 vs. limit=15.0 2023-11-18 12:02:17,142 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8600, loss[loss=0.1021, simple_loss=0.1133, pruned_loss=0.03253, audio_tagging_loss=0.01297, over 16039.00 frames. ], tot_loss[loss=0.1147, simple_loss=0.1252, pruned_loss=0.03999, audio_tagging_loss=0.01205, over 3042340.69 frames. 
], batch size: 60, lr: 1.98e-02, grad_scale: 32.0 2023-11-18 12:02:17,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=217633.33333333334, ans=0.025 2023-11-18 12:02:40,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=217766.66666666666, ans=0.5 2023-11-18 12:02:43,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=217766.66666666666, ans=0.125 2023-11-18 12:02:46,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=217766.66666666666, ans=0.05 2023-11-18 12:02:46,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=217766.66666666666, ans=0.1 2023-11-18 12:02:50,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.97 vs. limit=10.0 2023-11-18 12:03:08,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.18 vs. limit=22.5 2023-11-18 12:03:13,311 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8650, loss[loss=0.1156, simple_loss=0.1354, pruned_loss=0.03822, audio_tagging_loss=0.009688, over 14795.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1262, pruned_loss=0.04029, audio_tagging_loss=0.01217, over 3041042.67 frames. ], batch size: 56, lr: 1.98e-02, grad_scale: 32.0 2023-11-18 12:03:15,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2023-11-18 12:03:28,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=218033.33333333334, ans=0.125 2023-11-18 12:03:32,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2023-11-18 12:03:40,755 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 9.473e+01 1.061e+02 1.180e+02 2.111e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-18 12:03:53,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=218166.66666666666, ans=0.125 2023-11-18 12:04:08,818 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8700, loss[loss=0.1394, simple_loss=0.143, pruned_loss=0.05262, audio_tagging_loss=0.01527, over 15720.00 frames. ], tot_loss[loss=0.1155, simple_loss=0.1262, pruned_loss=0.04019, audio_tagging_loss=0.01227, over 3043580.46 frames. 
], batch size: 62, lr: 1.98e-02, grad_scale: 32.0 2023-11-18 12:04:14,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218300.0, ans=0.1 2023-11-18 12:04:15,318 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:04:17,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=218300.0, ans=0.125 2023-11-18 12:04:44,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=218500.0, ans=0.07 2023-11-18 12:04:46,931 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2023-11-18 12:04:54,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=218566.66666666666, ans=0.125 2023-11-18 12:05:04,360 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8750, loss[loss=0.1103, simple_loss=0.1236, pruned_loss=0.03533, audio_tagging_loss=0.01313, over 15375.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1262, pruned_loss=0.04048, audio_tagging_loss=0.01235, over 3050126.95 frames. ], batch size: 56, lr: 1.98e-02, grad_scale: 32.0 2023-11-18 12:05:27,276 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=15.0 2023-11-18 12:05:28,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=218766.66666666666, ans=0.125 2023-11-18 12:05:31,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=218766.66666666666, ans=0.125 2023-11-18 12:05:32,026 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 9.349e+01 1.069e+02 1.188e+02 1.662e+02, threshold=2.138e+02, percent-clipped=0.0 2023-11-18 12:05:32,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=218766.66666666666, ans=0.0 2023-11-18 12:05:56,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=218900.0, ans=0.125 2023-11-18 12:06:00,708 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8800, loss[loss=0.147, simple_loss=0.1607, pruned_loss=0.05522, audio_tagging_loss=0.01145, over 15554.00 frames. ], tot_loss[loss=0.1168, simple_loss=0.1274, pruned_loss=0.04081, audio_tagging_loss=0.01233, over 3046081.79 frames. ], batch size: 57, lr: 1.98e-02, grad_scale: 64.0 2023-11-18 12:06:05,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=218966.66666666666, ans=0.125 2023-11-18 12:06:20,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=219033.33333333334, ans=0.1 2023-11-18 12:06:21,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=15.0 2023-11-18 12:06:55,584 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8850, loss[loss=0.09791, simple_loss=0.1034, pruned_loss=0.03212, audio_tagging_loss=0.01407, over 15008.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1263, pruned_loss=0.04053, audio_tagging_loss=0.01241, over 3044518.41 frames. ], batch size: 57, lr: 1.98e-02, grad_scale: 64.0
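Note that grad_scale in the training lines doubles from 32.0 to 64.0 at batch 8800 above. With fp16 training enabled, this is the usual dynamic loss-scaling behaviour: the scaler multiplies the loss by a scale factor, backs the scale off when an overflow is detected, and periodically grows it after a run of overflow-free steps. A sketch using PyTorch's stock GradScaler; the init_scale and growth_interval values are illustrative, since the recipe's actual scaler settings are not shown in this log:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)

    # Per training batch (compute_loss, model, optimizer, batch are placeholders):
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()               # doubles the scale (32 -> 64) after enough clean steps
    #   print(scaler.get_scale())     # the value logged as grad_scale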
2023-11-18 12:07:05,663 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:07:14,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.36 vs. limit=10.0 2023-11-18 12:07:24,030 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 9.378e+01 1.055e+02 1.190e+02 1.653e+02, threshold=2.110e+02, percent-clipped=0.0 2023-11-18 12:07:25,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=219433.33333333334, ans=0.0 2023-11-18 12:07:32,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=219500.0, ans=0.1 2023-11-18 12:07:41,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=219566.66666666666, ans=0.125 2023-11-18 12:07:44,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=219566.66666666666, ans=0.125 2023-11-18 12:07:50,924 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8900, loss[loss=0.07402, simple_loss=0.0763, pruned_loss=0.02218, audio_tagging_loss=0.01369, over 14061.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1263, pruned_loss=0.0402, audio_tagging_loss=0.01222, over 3045184.31 frames. ], batch size: 56, lr: 1.98e-02, grad_scale: 64.0 2023-11-18 12:08:03,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=219700.0, ans=0.0 2023-11-18 12:08:06,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=219700.0, ans=0.0 2023-11-18 12:08:39,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=219900.0, ans=0.07 2023-11-18 12:08:47,584 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 8950, loss[loss=0.1468, simple_loss=0.1657, pruned_loss=0.05126, audio_tagging_loss=0.0127, over 14645.00 frames. ], tot_loss[loss=0.1155, simple_loss=0.1269, pruned_loss=0.04006, audio_tagging_loss=0.01197, over 3048013.76 frames.
], batch size: 55, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:08:48,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=219966.66666666666, ans=0.0 2023-11-18 12:08:54,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=219966.66666666666, ans=0.125 2023-11-18 12:09:04,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=220033.33333333334, ans=0.02 2023-11-18 12:09:06,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=220033.33333333334, ans=0.125 2023-11-18 12:09:07,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=220100.0, ans=0.125 2023-11-18 12:09:09,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=220100.0, ans=0.125 2023-11-18 12:09:09,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=220100.0, ans=10.0 2023-11-18 12:09:09,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220100.0, ans=0.1 2023-11-18 12:09:14,044 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.540e+01 9.669e+01 1.057e+02 1.154e+02 1.635e+02, threshold=2.114e+02, percent-clipped=0.0 2023-11-18 12:09:17,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0 2023-11-18 12:09:41,983 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9000, loss[loss=0.112, simple_loss=0.1242, pruned_loss=0.03581, audio_tagging_loss=0.01408, over 14628.00 frames. ], tot_loss[loss=0.1146, simple_loss=0.1257, pruned_loss=0.03969, audio_tagging_loss=0.01207, over 3044609.97 frames. ], batch size: 58, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:09:41,984 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 12:10:14,809 INFO [train_asr.py:1147] (1/4) Epoch 3, validation: loss=0.07901, simple_loss=0.06429, pruned_loss=0.01152, audio_tagging_loss=0.03534, over 4681554.00 frames. 
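The loss figures in these entries are internally consistent with a weighted multi-task sum: in every train and validation line, loss is approximately 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (for the validation entry just above, 0.5 * 0.06429 + 0.01152 + 0.03534 = 0.07901). A minimal sketch of that combination; the scale names follow the pruned-transducer convention, but the exact code in train_asr.py may differ:

    def total_loss(simple_loss: float, pruned_loss: float, audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
        # Reproduces the logged numbers, e.g. the validation line above:
        # 0.5 * 0.06429 + 0.01152 + 1.0 * 0.03534 = 0.07901
        return (simple_loss_scale * simple_loss + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)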
2023-11-18 12:10:14,809 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 12:10:16,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=220300.0, ans=0.2 2023-11-18 12:10:17,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=220300.0, ans=0.0 2023-11-18 12:10:20,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=220300.0, ans=0.0 2023-11-18 12:10:22,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=220300.0, ans=0.125 2023-11-18 12:10:48,068 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:10:48,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=220500.0, ans=0.0 2023-11-18 12:10:48,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=220500.0, ans=0.2 2023-11-18 12:11:06,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=220566.66666666666, ans=0.125 2023-11-18 12:11:08,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=220633.33333333334, ans=0.025 2023-11-18 12:11:09,374 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9050, loss[loss=0.1086, simple_loss=0.1206, pruned_loss=0.0397, audio_tagging_loss=0.008536, over 14933.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1273, pruned_loss=0.04023, audio_tagging_loss=0.01194, over 3044463.07 frames. ], batch size: 55, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:11:13,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=220633.33333333334, ans=0.125 2023-11-18 12:11:14,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.65 vs. limit=22.5 2023-11-18 12:11:15,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=220633.33333333334, ans=0.0 2023-11-18 12:11:34,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=220766.66666666666, ans=0.125 2023-11-18 12:11:36,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 9.522e+01 1.061e+02 1.198e+02 2.427e+02, threshold=2.123e+02, percent-clipped=1.0 2023-11-18 12:12:03,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2023-11-18 12:12:04,410 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9100, loss[loss=0.09839, simple_loss=0.09922, pruned_loss=0.03488, audio_tagging_loss=0.0139, over 14879.00 frames. ], tot_loss[loss=0.1147, simple_loss=0.1256, pruned_loss=0.03986, audio_tagging_loss=0.01205, over 3044226.62 frames. ], batch size: 56, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:12:08,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.42 vs. 
limit=12.0
2023-11-18 12:12:19,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=221033.33333333334, ans=0.1
2023-11-18 12:12:30,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=221100.0, ans=0.125
2023-11-18 12:12:38,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=221166.66666666666, ans=0.125
2023-11-18 12:12:47,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=221166.66666666666, ans=0.1
2023-11-18 12:12:57,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=221233.33333333334, ans=0.0
2023-11-18 12:12:57,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0
2023-11-18 12:13:00,117 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9150, loss[loss=0.1057, simple_loss=0.1028, pruned_loss=0.03766, audio_tagging_loss=0.0166, over 14789.00 frames. ], tot_loss[loss=0.1146, simple_loss=0.1252, pruned_loss=0.03986, audio_tagging_loss=0.01217, over 3037049.26 frames. ], batch size: 57, lr: 1.97e-02, grad_scale: 64.0
2023-11-18 12:13:05,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=221300.0, ans=0.125
2023-11-18 12:13:09,375 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=12.0
2023-11-18 12:13:14,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0
2023-11-18 12:13:15,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0
2023-11-18 12:13:19,076 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 12:13:26,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=221433.33333333334, ans=0.125
2023-11-18 12:13:28,252 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.044e+01 9.509e+01 1.025e+02 1.123e+02 1.698e+02, threshold=2.050e+02, percent-clipped=0.0
2023-11-18 12:13:29,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=12.0
2023-11-18 12:13:57,070 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9200, loss[loss=0.1638, simple_loss=0.1902, pruned_loss=0.05966, audio_tagging_loss=0.009016, over 15692.00 frames. ], tot_loss[loss=0.1146, simple_loss=0.1255, pruned_loss=0.0399, audio_tagging_loss=0.01202, over 3039057.59 frames. ], batch size: 58, lr: 1.97e-02, grad_scale: 64.0
2023-11-18 12:14:01,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=221633.33333333334, ans=0.125
2023-11-18 12:14:03,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=221633.33333333334, ans=0.125
2023-11-18 12:14:08,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=221700.0, ans=0.05
2023-11-18 12:14:18,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=221766.66666666666, ans=0.125
2023-11-18 12:14:19,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=221766.66666666666, ans=0.1
2023-11-18 12:14:29,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=221833.33333333334, ans=0.05
2023-11-18 12:14:32,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=221833.33333333334, ans=0.5
2023-11-18 12:14:37,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=221833.33333333334, ans=0.1
2023-11-18 12:14:51,680 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9250, loss[loss=0.1096, simple_loss=0.1196, pruned_loss=0.03931, audio_tagging_loss=0.01052, over 15432.00 frames. ], tot_loss[loss=0.1149, simple_loss=0.1258, pruned_loss=0.04011, audio_tagging_loss=0.01191, over 3042800.34 frames. ], batch size: 57, lr: 1.97e-02, grad_scale: 64.0
2023-11-18 12:15:15,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=222100.0, ans=0.0
2023-11-18 12:15:20,392 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.922e+01 9.718e+01 1.095e+02 1.245e+02 2.428e+02, threshold=2.190e+02, percent-clipped=1.0
2023-11-18 12:15:22,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=222100.0, ans=0.125
2023-11-18 12:15:23,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=222100.0, ans=0.125
2023-11-18 12:15:29,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=222166.66666666666, ans=0.0
2023-11-18 12:15:38,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222233.33333333334, ans=0.1
2023-11-18 12:15:39,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.49 vs. limit=6.0
2023-11-18 12:15:46,834 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9300, loss[loss=0.1311, simple_loss=0.1499, pruned_loss=0.04412, audio_tagging_loss=0.01197, over 15657.00 frames. ], tot_loss[loss=0.1147, simple_loss=0.1255, pruned_loss=0.03993, audio_tagging_loss=0.01201, over 3040627.17 frames. ], batch size: 56, lr: 1.96e-02, grad_scale: 64.0
2023-11-18 12:16:05,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=222366.66666666666, ans=0.125
2023-11-18 12:16:15,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=222433.33333333334, ans=0.125
2023-11-18 12:16:15,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=222433.33333333334, ans=0.2
2023-11-18 12:16:17,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=222433.33333333334, ans=0.2
2023-11-18 12:16:26,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=222500.0, ans=0.1
2023-11-18 12:16:29,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=222500.0, ans=0.125
2023-11-18 12:16:30,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=222566.66666666666, ans=0.125
2023-11-18 12:16:43,447 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9350, loss[loss=0.1208, simple_loss=0.1272, pruned_loss=0.04242, audio_tagging_loss=0.01477, over 16342.00 frames. ], tot_loss[loss=0.1145, simple_loss=0.1253, pruned_loss=0.03985, audio_tagging_loss=0.01201, over 3047057.82 frames. ], batch size: 60, lr: 1.96e-02, grad_scale: 64.0
2023-11-18 12:16:55,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222700.0, ans=0.1
2023-11-18 12:17:08,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0
2023-11-18 12:17:10,516 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.296e+01 9.866e+01 1.128e+02 1.276e+02 1.788e+02, threshold=2.257e+02, percent-clipped=0.0
2023-11-18 12:17:18,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222833.33333333334, ans=0.1
2023-11-18 12:17:38,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=222966.66666666666, ans=0.0
2023-11-18 12:17:39,425 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9400, loss[loss=0.1467, simple_loss=0.1647, pruned_loss=0.05253, audio_tagging_loss=0.01183, over 15036.00 frames. ], tot_loss[loss=0.1145, simple_loss=0.1251, pruned_loss=0.03982, audio_tagging_loss=0.01219, over 3050529.04 frames. ], batch size: 56, lr: 1.96e-02, grad_scale: 64.0
2023-11-18 12:18:31,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=223233.33333333334, ans=0.0
2023-11-18 12:18:32,261 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
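The per-batch loss above is consistent with a weighted sum of the logged components: taking the batch 9150 running averages, loss ≈ 0.5 · simple_loss + pruned_loss + audio_tagging_loss. The 0.5 weight on the simple loss is inferred from the numbers themselves, not read from the training code; a quick check in Python:

# Assumed combination (weights inferred from the logged values, not from train_asr.py):
simple_loss, pruned_loss, audio_tagging_loss = 0.1252, 0.03986, 0.01217  # batch 9150 tot_loss
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
assert round(loss, 4) == 0.1146  # matches the logged tot_loss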
2023-11-18 12:18:34,369 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9450, loss[loss=0.1102, simple_loss=0.1143, pruned_loss=0.04078, audio_tagging_loss=0.01224, over 16032.00 frames. ], tot_loss[loss=0.1141, simple_loss=0.1245, pruned_loss=0.03946, audio_tagging_loss=0.01232, over 3052468.35 frames. ], batch size: 62, lr: 1.96e-02, grad_scale: 64.0
2023-11-18 12:18:53,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=223366.66666666666, ans=0.0
2023-11-18 12:19:02,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=223433.33333333334, ans=0.125
2023-11-18 12:19:02,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=223433.33333333334, ans=0.125
2023-11-18 12:19:03,041 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.898e+01 1.046e+02 1.175e+02 1.737e+02, threshold=2.092e+02, percent-clipped=0.0
2023-11-18 12:19:08,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=223500.0, ans=0.2
2023-11-18 12:19:31,342 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9500, loss[loss=0.1343, simple_loss=0.1433, pruned_loss=0.05346, audio_tagging_loss=0.009195, over 15718.00 frames. ], tot_loss[loss=0.1136, simple_loss=0.1238, pruned_loss=0.03925, audio_tagging_loss=0.01249, over 3056775.59 frames. ], batch size: 61, lr: 1.96e-02, grad_scale: 64.0
2023-11-18 12:19:31,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=223633.33333333334, ans=0.2
2023-11-18 12:19:41,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=223700.0, ans=0.2
2023-11-18 12:20:05,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0
2023-11-18 12:20:27,317 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9550, loss[loss=0.1449, simple_loss=0.1582, pruned_loss=0.05517, audio_tagging_loss=0.01063, over 15019.00 frames. ], tot_loss[loss=0.1142, simple_loss=0.1246, pruned_loss=0.03936, audio_tagging_loss=0.01252, over 3049091.65 frames. ], batch size: 58, lr: 1.96e-02, grad_scale: 64.0
2023-11-18 12:20:30,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=223966.66666666666, ans=0.07
2023-11-18 12:20:55,688 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 1.006e+02 1.117e+02 1.248e+02 1.898e+02, threshold=2.233e+02, percent-clipped=0.0
2023-11-18 12:21:01,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=224166.66666666666, ans=0.125
2023-11-18 12:21:19,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0
2023-11-18 12:21:22,535 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9600, loss[loss=0.09163, simple_loss=0.1036, pruned_loss=0.02618, audio_tagging_loss=0.01366, over 15862.00 frames. ], tot_loss[loss=0.1148, simple_loss=0.1252, pruned_loss=0.03966, audio_tagging_loss=0.01252, over 3047698.38 frames. ], batch size: 60, lr: 1.96e-02, grad_scale: 64.0
2023-11-18 12:21:25,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=224300.0, ans=0.05
2023-11-18 12:21:25,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=224300.0, ans=0.125
2023-11-18 12:21:32,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=224300.0, ans=0.1
2023-11-18 12:21:33,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=224366.66666666666, ans=0.05
2023-11-18 12:21:40,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.31 vs. limit=22.5
2023-11-18 12:22:18,512 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9650, loss[loss=0.09798, simple_loss=0.106, pruned_loss=0.03077, audio_tagging_loss=0.0142, over 15499.00 frames. ], tot_loss[loss=0.1145, simple_loss=0.1251, pruned_loss=0.03962, audio_tagging_loss=0.01236, over 3048373.49 frames. ], batch size: 59, lr: 1.95e-02, grad_scale: 64.0
2023-11-18 12:22:20,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=224633.33333333334, ans=0.125
2023-11-18 12:22:36,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=224700.0, ans=0.1
2023-11-18 12:22:45,985 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.215e+01 1.048e+02 1.170e+02 1.955e+02, threshold=2.095e+02, percent-clipped=0.0
2023-11-18 12:22:46,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.39 vs. limit=15.0
2023-11-18 12:22:47,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=224766.66666666666, ans=0.125
2023-11-18 12:23:01,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=224833.33333333334, ans=0.0
2023-11-18 12:23:14,156 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9700, loss[loss=0.1102, simple_loss=0.1104, pruned_loss=0.04059, audio_tagging_loss=0.01442, over 13920.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1254, pruned_loss=0.03953, audio_tagging_loss=0.01211, over 3047685.89 frames. ], batch size: 54, lr: 1.95e-02, grad_scale: 64.0
2023-11-18 12:24:09,672 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9750, loss[loss=0.08948, simple_loss=0.09658, pruned_loss=0.02693, audio_tagging_loss=0.01426, over 15095.00 frames. ], tot_loss[loss=0.1141, simple_loss=0.1252, pruned_loss=0.03946, audio_tagging_loss=0.01199, over 3050690.13 frames. ], batch size: 57, lr: 1.95e-02, grad_scale: 64.0
2023-11-18 12:24:15,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=225300.0, ans=0.07
2023-11-18 12:24:19,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0
2023-11-18 12:24:25,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=225366.66666666666, ans=0.125
2023-11-18 12:24:29,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=225366.66666666666, ans=0.0
2023-11-18 12:24:38,100 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 9.876e+01 1.096e+02 1.307e+02 1.863e+02, threshold=2.192e+02, percent-clipped=0.0
2023-11-18 12:24:44,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=225500.0, ans=0.0
2023-11-18 12:25:02,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=225566.66666666666, ans=10.0
2023-11-18 12:25:02,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=225566.66666666666, ans=0.125
2023-11-18 12:25:06,221 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9800, loss[loss=0.1267, simple_loss=0.1545, pruned_loss=0.03902, audio_tagging_loss=0.01045, over 15622.00 frames. ], tot_loss[loss=0.1135, simple_loss=0.1248, pruned_loss=0.03923, audio_tagging_loss=0.01191, over 3051272.88 frames. ], batch size: 57, lr: 1.95e-02, grad_scale: 64.0
2023-11-18 12:25:08,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=225633.33333333334, ans=0.0
2023-11-18 12:25:08,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.06 vs. limit=22.5
2023-11-18 12:25:38,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=12.0
2023-11-18 12:25:55,523 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 12:25:59,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=12.0
2023-11-18 12:26:01,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=225966.66666666666, ans=0.1
2023-11-18 12:26:01,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.63 vs. limit=22.5
2023-11-18 12:26:01,871 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9850, loss[loss=0.1228, simple_loss=0.137, pruned_loss=0.04475, audio_tagging_loss=0.009517, over 15029.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.126, pruned_loss=0.03958, audio_tagging_loss=0.01179, over 3055442.86 frames. ], batch size: 57, lr: 1.95e-02, grad_scale: 64.0
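The WARNING entries drop 1-second AudioSet cuts whose BPE token count (24) exceeds the number of feature frames left after 4x subsampling (23), a combination the transducer loss cannot align. A minimal sketch of such a filter; the helper name and the subsampling formula are assumptions based on the logged 100 -> 23 frame counts, not the actual train_asr.py code:

import sentencepiece as spm

def keep_cut(cut, sp: spm.SentencePieceProcessor) -> bool:
    # Frames surviving the convolutional front end (assumed formula: 100 -> 23).
    T = ((cut.num_frames - 7) // 2 + 1) // 2
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    # A transducer alignment needs at least one frame per emitted token.
    return T >= len(tokens)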
2023-11-18 12:26:10,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=225966.66666666666, ans=0.125
2023-11-18 12:26:14,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=226033.33333333334, ans=0.125
2023-11-18 12:26:15,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=226033.33333333334, ans=0.0
2023-11-18 12:26:16,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=226033.33333333334, ans=0.125
2023-11-18 12:26:16,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=226033.33333333334, ans=0.2
2023-11-18 12:26:19,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=226033.33333333334, ans=0.125
2023-11-18 12:26:20,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=226033.33333333334, ans=0.2
2023-11-18 12:26:21,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=226033.33333333334, ans=0.05
2023-11-18 12:26:23,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.74 vs. limit=10.0
2023-11-18 12:26:30,006 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 9.358e+01 1.054e+02 1.143e+02 1.553e+02, threshold=2.108e+02, percent-clipped=0.0
2023-11-18 12:26:48,968 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.77 vs. limit=10.0
2023-11-18 12:26:55,438 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0
2023-11-18 12:26:57,535 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9900, loss[loss=0.1296, simple_loss=0.1302, pruned_loss=0.05013, audio_tagging_loss=0.01435, over 16101.00 frames. ], tot_loss[loss=0.1141, simple_loss=0.1256, pruned_loss=0.03951, audio_tagging_loss=0.01179, over 3057572.67 frames. ], batch size: 61, lr: 1.95e-02, grad_scale: 64.0
2023-11-18 12:27:08,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=226366.66666666666, ans=0.125
2023-11-18 12:27:11,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=226366.66666666666, ans=0.125
2023-11-18 12:27:31,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0
2023-11-18 12:27:53,591 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 9950, loss[loss=0.108, simple_loss=0.1231, pruned_loss=0.03367, audio_tagging_loss=0.01277, over 15037.00 frames. ], tot_loss[loss=0.1135, simple_loss=0.125, pruned_loss=0.03926, audio_tagging_loss=0.0118, over 3050939.17 frames. ], batch size: 56, lr: 1.95e-02, grad_scale: 64.0
2023-11-18 12:28:01,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=226633.33333333334, ans=0.0
2023-11-18 12:28:14,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=226766.66666666666, ans=0.125
2023-11-18 12:28:20,833 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.751e+01 9.736e+01 1.077e+02 1.172e+02 1.969e+02, threshold=2.153e+02, percent-clipped=0.0
2023-11-18 12:28:31,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=226833.33333333334, ans=0.125
2023-11-18 12:28:38,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=226900.0, ans=0.125
2023-11-18 12:28:46,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=226900.0, ans=0.125
2023-11-18 12:28:49,507 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10000, loss[loss=0.08709, simple_loss=0.0948, pruned_loss=0.02394, audio_tagging_loss=0.01575, over 16302.00 frames. ], tot_loss[loss=0.1136, simple_loss=0.125, pruned_loss=0.03936, audio_tagging_loss=0.01175, over 3047797.06 frames. ], batch size: 63, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:28:51,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=226966.66666666666, ans=0.125
2023-11-18 12:29:09,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=227033.33333333334, ans=0.125
2023-11-18 12:29:44,571 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10050, loss[loss=0.1287, simple_loss=0.1458, pruned_loss=0.04294, audio_tagging_loss=0.0129, over 15180.00 frames. ], tot_loss[loss=0.114, simple_loss=0.1257, pruned_loss=0.03946, audio_tagging_loss=0.01172, over 3051866.40 frames. ], batch size: 56, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:29:51,088 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0
2023-11-18 12:29:56,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=227366.66666666666, ans=0.125
2023-11-18 12:30:00,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=227366.66666666666, ans=0.125
2023-11-18 12:30:00,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.31 vs. limit=22.5
2023-11-18 12:30:01,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=227366.66666666666, ans=15.0
2023-11-18 12:30:13,277 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.408e+01 1.046e+02 1.122e+02 1.934e+02, threshold=2.091e+02, percent-clipped=0.0
2023-11-18 12:30:14,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=227433.33333333334, ans=0.125
2023-11-18 12:30:15,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=227433.33333333334, ans=0.125
2023-11-18 12:30:15,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0
2023-11-18 12:30:41,428 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10100, loss[loss=0.1501, simple_loss=0.1706, pruned_loss=0.05569, audio_tagging_loss=0.00913, over 15557.00 frames. ], tot_loss[loss=0.1139, simple_loss=0.1253, pruned_loss=0.03937, audio_tagging_loss=0.01187, over 3050924.31 frames. ], batch size: 55, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:30:56,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.44 vs. limit=15.0
2023-11-18 12:31:13,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=227833.33333333334, ans=0.0
2023-11-18 12:31:23,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0
2023-11-18 12:31:25,793 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 12:31:35,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=227900.0, ans=10.0
2023-11-18 12:31:36,832 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10150, loss[loss=0.1169, simple_loss=0.1344, pruned_loss=0.03733, audio_tagging_loss=0.0124, over 15697.00 frames. ], tot_loss[loss=0.1147, simple_loss=0.1265, pruned_loss=0.03963, audio_tagging_loss=0.01182, over 3054339.94 frames. ], batch size: 57, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:31:43,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.75 vs. limit=22.5
2023-11-18 12:32:01,844 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
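In the optim.py:476 lines, the five numbers are quantiles (min/25%/median/75%/max) of recently observed gradient norms, and the logged threshold is consistently Clipping_scale times the median (e.g. 2.0 x 1.046e+02 ≈ 2.091e+02 just above). A sketch of that bookkeeping under this reading; it is an inference from the logged values, not the actual optim.py implementation:

import torch

def clipping_stats(grad_norms: list, clipping_scale: float = 2.0):
    norms = torch.tensor(grad_norms)
    # min / 25% / median / 75% / max, as printed in the log
    quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]           # scale * median
    percent_clipped = 100.0 * (norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped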
2023-11-18 12:32:04,447 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 9.918e+01 1.075e+02 1.229e+02 2.012e+02, threshold=2.150e+02, percent-clipped=0.0
2023-11-18 12:32:32,043 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10200, loss[loss=0.1564, simple_loss=0.1705, pruned_loss=0.06056, audio_tagging_loss=0.01061, over 16162.00 frames. ], tot_loss[loss=0.1151, simple_loss=0.1266, pruned_loss=0.03976, audio_tagging_loss=0.01198, over 3058847.35 frames. ], batch size: 57, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:32:40,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=228300.0, ans=0.95
2023-11-18 12:32:46,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0
2023-11-18 12:32:52,223 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 12:33:02,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=228433.33333333334, ans=0.125
2023-11-18 12:33:04,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=228500.0, ans=0.125
2023-11-18 12:33:16,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.18 vs. limit=15.0
2023-11-18 12:33:26,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. limit=6.0
2023-11-18 12:33:27,633 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10250, loss[loss=0.1689, simple_loss=0.1844, pruned_loss=0.06384, audio_tagging_loss=0.01284, over 15209.00 frames. ], tot_loss[loss=0.1143, simple_loss=0.1256, pruned_loss=0.03935, audio_tagging_loss=0.01209, over 3051925.46 frames. ], batch size: 56, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:33:29,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0
2023-11-18 12:33:30,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=228633.33333333334, ans=0.125
2023-11-18 12:33:55,205 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 9.506e+01 1.026e+02 1.188e+02 1.534e+02, threshold=2.052e+02, percent-clipped=0.0
2023-11-18 12:34:23,494 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10300, loss[loss=0.1203, simple_loss=0.1317, pruned_loss=0.04323, audio_tagging_loss=0.01121, over 14448.00 frames. ], tot_loss[loss=0.1137, simple_loss=0.1248, pruned_loss=0.03898, audio_tagging_loss=0.01228, over 3047193.32 frames. ], batch size: 55, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:34:32,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=228966.66666666666, ans=0.0
2023-11-18 12:34:33,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=229033.33333333334, ans=0.1
2023-11-18 12:34:42,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=229033.33333333334, ans=0.0
2023-11-18 12:34:46,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=229100.0, ans=0.07
2023-11-18 12:34:58,482 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.73 vs. limit=12.0
2023-11-18 12:35:09,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=229233.33333333334, ans=0.0
2023-11-18 12:35:11,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=229233.33333333334, ans=0.125
2023-11-18 12:35:18,585 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10350, loss[loss=0.1147, simple_loss=0.1237, pruned_loss=0.04003, audio_tagging_loss=0.01283, over 15045.00 frames. ], tot_loss[loss=0.1136, simple_loss=0.1249, pruned_loss=0.0388, audio_tagging_loss=0.01231, over 3052564.60 frames. ], batch size: 58, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:35:24,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=229300.0, ans=0.0
2023-11-18 12:35:47,377 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.929e+01 9.749e+01 1.144e+02 1.287e+02 1.806e+02, threshold=2.288e+02, percent-clipped=0.0
2023-11-18 12:36:02,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=229566.66666666666, ans=0.125
2023-11-18 12:36:14,267 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10400, loss[loss=0.1104, simple_loss=0.13, pruned_loss=0.036, audio_tagging_loss=0.009437, over 15652.00 frames. ], tot_loss[loss=0.1135, simple_loss=0.1244, pruned_loss=0.03892, audio_tagging_loss=0.01236, over 3051534.70 frames. ], batch size: 57, lr: 1.93e-02, grad_scale: 64.0
2023-11-18 12:36:14,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=229633.33333333334, ans=0.125
2023-11-18 12:36:24,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=229633.33333333334, ans=0.125
2023-11-18 12:36:24,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=229700.0, ans=0.0
2023-11-18 12:36:33,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=229700.0, ans=0.125
2023-11-18 12:37:03,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.56 vs. limit=15.0
2023-11-18 12:37:10,521 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10450, loss[loss=0.1022, simple_loss=0.1039, pruned_loss=0.03582, audio_tagging_loss=0.01442, over 13891.00 frames. ], tot_loss[loss=0.115, simple_loss=0.1263, pruned_loss=0.03968, audio_tagging_loss=0.01218, over 3049733.43 frames. ], batch size: 52, lr: 1.93e-02, grad_scale: 64.0
2023-11-18 12:37:19,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=229966.66666666666, ans=10.0
2023-11-18 12:37:31,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=230100.0, ans=0.125
2023-11-18 12:37:33,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.11 vs. limit=22.5
2023-11-18 12:37:33,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=230100.0, ans=0.125
2023-11-18 12:37:37,310 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 9.254e+01 9.863e+01 1.141e+02 1.786e+02, threshold=1.973e+02, percent-clipped=0.0
2023-11-18 12:37:44,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.04 vs. limit=10.0
2023-11-18 12:37:47,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=230166.66666666666, ans=0.125
2023-11-18 12:38:05,266 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10500, loss[loss=0.1115, simple_loss=0.1283, pruned_loss=0.03645, audio_tagging_loss=0.01095, over 16025.00 frames. ], tot_loss[loss=0.1147, simple_loss=0.126, pruned_loss=0.0397, audio_tagging_loss=0.01199, over 3048843.79 frames. ], batch size: 60, lr: 1.93e-02, grad_scale: 64.0
2023-11-18 12:38:27,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0
2023-11-18 12:38:46,880 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 12:38:59,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=230633.33333333334, ans=0.125
2023-11-18 12:39:00,317 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10550, loss[loss=0.1027, simple_loss=0.12, pruned_loss=0.03196, audio_tagging_loss=0.01075, over 16510.00 frames. ], tot_loss[loss=0.1143, simple_loss=0.1252, pruned_loss=0.03968, audio_tagging_loss=0.01195, over 3053584.34 frames. ], batch size: 62, lr: 1.93e-02, grad_scale: 64.0
2023-11-18 12:39:04,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=230633.33333333334, ans=0.0
2023-11-18 12:39:04,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=230633.33333333334, ans=0.125
2023-11-18 12:39:10,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=230633.33333333334, ans=0.0
2023-11-18 12:39:29,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 9.555e+01 1.067e+02 1.229e+02 1.948e+02, threshold=2.135e+02, percent-clipped=0.0
2023-11-18 12:39:31,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.40 vs. limit=6.0
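The scaling.py:213 entries report ScheduledFloat values: hyperparameters such as the dropout_p, skip-rate, and balancer settings above are functions of batch_count rather than constants. A sketch of the idea with hypothetical breakpoints (the real schedules live in the model code and are not reproduced here):

def scheduled_float(batch_count: float, points: list) -> float:
    """Piecewise-linear interpolation between (batch_count, value) breakpoints."""
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        x0, y0 = x1, y1
    return y0  # constant after the last breakpoint

# e.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches (hypothetical):
print(scheduled_float(221033.33, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1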
2023-11-18 12:39:55,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=230900.0, ans=0.125
2023-11-18 12:39:56,905 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10600, loss[loss=0.1091, simple_loss=0.1173, pruned_loss=0.03908, audio_tagging_loss=0.01132, over 15591.00 frames. ], tot_loss[loss=0.1145, simple_loss=0.1257, pruned_loss=0.03983, audio_tagging_loss=0.01186, over 3053994.43 frames. ], batch size: 58, lr: 1.93e-02, grad_scale: 64.0
2023-11-18 12:40:08,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=231033.33333333334, ans=0.0
2023-11-18 12:40:15,261 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=15.0
2023-11-18 12:40:16,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=231033.33333333334, ans=0.125
2023-11-18 12:40:27,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=231100.0, ans=0.1
2023-11-18 12:40:30,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=231166.66666666666, ans=0.2
2023-11-18 12:40:33,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=231166.66666666666, ans=0.07
2023-11-18 12:40:35,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0
2023-11-18 12:40:52,908 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10650, loss[loss=0.09006, simple_loss=0.1015, pruned_loss=0.0275, audio_tagging_loss=0.01179, over 16080.00 frames. ], tot_loss[loss=0.1136, simple_loss=0.1247, pruned_loss=0.03945, audio_tagging_loss=0.01179, over 3047686.97 frames. ], batch size: 58, lr: 1.93e-02, grad_scale: 64.0
2023-11-18 12:41:01,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=231300.0, ans=0.125
2023-11-18 12:41:04,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=231366.66666666666, ans=0.125
2023-11-18 12:41:05,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=231366.66666666666, ans=0.0
2023-11-18 12:41:20,428 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.385e+01 9.562e+01 1.044e+02 1.196e+02 1.427e+02, threshold=2.089e+02, percent-clipped=0.0
2023-11-18 12:41:29,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=231500.0, ans=0.05
2023-11-18 12:41:39,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=231566.66666666666, ans=0.125
2023-11-18 12:41:48,147 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10700, loss[loss=0.09507, simple_loss=0.0995, pruned_loss=0.03117, audio_tagging_loss=0.01415, over 13848.00 frames. ], tot_loss[loss=0.1128, simple_loss=0.124, pruned_loss=0.03895, audio_tagging_loss=0.01185, over 3049006.66 frames. ], batch size: 53, lr: 1.93e-02, grad_scale: 64.0
2023-11-18 12:41:51,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=231633.33333333334, ans=0.125
2023-11-18 12:41:53,231 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 12:41:57,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0
2023-11-18 12:42:10,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.75 vs. limit=10.0
2023-11-18 12:42:14,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=231766.66666666666, ans=0.125
2023-11-18 12:42:16,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=231766.66666666666, ans=0.125
2023-11-18 12:42:21,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0
2023-11-18 12:42:24,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231833.33333333334, ans=0.1
2023-11-18 12:42:26,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=231833.33333333334, ans=0.0
2023-11-18 12:42:27,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=231833.33333333334, ans=0.0
2023-11-18 12:42:43,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.14 vs. limit=15.0
2023-11-18 12:42:44,310 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10750, loss[loss=0.1127, simple_loss=0.1208, pruned_loss=0.04029, audio_tagging_loss=0.012, over 14847.00 frames. ], tot_loss[loss=0.1129, simple_loss=0.1243, pruned_loss=0.03898, audio_tagging_loss=0.0118, over 3044868.55 frames. ], batch size: 56, lr: 1.92e-02, grad_scale: 64.0
2023-11-18 12:43:09,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=232100.0, ans=0.2
2023-11-18 12:43:10,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=232100.0, ans=0.0
2023-11-18 12:43:12,201 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 9.211e+01 1.033e+02 1.162e+02 1.735e+02, threshold=2.066e+02, percent-clipped=0.0
2023-11-18 12:43:40,217 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10800, loss[loss=0.1253, simple_loss=0.1425, pruned_loss=0.04541, audio_tagging_loss=0.008696, over 15678.00 frames. ], tot_loss[loss=0.1125, simple_loss=0.1238, pruned_loss=0.03874, audio_tagging_loss=0.01181, over 3036316.29 frames. ], batch size: 56, lr: 1.92e-02, grad_scale: 128.0
2023-11-18 12:44:35,723 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10850, loss[loss=0.1187, simple_loss=0.1344, pruned_loss=0.04166, audio_tagging_loss=0.009877, over 15134.00 frames. ], tot_loss[loss=0.1125, simple_loss=0.1238, pruned_loss=0.03869, audio_tagging_loss=0.01188, over 3039814.55 frames. ], batch size: 57, lr: 1.92e-02, grad_scale: 64.0
2023-11-18 12:44:41,675 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=232633.33333333334, ans=0.0
2023-11-18 12:44:43,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.37 vs. limit=22.5
2023-11-18 12:44:56,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=232700.0, ans=0.0
2023-11-18 12:45:04,525 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.924e+01 9.821e+01 1.081e+02 1.220e+02 1.822e+02, threshold=2.162e+02, percent-clipped=0.0
2023-11-18 12:45:27,315 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 12:45:31,530 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10900, loss[loss=0.09233, simple_loss=0.09829, pruned_loss=0.03019, audio_tagging_loss=0.01299, over 15528.00 frames. ], tot_loss[loss=0.1131, simple_loss=0.1244, pruned_loss=0.039, audio_tagging_loss=0.01192, over 3038568.98 frames. ], batch size: 57, lr: 1.92e-02, grad_scale: 64.0
2023-11-18 12:45:31,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=232966.66666666666, ans=0.1
2023-11-18 12:45:35,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=232966.66666666666, ans=0.125
2023-11-18 12:45:40,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=232966.66666666666, ans=0.125
2023-11-18 12:45:42,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=233033.33333333334, ans=0.125
2023-11-18 12:45:47,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=233033.33333333334, ans=0.1
2023-11-18 12:45:48,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=15.0
2023-11-18 12:46:13,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=233166.66666666666, ans=0.1
2023-11-18 12:46:27,306 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 10950, loss[loss=0.1314, simple_loss=0.1573, pruned_loss=0.04466, audio_tagging_loss=0.008048, over 14722.00 frames. ], tot_loss[loss=0.1133, simple_loss=0.1247, pruned_loss=0.03896, audio_tagging_loss=0.01203, over 3041194.61 frames. ], batch size: 56, lr: 1.92e-02, grad_scale: 64.0
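grad_scale in the train_asr.py lines is the fp16 dynamic loss scale: it doubles to 128 around batch 10800, is back at 64 shortly after, and drops to 32 later in the log, the usual grow-on-stable-steps, halve-on-overflow behaviour of dynamic loss scaling. The following shows the dynamics only; it is illustrative and not the actual scaler internals (PyTorch's GradScaler differs in detail):

# Dynamic loss-scale update rule (illustrative):
scale, stable_steps, growth_interval = 64.0, 0, 2000

def after_step(found_inf: bool) -> None:
    global scale, stable_steps
    if found_inf:
        scale /= 2.0        # overflow: halve the scale and skip the optimizer step
        stable_steps = 0
    else:
        stable_steps += 1
        if stable_steps == growth_interval:
            scale *= 2.0    # long stable run: try a larger scale (64 -> 128)
            stable_steps = 0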
2023-11-18 12:46:27,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=233300.0, ans=0.2
2023-11-18 12:46:41,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=233366.66666666666, ans=0.125
2023-11-18 12:46:56,639 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 9.530e+01 1.056e+02 1.170e+02 1.707e+02, threshold=2.112e+02, percent-clipped=0.0
2023-11-18 12:47:23,192 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11000, loss[loss=0.1166, simple_loss=0.1264, pruned_loss=0.03979, audio_tagging_loss=0.01358, over 15592.00 frames. ], tot_loss[loss=0.1135, simple_loss=0.1249, pruned_loss=0.0389, audio_tagging_loss=0.01213, over 3043446.78 frames. ], batch size: 59, lr: 1.92e-02, grad_scale: 64.0
2023-11-18 12:47:32,156 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 12:47:50,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=12.0
2023-11-18 12:48:16,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=233900.0, ans=0.125
2023-11-18 12:48:19,018 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11050, loss[loss=0.1242, simple_loss=0.1344, pruned_loss=0.04336, audio_tagging_loss=0.01364, over 15215.00 frames. ], tot_loss[loss=0.1133, simple_loss=0.1244, pruned_loss=0.03888, audio_tagging_loss=0.01223, over 3047529.29 frames. ], batch size: 55, lr: 1.92e-02, grad_scale: 64.0
2023-11-18 12:48:20,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=233966.66666666666, ans=0.2
2023-11-18 12:48:32,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=234033.33333333334, ans=0.125
2023-11-18 12:48:33,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0
2023-11-18 12:48:47,584 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 9.699e+01 1.056e+02 1.206e+02 1.867e+02, threshold=2.111e+02, percent-clipped=0.0
2023-11-18 12:48:49,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.26 vs. limit=15.0
2023-11-18 12:48:59,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=234166.66666666666, ans=0.125
2023-11-18 12:48:59,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=234166.66666666666, ans=0.125
2023-11-18 12:49:04,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=234233.33333333334, ans=0.125
2023-11-18 12:49:14,674 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11100, loss[loss=0.09338, simple_loss=0.0948, pruned_loss=0.03002, audio_tagging_loss=0.01596, over 15582.00 frames. ], tot_loss[loss=0.1137, simple_loss=0.1245, pruned_loss=0.03901, audio_tagging_loss=0.01242, over 3051056.46 frames. ], batch size: 63, lr: 1.92e-02, grad_scale: 64.0
2023-11-18 12:49:15,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.66 vs. limit=10.0
2023-11-18 12:49:49,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=234500.0, ans=0.125
2023-11-18 12:50:09,601 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11150, loss[loss=0.09415, simple_loss=0.1024, pruned_loss=0.02934, audio_tagging_loss=0.0136, over 14387.00 frames. ], tot_loss[loss=0.1132, simple_loss=0.1234, pruned_loss=0.03892, audio_tagging_loss=0.01251, over 3042393.67 frames. ], batch size: 56, lr: 1.91e-02, grad_scale: 64.0
2023-11-18 12:50:27,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=234700.0, ans=0.2
2023-11-18 12:50:32,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0
2023-11-18 12:50:39,490 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.809e+01 9.879e+01 1.104e+02 1.293e+02 2.710e+02, threshold=2.209e+02, percent-clipped=1.0
2023-11-18 12:50:42,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=234833.33333333334, ans=0.0
2023-11-18 12:50:43,897 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 12:50:51,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.53 vs. limit=22.5
2023-11-18 12:50:55,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=234900.0, ans=0.125
2023-11-18 12:50:55,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.55 vs. limit=10.0
2023-11-18 12:50:58,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=234900.0, ans=0.1
2023-11-18 12:51:06,383 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11200, loss[loss=0.1043, simple_loss=0.1119, pruned_loss=0.03646, audio_tagging_loss=0.01191, over 14062.00 frames. ], tot_loss[loss=0.1121, simple_loss=0.1223, pruned_loss=0.03822, audio_tagging_loss=0.01269, over 3042305.63 frames. ], batch size: 54, lr: 1.91e-02, grad_scale: 64.0
2023-11-18 12:51:16,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=235033.33333333334, ans=0.125
2023-11-18 12:51:16,853 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 12:51:25,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=235033.33333333334, ans=0.2
2023-11-18 12:51:25,343 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0
2023-11-18 12:51:40,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=235166.66666666666, ans=0.07
2023-11-18 12:51:50,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=235233.33333333334, ans=0.125
2023-11-18 12:51:55,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=235233.33333333334, ans=0.125
2023-11-18 12:52:01,480 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11250, loss[loss=0.09902, simple_loss=0.09974, pruned_loss=0.03518, audio_tagging_loss=0.01397, over 14201.00 frames. ], tot_loss[loss=0.1113, simple_loss=0.1215, pruned_loss=0.0379, audio_tagging_loss=0.01263, over 3039269.44 frames. ], batch size: 56, lr: 1.91e-02, grad_scale: 32.0
2023-11-18 12:52:18,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=235366.66666666666, ans=0.125
2023-11-18 12:52:31,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.992e+01 9.651e+01 1.080e+02 1.306e+02 2.369e+02, threshold=2.160e+02, percent-clipped=1.0
2023-11-18 12:52:40,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=235500.0, ans=0.125
2023-11-18 12:52:56,392 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11300, loss[loss=0.0991, simple_loss=0.106, pruned_loss=0.03117, audio_tagging_loss=0.01493, over 14351.00 frames. ], tot_loss[loss=0.1121, simple_loss=0.1229, pruned_loss=0.03825, audio_tagging_loss=0.01241, over 3045379.58 frames. ], batch size: 58, lr: 1.91e-02, grad_scale: 32.0
2023-11-18 12:53:51,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=235966.66666666666, ans=0.125
2023-11-18 12:53:52,131 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11350, loss[loss=0.06108, simple_loss=0.05873, pruned_loss=0.01804, audio_tagging_loss=0.01367, over 14877.00 frames. ], tot_loss[loss=0.1127, simple_loss=0.1236, pruned_loss=0.03869, audio_tagging_loss=0.01219, over 3042202.61 frames. ], batch size: 59, lr: 1.91e-02, grad_scale: 32.0
2023-11-18 12:54:06,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=236033.33333333334, ans=0.1
2023-11-18 12:54:12,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.98 vs. limit=15.0
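The scaling.py:1022 Whitening lines compare a per-module statistic against a limit (metric=13.98 vs. limit=15.0 just above); modules whose activations drift too far from a white, isotropic covariance get pushed back toward it. The exact metric in scaling.py is not reproduced here; the sketch below is one plausible whiteness measure over activations, labeled as an assumption:

import torch

def whiteness_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (N, C) activations. Returns 1.0 for perfectly white features
    (covariance proportional to the identity), larger values otherwise."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2

x = torch.randn(1000, 256) @ torch.randn(256, 256)  # deliberately correlated features
print(whiteness_metric(x))  # >> 1, the kind of value checked against a limit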
limit=15.0 2023-11-18 12:54:21,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=236100.0, ans=0.1 2023-11-18 12:54:21,992 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.976e+01 9.667e+01 1.093e+02 1.238e+02 1.995e+02, threshold=2.185e+02, percent-clipped=0.0 2023-11-18 12:54:48,567 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11400, loss[loss=0.1119, simple_loss=0.1188, pruned_loss=0.04129, audio_tagging_loss=0.01122, over 14677.00 frames. ], tot_loss[loss=0.1141, simple_loss=0.1252, pruned_loss=0.03946, audio_tagging_loss=0.01201, over 3041112.99 frames. ], batch size: 54, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:54:58,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2023-11-18 12:55:31,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=236566.66666666666, ans=0.1 2023-11-18 12:55:37,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=236566.66666666666, ans=0.1 2023-11-18 12:55:43,121 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11450, loss[loss=0.1043, simple_loss=0.1192, pruned_loss=0.03479, audio_tagging_loss=0.00986, over 15104.00 frames. ], tot_loss[loss=0.113, simple_loss=0.124, pruned_loss=0.03911, audio_tagging_loss=0.01189, over 3036228.61 frames. ], batch size: 57, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:56:05,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=236766.66666666666, ans=0.125 2023-11-18 12:56:12,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=236766.66666666666, ans=0.125 2023-11-18 12:56:13,499 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 9.179e+01 9.957e+01 1.104e+02 1.348e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-18 12:56:17,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=236833.33333333334, ans=0.125 2023-11-18 12:56:20,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=236833.33333333334, ans=0.125 2023-11-18 12:56:22,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2023-11-18 12:56:23,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=236833.33333333334, ans=0.125 2023-11-18 12:56:28,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=236900.0, ans=0.2 2023-11-18 12:56:38,365 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11500, loss[loss=0.1226, simple_loss=0.142, pruned_loss=0.04359, audio_tagging_loss=0.008057, over 15393.00 frames. ], tot_loss[loss=0.1129, simple_loss=0.1238, pruned_loss=0.03901, audio_tagging_loss=0.01192, over 3042588.92 frames. 
], batch size: 59, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:56:49,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=237033.33333333334, ans=0.2 2023-11-18 12:56:54,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=15.0 2023-11-18 12:56:59,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=237033.33333333334, ans=0.0 2023-11-18 12:56:59,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2023-11-18 12:57:33,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=237300.0, ans=0.07 2023-11-18 12:57:35,028 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11550, loss[loss=0.09751, simple_loss=0.1003, pruned_loss=0.03193, audio_tagging_loss=0.01542, over 15367.00 frames. ], tot_loss[loss=0.1124, simple_loss=0.1233, pruned_loss=0.0388, audio_tagging_loss=0.01198, over 3037458.62 frames. ], batch size: 61, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 12:57:52,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=237366.66666666666, ans=0.125 2023-11-18 12:57:57,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=237433.33333333334, ans=0.125 2023-11-18 12:58:04,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 9.411e+01 1.015e+02 1.135e+02 1.692e+02, threshold=2.029e+02, percent-clipped=0.0 2023-11-18 12:58:08,070 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:58:17,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=237500.0, ans=0.1 2023-11-18 12:58:22,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=237566.66666666666, ans=0.125 2023-11-18 12:58:30,019 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11600, loss[loss=0.138, simple_loss=0.1458, pruned_loss=0.05261, audio_tagging_loss=0.01248, over 14854.00 frames. ], tot_loss[loss=0.1125, simple_loss=0.1236, pruned_loss=0.03872, audio_tagging_loss=0.012, over 3038609.89 frames. 
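The WARNING just above shows the length filter at work: AudioSet cuts carry a dummy transcript, and this one-second cut (100 feature frames) keeps only 23 encoder frames after subsampling, fewer than its 24 BPE tokens, so the transducer loss cannot be computed for it and the cut is dropped. A sketch of the check, assuming the rule is simply "at least one encoder frame per token" (the real filter may add a margin):

    def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
        """Transducer-style losses need the encoder output to be at least
        as long as the token sequence; drop cuts that violate this."""
        return frames_after_subsampling >= num_tokens

    # The excluded cut from the warning: 23 encoder frames vs. 24 tokens.
    print(keep_cut(23, 24))   # False -> "Exclude cut ... from training."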
], batch size: 56, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 12:58:52,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=237766.66666666666, ans=0.015 2023-11-18 12:58:54,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=237766.66666666666, ans=12.0 2023-11-18 12:58:59,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=237766.66666666666, ans=0.125 2023-11-18 12:59:03,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2023-11-18 12:59:05,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237833.33333333334, ans=0.1 2023-11-18 12:59:08,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237833.33333333334, ans=0.1 2023-11-18 12:59:12,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=237833.33333333334, ans=0.015 2023-11-18 12:59:25,495 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11650, loss[loss=0.1463, simple_loss=0.1713, pruned_loss=0.0501, audio_tagging_loss=0.01057, over 15536.00 frames. ], tot_loss[loss=0.1127, simple_loss=0.1238, pruned_loss=0.0387, audio_tagging_loss=0.01207, over 3038277.20 frames. ], batch size: 56, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 12:59:30,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=237966.66666666666, ans=0.0 2023-11-18 12:59:30,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=237966.66666666666, ans=0.125 2023-11-18 12:59:42,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=238033.33333333334, ans=0.0 2023-11-18 12:59:55,508 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 9.411e+01 1.037e+02 1.164e+02 1.752e+02, threshold=2.075e+02, percent-clipped=0.0 2023-11-18 13:00:08,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=238233.33333333334, ans=0.125 2023-11-18 13:00:17,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=238233.33333333334, ans=0.0 2023-11-18 13:00:20,933 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11700, loss[loss=0.1348, simple_loss=0.1451, pruned_loss=0.05101, audio_tagging_loss=0.01122, over 14528.00 frames. ], tot_loss[loss=0.1139, simple_loss=0.1251, pruned_loss=0.03926, audio_tagging_loss=0.01206, over 3039996.20 frames. ], batch size: 57, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:00:28,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=238300.0, ans=0.125 2023-11-18 13:00:57,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.43 vs. 
limit=15.0 2023-11-18 13:00:59,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=238500.0, ans=0.125 2023-11-18 13:01:11,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2023-11-18 13:01:14,750 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.23 vs. limit=6.0 2023-11-18 13:01:16,330 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11750, loss[loss=0.0982, simple_loss=0.1185, pruned_loss=0.02773, audio_tagging_loss=0.01124, over 15618.00 frames. ], tot_loss[loss=0.1142, simple_loss=0.1253, pruned_loss=0.03938, audio_tagging_loss=0.01214, over 3044171.41 frames. ], batch size: 57, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:01:23,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=238633.33333333334, ans=0.125 2023-11-18 13:01:35,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=238700.0, ans=0.125 2023-11-18 13:01:35,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5 2023-11-18 13:01:37,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=238766.66666666666, ans=0.125 2023-11-18 13:01:46,324 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 1.047e+02 1.164e+02 1.458e+02 1.909e+02, threshold=2.328e+02, percent-clipped=0.0 2023-11-18 13:01:51,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=238833.33333333334, ans=0.2 2023-11-18 13:01:51,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=238833.33333333334, ans=0.125 2023-11-18 13:01:54,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=238833.33333333334, ans=0.125 2023-11-18 13:02:07,688 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.92 vs. limit=15.0 2023-11-18 13:02:11,166 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11800, loss[loss=0.1021, simple_loss=0.1124, pruned_loss=0.03506, audio_tagging_loss=0.01084, over 15636.00 frames. ], tot_loss[loss=0.1131, simple_loss=0.124, pruned_loss=0.03889, audio_tagging_loss=0.0122, over 3043240.80 frames. 
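The Whitening lines fire when a module's whitening metric gets close to, or exceeds, its limit. As I read the diagnostic, the metric measures how far the channel covariance of an activation is from a multiple of the identity: 1.0 for perfectly white features, rising toward the per-group channel count as the energy collapses onto a few directions; the whiten_keys entries (num_groups=8) apply the same measure per attention-head group. A sketch of one plausible form of the metric; treat the exact normalization as an assumption rather than the scaling.py definition:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels). Per group of channels, compute
        metric = dim * trace(C @ C) / trace(C)**2 for covariance C; this
        is 1.0 when C is proportional to I and dim when C is rank-1."""
        n, c = x.shape
        dim = c // num_groups
        xg = x.reshape(n, num_groups, dim).transpose(0, 1)      # (G, n, dim)
        xg = xg - xg.mean(dim=1, keepdim=True)
        cov = torch.matmul(xg.transpose(1, 2), xg) / n          # (G, dim, dim)
        tr = cov.diagonal(dim1=1, dim2=2).sum(dim=1)
        tr2 = torch.matmul(cov, cov).diagonal(dim1=1, dim2=2).sum(dim=1)
        return (dim * tr2 / tr.clamp(min=1e-20) ** 2).mean().item()

    torch.manual_seed(0)
    print(whitening_metric(torch.randn(1000, 256)))  # ~1.26; -> 1.0 as n grows

When the metric exceeds the limit, the module applies a small corrective gradient, which is presumably why the logged entries hover near their limits (e.g. the 6.08 vs. limit=6.0 whiten_keys record later in this epoch).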
], batch size: 57, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:02:19,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=238966.66666666666, ans=0.125 2023-11-18 13:02:24,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=239033.33333333334, ans=0.125 2023-11-18 13:02:32,073 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.327e-01 2023-11-18 13:02:36,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=239100.0, ans=0.125 2023-11-18 13:02:44,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239166.66666666666, ans=0.1 2023-11-18 13:03:07,105 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11850, loss[loss=0.1085, simple_loss=0.1177, pruned_loss=0.03755, audio_tagging_loss=0.01211, over 15003.00 frames. ], tot_loss[loss=0.1128, simple_loss=0.124, pruned_loss=0.03861, audio_tagging_loss=0.01217, over 3043125.95 frames. ], batch size: 55, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:03:09,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=239300.0, ans=0.0 2023-11-18 13:03:09,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=239300.0, ans=0.125 2023-11-18 13:03:11,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=239300.0, ans=0.0 2023-11-18 13:03:17,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=239366.66666666666, ans=0.125 2023-11-18 13:03:27,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=239366.66666666666, ans=0.07 2023-11-18 13:03:28,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=239433.33333333334, ans=0.1 2023-11-18 13:03:29,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=239433.33333333334, ans=0.0 2023-11-18 13:03:36,864 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.964e+01 9.657e+01 1.079e+02 1.246e+02 1.721e+02, threshold=2.158e+02, percent-clipped=0.0 2023-11-18 13:04:02,555 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11900, loss[loss=0.1193, simple_loss=0.1314, pruned_loss=0.04317, audio_tagging_loss=0.0104, over 15254.00 frames. ], tot_loss[loss=0.1131, simple_loss=0.1244, pruned_loss=0.03872, audio_tagging_loss=0.01218, over 3048310.71 frames. ], batch size: 56, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:04:02,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=239633.33333333334, ans=0.125 2023-11-18 13:04:29,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=239766.66666666666, ans=0.0 2023-11-18 13:04:37,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.35 vs. 
limit=15.0 2023-11-18 13:04:47,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=239900.0, ans=0.125 2023-11-18 13:04:51,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=239900.0, ans=0.0 2023-11-18 13:04:57,156 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 11950, loss[loss=0.1275, simple_loss=0.1411, pruned_loss=0.04441, audio_tagging_loss=0.01251, over 14984.00 frames. ], tot_loss[loss=0.1137, simple_loss=0.1247, pruned_loss=0.03904, audio_tagging_loss=0.01231, over 3042148.84 frames. ], batch size: 56, lr: 1.89e-02, grad_scale: 32.0 2023-11-18 13:05:01,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=239966.66666666666, ans=0.125 2023-11-18 13:05:05,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=239966.66666666666, ans=0.2 2023-11-18 13:05:29,706 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 9.537e+01 1.099e+02 1.271e+02 1.974e+02, threshold=2.199e+02, percent-clipped=0.0 2023-11-18 13:05:32,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=240166.66666666666, ans=0.0 2023-11-18 13:05:34,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=240166.66666666666, ans=0.125 2023-11-18 13:05:43,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=240233.33333333334, ans=0.2 2023-11-18 13:05:46,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.08 vs. limit=15.0 2023-11-18 13:05:53,168 INFO [train_asr.py:1115] (1/4) Epoch 3, batch 12000, loss[loss=0.09465, simple_loss=0.108, pruned_loss=0.03029, audio_tagging_loss=0.01037, over 15574.00 frames. ], tot_loss[loss=0.1134, simple_loss=0.1241, pruned_loss=0.03885, audio_tagging_loss=0.01245, over 3036019.60 frames. ], batch size: 60, lr: 1.89e-02, grad_scale: 32.0 2023-11-18 13:05:53,169 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 13:06:26,323 INFO [train_asr.py:1147] (1/4) Epoch 3, validation: loss=0.07855, simple_loss=0.06384, pruned_loss=0.01132, audio_tagging_loss=0.03531, over 4681554.00 frames. 2023-11-18 13:06:26,324 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 13:06:31,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2023-11-18 13:06:48,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.64 vs. limit=22.5 2023-11-18 13:07:28,603 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 0, loss[loss=0.1023, simple_loss=0.09422, pruned_loss=0.02595, audio_tagging_loss=0.02927, over 15388.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.09422, pruned_loss=0.02595, audio_tagging_loss=0.02927, over 15388.00 frames. 
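Each loss[...] record splits the objective into the pruned-transducer pair (simple_loss from the simple linear joiner, pruned_loss from the full joiner on the pruned lattice) plus the audio-tagging distillation term. The logged totals are consistent with a fixed weighted sum; a quick check against the epoch 3, batch 11950 tot_loss above (the 0.5 and 1.0 weights are inferred from the numbers, as this run's configuration rather than universal defaults):

    def combine(simple, pruned, audio_tagging,
                simple_scale=0.5, at_scale=1.0):
        # weights inferred from the logged numbers for this run
        return simple_scale * simple + pruned + at_scale * audio_tagging

    # tot_loss[loss=0.1137, simple_loss=0.1247, pruned_loss=0.03904,
    #          audio_tagging_loss=0.01231]:
    print(combine(0.1247, 0.03904, 0.01231))   # 0.1137, matching the log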
], batch size: 61, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:07:28,604 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 13:08:00,451 INFO [train_asr.py:1147] (1/4) Epoch 4, validation: loss=0.07694, simple_loss=0.06378, pruned_loss=0.01116, audio_tagging_loss=0.03389, over 4681554.00 frames. 2023-11-18 13:08:00,452 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 13:08:14,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=240520.0, ans=0.125 2023-11-18 13:08:23,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=240586.66666666666, ans=0.125 2023-11-18 13:08:35,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=240653.33333333334, ans=0.0 2023-11-18 13:08:55,949 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 50, loss[loss=0.1019, simple_loss=0.1081, pruned_loss=0.02829, audio_tagging_loss=0.01951, over 15708.00 frames. ], tot_loss[loss=0.125, simple_loss=0.1253, pruned_loss=0.03885, audio_tagging_loss=0.02354, over 690334.00 frames. ], batch size: 58, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:08:56,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=240786.66666666666, ans=0.2 2023-11-18 13:09:00,235 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.826e+01 9.935e+01 1.154e+02 1.332e+02 1.872e+02, threshold=2.308e+02, percent-clipped=0.0 2023-11-18 13:09:05,272 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.162e+00 2023-11-18 13:09:06,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=240853.33333333334, ans=0.125 2023-11-18 13:09:26,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=240920.0, ans=0.0 2023-11-18 13:09:39,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=241053.33333333334, ans=0.125 2023-11-18 13:09:40,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=241053.33333333334, ans=0.2 2023-11-18 13:09:41,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=241053.33333333334, ans=0.125 2023-11-18 13:09:43,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=241053.33333333334, ans=0.125 2023-11-18 13:09:46,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=241053.33333333334, ans=0.0 2023-11-18 13:09:52,226 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 100, loss[loss=0.1291, simple_loss=0.1528, pruned_loss=0.03854, audio_tagging_loss=0.01414, over 17667.00 frames. ], tot_loss[loss=0.1233, simple_loss=0.1248, pruned_loss=0.03836, audio_tagging_loss=0.02253, over 1209576.01 frames. 
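tot_loss[...] is not the current batch but a running, frame-weighted aggregate: right after the epoch 3 -> 4 boundary it covers only 690334 frames (batch 50), grows to ~1.2M frames by batch 100, and in steady state sits near 3.04M frames. With roughly 15k frames per batch, that plateau is what an exponential forgetting factor of 1 - 1/200 would produce, so here is a sketch under that assumption:

    class RunningLoss:
        """Frame-weighted aggregate with exponential forgetting. decay =
        1 - 1/200 retains ~200 batches of history: ~3.04M frames at
        ~15.2k frames/batch, matching the logged steady-state counts."""
        def __init__(self, decay: float = 1.0 - 1.0 / 200):
            self.decay = decay
            self.weighted_loss = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.weighted_loss = self.weighted_loss * self.decay + batch_loss * batch_frames
            self.frames = self.frames * self.decay + batch_frames
            return self.weighted_loss / self.frames   # the printed tot_loss

    tracker = RunningLoss()
    for _ in range(2000):
        tot = tracker.update(0.11, 15200.0)
    print(round(tracker.frames))   # ~3,040,000

The validation records are different: their "over 4681554.00 frames" never changes, because validation is a plain average over the fixed dev set.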
], batch size: 64, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:09:55,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=241120.0, ans=0.0 2023-11-18 13:10:10,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=241186.66666666666, ans=0.125 2023-11-18 13:10:28,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=241320.0, ans=15.0 2023-11-18 13:10:31,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.40 vs. limit=22.5 2023-11-18 13:10:38,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=241386.66666666666, ans=0.5 2023-11-18 13:10:39,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2023-11-18 13:10:47,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=241453.33333333334, ans=0.1 2023-11-18 13:10:48,255 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 150, loss[loss=0.09594, simple_loss=0.1054, pruned_loss=0.02901, audio_tagging_loss=0.01426, over 14752.00 frames. ], tot_loss[loss=0.1203, simple_loss=0.125, pruned_loss=0.03792, audio_tagging_loss=0.01986, over 1620202.01 frames. ], batch size: 55, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:10:48,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=241453.33333333334, ans=0.125 2023-11-18 13:10:51,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.64 vs. limit=12.0 2023-11-18 13:10:52,413 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 9.521e+01 1.016e+02 1.130e+02 1.451e+02, threshold=2.033e+02, percent-clipped=0.0 2023-11-18 13:11:02,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=241520.0, ans=0.09899494936611666 2023-11-18 13:11:23,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=241653.33333333334, ans=0.0 2023-11-18 13:11:32,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=241720.0, ans=0.0 2023-11-18 13:11:36,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=241720.0, ans=0.2 2023-11-18 13:11:37,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=241720.0, ans=0.125 2023-11-18 13:11:44,091 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 200, loss[loss=0.1325, simple_loss=0.1497, pruned_loss=0.05071, audio_tagging_loss=0.006921, over 15168.00 frames. ], tot_loss[loss=0.1195, simple_loss=0.1269, pruned_loss=0.03877, audio_tagging_loss=0.01733, over 1934375.87 frames. 
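The learning rate decays smoothly within an epoch (1.91e-02 drifting to 1.89e-02 late in epoch 3) and steps down discretely at the epoch boundary (1.77e-02 from epoch 4, batch 0). That two-factor shape matches an Eden-style schedule with a batch term and an epoch term; below is a sketch assuming the published Eden form, with lr_batches=7500 and lr_epochs=3.5 taken from this run's configuration. The counters the recipe actually feeds in are not recoverable from the log, so this illustrates the shape rather than reproducing the exact logged values:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # lr = base * ((b^2 + B^2)/B^2)^-0.25 * ((e^2 + E^2)/E^2)^-0.5
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.5
        return base_lr * batch_factor * epoch_factor

    # Within an epoch only the batch factor moves (slow drift); stepping the
    # epoch counter gives the discrete drop seen at the epoch 3 -> 4 boundary.
    print(eden_lr(0.045, 30000, 3), eden_lr(0.045, 30000, 4))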
], batch size: 56, lr: 1.76e-02, grad_scale: 32.0 2023-11-18 13:11:53,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=241786.66666666666, ans=0.0 2023-11-18 13:11:59,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=241853.33333333334, ans=0.1 2023-11-18 13:12:10,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=241920.0, ans=0.125 2023-11-18 13:12:13,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=241920.0, ans=0.0 2023-11-18 13:12:24,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.09 vs. limit=22.5 2023-11-18 13:12:31,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0 2023-11-18 13:12:40,379 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 250, loss[loss=0.1301, simple_loss=0.1496, pruned_loss=0.04644, audio_tagging_loss=0.008901, over 14707.00 frames. ], tot_loss[loss=0.1175, simple_loss=0.1265, pruned_loss=0.03865, audio_tagging_loss=0.01554, over 2177159.55 frames. ], batch size: 54, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:12:44,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=242120.0, ans=0.125 2023-11-18 13:12:45,587 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 9.626e+01 1.050e+02 1.196e+02 1.667e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-18 13:12:47,255 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.12 vs. limit=22.5 2023-11-18 13:12:51,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=242186.66666666666, ans=0.125 2023-11-18 13:13:24,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=242386.66666666666, ans=0.125 2023-11-18 13:13:35,857 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 300, loss[loss=0.07528, simple_loss=0.08583, pruned_loss=0.02028, audio_tagging_loss=0.01208, over 15442.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1251, pruned_loss=0.03821, audio_tagging_loss=0.01443, over 2371881.78 frames. ], batch size: 58, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:13:37,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.30 vs. limit=22.5 2023-11-18 13:13:53,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=242520.0, ans=0.125 2023-11-18 13:14:16,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=242653.33333333334, ans=0.125 2023-11-18 13:14:31,227 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 350, loss[loss=0.113, simple_loss=0.128, pruned_loss=0.03688, audio_tagging_loss=0.01212, over 15708.00 frames. ], tot_loss[loss=0.1136, simple_loss=0.1241, pruned_loss=0.03778, audio_tagging_loss=0.01378, over 2527546.00 frames. 
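The Clipping_scale lines summarize adaptive gradient clipping. The five numbers are the min/25%/50%/75%/max of recent gradient norms, and the threshold is clipping_scale times the median: in the first such record of this stretch, 2.0 x 1.080e+02 = 2.160e+02. percent-clipped reports how often a gradient actually hit the threshold over the logging window (mostly 0.0 here, occasionally 1.0). A sketch of the bookkeeping, assuming a simple sliding window (names and window size illustrative):

    from collections import deque

    class GradNormClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)

        def __call__(self, grad_norm: float) -> float:
            """Record this step's gradient norm; return the factor to
            multiply the gradients by (1.0 means unclipped)."""
            self.norms.append(grad_norm)
            s = sorted(self.norms)
            threshold = self.scale * s[len(s) // 2]    # scale * median
            return min(1.0, threshold / max(grad_norm, 1e-20))

        def quartiles(self):
            s = sorted(self.norms)
            return [s[min(len(s) - 1, int(q * len(s)))]
                    for q in (0.0, 0.25, 0.5, 0.75, 1.0)]

    clip = GradNormClipper()
    for n in (79.9, 96.5, 108.0, 130.6, 236.9):  # shaped like the logged norms
        factor = clip(n)
    print(clip.quartiles(), factor)   # the 236.9 outlier gets factor < 1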
], batch size: 58, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:14:37,029 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.919e+01 9.705e+01 1.099e+02 1.261e+02 1.880e+02, threshold=2.197e+02, percent-clipped=0.0 2023-11-18 13:14:39,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.19 vs. limit=12.0 2023-11-18 13:15:27,799 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 400, loss[loss=0.09064, simple_loss=0.09748, pruned_loss=0.02918, audio_tagging_loss=0.01272, over 14536.00 frames. ], tot_loss[loss=0.1133, simple_loss=0.1245, pruned_loss=0.03779, audio_tagging_loss=0.01323, over 2654077.14 frames. ], batch size: 57, lr: 1.76e-02, grad_scale: 32.0 2023-11-18 13:15:35,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=243120.0, ans=0.125 2023-11-18 13:16:02,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=243320.0, ans=0.125 2023-11-18 13:16:22,997 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 450, loss[loss=0.1163, simple_loss=0.1468, pruned_loss=0.0355, audio_tagging_loss=0.007432, over 15588.00 frames. ], tot_loss[loss=0.1137, simple_loss=0.1256, pruned_loss=0.03816, audio_tagging_loss=0.01271, over 2742500.20 frames. ], batch size: 56, lr: 1.76e-02, grad_scale: 32.0 2023-11-18 13:16:26,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=243453.33333333334, ans=0.125 2023-11-18 13:16:28,335 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 9.241e+01 1.029e+02 1.146e+02 1.664e+02, threshold=2.058e+02, percent-clipped=0.0 2023-11-18 13:16:28,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=243453.33333333334, ans=0.0 2023-11-18 13:16:33,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=243520.0, ans=0.0 2023-11-18 13:17:01,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=243653.33333333334, ans=0.09899494936611666 2023-11-18 13:17:02,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=243653.33333333334, ans=0.125 2023-11-18 13:17:03,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=15.0 2023-11-18 13:17:18,760 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 500, loss[loss=0.1212, simple_loss=0.1236, pruned_loss=0.0467, audio_tagging_loss=0.01265, over 15875.00 frames. ], tot_loss[loss=0.1132, simple_loss=0.125, pruned_loss=0.03811, audio_tagging_loss=0.01257, over 2805218.28 frames. 
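grad_scale in the loss records is the mixed-precision loss scale, and it visibly adapts across this stretch: 64.0 early in epoch 3, then 32.0, dipping to 16.0 and 8.0 in the records that follow (around epoch 4 batches 250-650) before recovering to 16.0 and 32.0. That is the standard dynamic-scaling rule: halve the scale when scaled gradients overflow, grow it back after a run of clean steps. PyTorch's torch.cuda.amp.GradScaler implements the production version; a minimal sketch of the rule itself:

    class DynamicLossScaler:
        def __init__(self, init_scale=64.0, growth_factor=2.0,
                     backoff_factor=0.5, growth_interval=2000):
            self.scale = init_scale
            self.growth_factor = growth_factor
            self.backoff_factor = backoff_factor
            self.growth_interval = growth_interval
            self._good_steps = 0

        def update(self, found_inf: bool) -> float:
            if found_inf:
                # overflow: skip the step and halve the scale,
                # e.g. 32.0 -> 16.0 -> 8.0 as seen in the log
                self.scale *= self.backoff_factor
                self._good_steps = 0
            else:
                self._good_steps += 1
                if self._good_steps >= self.growth_interval:
                    self.scale *= self.growth_factor   # e.g. 8.0 -> 16.0
                    self._good_steps = 0
            return self.scale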
], batch size: 60, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:18:03,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=244053.33333333334, ans=0.1 2023-11-18 13:18:07,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=244053.33333333334, ans=0.0 2023-11-18 13:18:15,517 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 550, loss[loss=0.08993, simple_loss=0.1025, pruned_loss=0.02615, audio_tagging_loss=0.01253, over 15185.00 frames. ], tot_loss[loss=0.1116, simple_loss=0.123, pruned_loss=0.0376, audio_tagging_loss=0.01253, over 2856207.69 frames. ], batch size: 56, lr: 1.76e-02, grad_scale: 8.0 2023-11-18 13:18:19,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=244120.0, ans=0.125 2023-11-18 13:18:20,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=244120.0, ans=0.125 2023-11-18 13:18:21,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=244120.0, ans=0.125 2023-11-18 13:18:23,484 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 9.469e+01 1.045e+02 1.178e+02 1.805e+02, threshold=2.090e+02, percent-clipped=0.0 2023-11-18 13:18:46,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=244253.33333333334, ans=22.5 2023-11-18 13:18:51,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=244320.0, ans=0.0 2023-11-18 13:19:04,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=244386.66666666666, ans=0.125 2023-11-18 13:19:11,336 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 600, loss[loss=0.133, simple_loss=0.1444, pruned_loss=0.05026, audio_tagging_loss=0.01054, over 15103.00 frames. ], tot_loss[loss=0.111, simple_loss=0.1222, pruned_loss=0.03751, audio_tagging_loss=0.01242, over 2896118.60 frames. ], batch size: 57, lr: 1.76e-02, grad_scale: 8.0 2023-11-18 13:19:16,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=244453.33333333334, ans=0.125 2023-11-18 13:19:21,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2023-11-18 13:19:36,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=244586.66666666666, ans=0.2 2023-11-18 13:19:44,061 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.88 vs. 
limit=15.0 2023-11-18 13:19:49,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=244653.33333333334, ans=0.125 2023-11-18 13:19:50,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=244653.33333333334, ans=0.125 2023-11-18 13:19:52,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=244653.33333333334, ans=0.0 2023-11-18 13:19:53,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2023-11-18 13:19:55,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=244720.0, ans=0.125 2023-11-18 13:20:06,676 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 650, loss[loss=0.1112, simple_loss=0.1215, pruned_loss=0.03597, audio_tagging_loss=0.01442, over 15678.00 frames. ], tot_loss[loss=0.1114, simple_loss=0.1227, pruned_loss=0.03759, audio_tagging_loss=0.01242, over 2933632.89 frames. ], batch size: 58, lr: 1.75e-02, grad_scale: 8.0 2023-11-18 13:20:15,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 9.493e+01 1.068e+02 1.179e+02 1.760e+02, threshold=2.137e+02, percent-clipped=0.0 2023-11-18 13:20:16,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2023-11-18 13:20:18,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=244853.33333333334, ans=0.0 2023-11-18 13:20:22,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2023-11-18 13:20:35,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=15.0 2023-11-18 13:20:52,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2023-11-18 13:21:03,272 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 700, loss[loss=0.1069, simple_loss=0.122, pruned_loss=0.03523, audio_tagging_loss=0.01065, over 15317.00 frames. ], tot_loss[loss=0.1116, simple_loss=0.124, pruned_loss=0.03743, audio_tagging_loss=0.01219, over 2964191.85 frames. ], batch size: 57, lr: 1.75e-02, grad_scale: 8.0 2023-11-18 13:21:13,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=245186.66666666666, ans=0.0 2023-11-18 13:21:47,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=245386.66666666666, ans=0.125 2023-11-18 13:21:48,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=245386.66666666666, ans=0.125 2023-11-18 13:21:59,700 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 750, loss[loss=0.06142, simple_loss=0.05251, pruned_loss=0.02051, audio_tagging_loss=0.01465, over 15245.00 frames. 
], tot_loss[loss=0.1117, simple_loss=0.1238, pruned_loss=0.03757, audio_tagging_loss=0.01218, over 2979010.02 frames. ], batch size: 60, lr: 1.75e-02, grad_scale: 8.0 2023-11-18 13:22:07,024 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.458e+01 1.066e+02 1.214e+02 1.611e+02, threshold=2.132e+02, percent-clipped=0.0 2023-11-18 13:22:10,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=245520.0, ans=0.2 2023-11-18 13:22:26,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=245586.66666666666, ans=0.125 2023-11-18 13:22:34,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=245653.33333333334, ans=0.5 2023-11-18 13:22:39,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2023-11-18 13:22:40,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=245653.33333333334, ans=0.0 2023-11-18 13:22:54,557 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 800, loss[loss=0.124, simple_loss=0.1409, pruned_loss=0.04141, audio_tagging_loss=0.01212, over 15480.00 frames. ], tot_loss[loss=0.1118, simple_loss=0.124, pruned_loss=0.03763, audio_tagging_loss=0.01221, over 2995654.45 frames. ], batch size: 60, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:23:19,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=245920.0, ans=0.125 2023-11-18 13:23:23,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=245920.0, ans=0.125 2023-11-18 13:23:24,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.15 vs. limit=15.0 2023-11-18 13:23:32,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.99 vs. limit=10.0 2023-11-18 13:23:45,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=246053.33333333334, ans=0.125 2023-11-18 13:23:47,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=246053.33333333334, ans=0.125 2023-11-18 13:23:50,688 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 850, loss[loss=0.08382, simple_loss=0.08698, pruned_loss=0.02466, audio_tagging_loss=0.01567, over 14974.00 frames. ], tot_loss[loss=0.1127, simple_loss=0.1249, pruned_loss=0.03802, audio_tagging_loss=0.01221, over 3008285.62 frames. 
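A large family of the scheduled names are balancer parameters: balancer*.prob (ans=0.125), min_positive (ans=0.025), max_positive (ans=0.95), min_abs (ans=0.5). As I understand the scaling.py machinery, a balancer is an identity in the forward pass; with probability prob per batch, its backward pass adds a small correction that pushes each channel's statistics (fraction of positive values, mean absolute value) back into the configured band. A toy version of the forward-identity / backward-nudge pattern; the penalty form and magnitude are illustrative only:

    import torch

    class ToyBalancer(torch.autograd.Function):
        """Identity in forward; in backward, nudge channels whose fraction
        of positive values lies outside [min_positive, max_positive]."""

        @staticmethod
        def forward(ctx, x, min_positive, max_positive, strength):
            ctx.save_for_backward(x)
            ctx.cfg = (min_positive, max_positive, strength)
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            min_p, max_p, strength = ctx.cfg
            frac_pos = (x > 0).float().mean(dim=0, keepdim=True)  # per channel
            # +1 where too few positives, -1 where too many
            direction = (frac_pos < min_p).float() - (frac_pos > max_p).float()
            return grad_out - strength * direction, None, None, None

    x = (torch.randn(16, 8) - 3.0).requires_grad_(True)    # mostly negative
    ToyBalancer.apply(x, 0.05, 0.95, 1e-4).sum().backward()
    print(x.grad.mean().item())  # just under 1.0: after the SGD sign flip,
                                 # the correction is a small upward push on x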
], batch size: 56, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:23:59,081 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.134e+01 9.530e+01 1.051e+02 1.203e+02 1.738e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 13:24:11,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=246186.66666666666, ans=0.0 2023-11-18 13:24:13,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=246253.33333333334, ans=0.125 2023-11-18 13:24:17,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=246253.33333333334, ans=0.1 2023-11-18 13:24:22,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=246320.0, ans=0.125 2023-11-18 13:24:27,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=246320.0, ans=0.125 2023-11-18 13:24:36,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=246386.66666666666, ans=0.2 2023-11-18 13:24:47,012 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 900, loss[loss=0.1748, simple_loss=0.2009, pruned_loss=0.06367, audio_tagging_loss=0.01071, over 15772.00 frames. ], tot_loss[loss=0.1126, simple_loss=0.1248, pruned_loss=0.03793, audio_tagging_loss=0.01227, over 3019222.25 frames. ], batch size: 56, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:25:03,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=246520.0, ans=0.025 2023-11-18 13:25:04,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=246520.0, ans=0.0 2023-11-18 13:25:14,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=246586.66666666666, ans=0.04949747468305833 2023-11-18 13:25:37,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=246720.0, ans=0.1 2023-11-18 13:25:42,393 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 950, loss[loss=0.1127, simple_loss=0.1237, pruned_loss=0.03789, audio_tagging_loss=0.01299, over 15212.00 frames. ], tot_loss[loss=0.1128, simple_loss=0.1252, pruned_loss=0.03804, audio_tagging_loss=0.0121, over 3025437.55 frames. ], batch size: 56, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:25:49,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 9.343e+01 1.032e+02 1.151e+02 2.313e+02, threshold=2.063e+02, percent-clipped=1.0 2023-11-18 13:26:20,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=246986.66666666666, ans=0.125 2023-11-18 13:26:26,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=247053.33333333334, ans=0.0 2023-11-18 13:26:37,925 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1000, loss[loss=0.1238, simple_loss=0.1303, pruned_loss=0.04364, audio_tagging_loss=0.01502, over 16070.00 frames. ], tot_loss[loss=0.1124, simple_loss=0.1252, pruned_loss=0.03792, audio_tagging_loss=0.01186, over 3021265.65 frames. 
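The other recurring scheduled family is skip rates: attention_skip_rate, conv_skip_rate, ff2_skip_rate/ff3_skip_rate and bypass.skip_rate (ans values such as 0.0, 0.07 and 0.0494...). These look like stochastic sub-module skipping in the stochastic-depth style: during training, a sub-module's contribution is dropped for the whole batch with the scheduled probability, which regularizes and saves compute; the bypass.scale_min entries (ans=0.2) would then floor how far a layer's learned residual weight can shrink. A sketch of the skip pattern, with made-up module names:

    import torch
    import torch.nn as nn

    class SkippableBranch(nn.Module):
        """A residual branch that is randomly skipped while training."""
        def __init__(self, branch: nn.Module, skip_rate: float = 0.07):
            super().__init__()
            self.branch = branch
            self.skip_rate = skip_rate  # scheduled on batch_count in the real model

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.training and torch.rand(()) < self.skip_rate:
                return x                # skip the sub-module for this batch
            return x + self.branch(x)

    layer = SkippableBranch(nn.Linear(8, 8), skip_rate=0.07)
    layer.train()
    y = layer(torch.randn(4, 8))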
], batch size: 59, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:26:40,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.92 vs. limit=15.0 2023-11-18 13:26:56,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.18 vs. limit=15.0 2023-11-18 13:26:58,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-11-18 13:27:00,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=247253.33333333334, ans=0.0 2023-11-18 13:27:01,675 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:27:33,825 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1050, loss[loss=0.123, simple_loss=0.1406, pruned_loss=0.04327, audio_tagging_loss=0.009459, over 15072.00 frames. ], tot_loss[loss=0.1118, simple_loss=0.1244, pruned_loss=0.03779, audio_tagging_loss=0.01178, over 3025936.23 frames. ], batch size: 56, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:27:41,177 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.026e+01 9.797e+01 1.106e+02 1.274e+02 2.848e+02, threshold=2.212e+02, percent-clipped=1.0 2023-11-18 13:27:48,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=247520.0, ans=0.125 2023-11-18 13:28:00,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=247586.66666666666, ans=0.0 2023-11-18 13:28:03,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=247586.66666666666, ans=0.07 2023-11-18 13:28:04,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=247586.66666666666, ans=0.0 2023-11-18 13:28:13,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=247653.33333333334, ans=0.125 2023-11-18 13:28:20,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=247720.0, ans=0.125 2023-11-18 13:28:28,417 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1100, loss[loss=0.1407, simple_loss=0.1543, pruned_loss=0.05333, audio_tagging_loss=0.01026, over 14328.00 frames. ], tot_loss[loss=0.1115, simple_loss=0.1238, pruned_loss=0.03786, audio_tagging_loss=0.01174, over 3025036.13 frames. ], batch size: 55, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:28:30,576 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:28:30,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=247786.66666666666, ans=0.04949747468305833 2023-11-18 13:28:49,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=247920.0, ans=0.2 2023-11-18 13:28:55,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=247920.0, ans=0.2 2023-11-18 13:29:12,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=248053.33333333334, ans=0.125 2023-11-18 13:29:24,438 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1150, loss[loss=0.1054, simple_loss=0.1287, pruned_loss=0.03138, audio_tagging_loss=0.0097, over 16758.00 frames. ], tot_loss[loss=0.1124, simple_loss=0.1253, pruned_loss=0.03824, audio_tagging_loss=0.01155, over 3040513.81 frames. ], batch size: 62, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:29:31,759 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 9.396e+01 1.043e+02 1.149e+02 1.593e+02, threshold=2.087e+02, percent-clipped=0.0 2023-11-18 13:29:32,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=248120.0, ans=0.0 2023-11-18 13:29:38,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=248186.66666666666, ans=0.0 2023-11-18 13:29:43,048 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:29:45,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=248186.66666666666, ans=0.95 2023-11-18 13:29:52,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=248253.33333333334, ans=0.0 2023-11-18 13:30:18,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=248386.66666666666, ans=0.1 2023-11-18 13:30:19,697 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:30:21,171 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1200, loss[loss=0.09355, simple_loss=0.0958, pruned_loss=0.03221, audio_tagging_loss=0.01344, over 14873.00 frames. ], tot_loss[loss=0.112, simple_loss=0.1246, pruned_loss=0.03814, audio_tagging_loss=0.01155, over 3037696.40 frames. 
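The WithLoss lines (loss-sum=0.000e+00 above, 1.327e-01 for another attention module earlier in this stretch) report auxiliary losses attached directly to intermediate tensors such as attention weights; a sum of zero means the constraint was inactive on that batch. One way to attach a loss to an activation without threading it through the module's return values is an autograd.Function that is the identity in forward and injects the penalty's gradient in backward. This is a hedged sketch of that pattern, not the actual scaling.py penalty:

    import torch

    class AttachLoss(torch.autograd.Function):
        """y = x in forward; backward adds d(aux_loss)/dx to the gradient."""

        @staticmethod
        def forward(ctx, x, aux_loss_fn):
            with torch.enable_grad():
                x_det = x.detach().requires_grad_(True)
                aux = aux_loss_fn(x_det)          # e.g. the logged loss-sum
                (aux_grad,) = torch.autograd.grad(aux, x_det)
            ctx.save_for_backward(aux_grad)
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_out):
            (aux_grad,) = ctx.saved_tensors
            return grad_out + aux_grad, None

    # Illustration: penalize attention weights above 0.9 (threshold made up).
    attn = torch.softmax(torch.randn(2, 4, 4), dim=-1).requires_grad_(True)
    out = AttachLoss.apply(attn, lambda w: (w - 0.9).clamp(min=0).sum())
    out.sum().backward()   # attn.grad now includes the penalty's gradient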
], batch size: 57, lr: 1.74e-02, grad_scale: 32.0 2023-11-18 13:30:34,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=248520.0, ans=0.125 2023-11-18 13:30:49,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=248586.66666666666, ans=0.125 2023-11-18 13:30:57,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=248653.33333333334, ans=0.1 2023-11-18 13:31:05,059 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=12.0 2023-11-18 13:31:10,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=248720.0, ans=0.95 2023-11-18 13:31:10,959 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:31:12,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=248720.0, ans=0.125 2023-11-18 13:31:16,181 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1250, loss[loss=0.1106, simple_loss=0.1269, pruned_loss=0.03663, audio_tagging_loss=0.01053, over 14744.00 frames. ], tot_loss[loss=0.1103, simple_loss=0.1224, pruned_loss=0.03746, audio_tagging_loss=0.01165, over 3038202.59 frames. ], batch size: 56, lr: 1.74e-02, grad_scale: 32.0 2023-11-18 13:31:20,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.83 vs. limit=22.5 2023-11-18 13:31:23,567 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.534e+01 1.061e+02 1.217e+02 1.836e+02, threshold=2.122e+02, percent-clipped=0.0 2023-11-18 13:31:23,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=248786.66666666666, ans=0.09899494936611666 2023-11-18 13:31:30,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.69 vs. limit=22.5 2023-11-18 13:31:39,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=248920.0, ans=0.125 2023-11-18 13:31:57,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=248986.66666666666, ans=0.0 2023-11-18 13:31:58,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=248986.66666666666, ans=0.125 2023-11-18 13:32:11,683 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1300, loss[loss=0.1435, simple_loss=0.1709, pruned_loss=0.05133, audio_tagging_loss=0.006702, over 15483.00 frames. ], tot_loss[loss=0.1105, simple_loss=0.1228, pruned_loss=0.03742, audio_tagging_loss=0.01166, over 3034678.23 frames. 
], batch size: 56, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:32:18,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=249120.0, ans=0.125 2023-11-18 13:32:20,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=249120.0, ans=0.0 2023-11-18 13:32:32,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=249186.66666666666, ans=0.09899494936611666 2023-11-18 13:32:38,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.06 vs. limit=22.5 2023-11-18 13:33:08,239 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1350, loss[loss=0.1004, simple_loss=0.1071, pruned_loss=0.03266, audio_tagging_loss=0.01416, over 14054.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1228, pruned_loss=0.03718, audio_tagging_loss=0.01163, over 3027739.50 frames. ], batch size: 53, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:33:12,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.71 vs. limit=10.0 2023-11-18 13:33:17,322 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.519e+01 9.738e+01 1.103e+02 1.190e+02 1.796e+02, threshold=2.206e+02, percent-clipped=0.0 2023-11-18 13:33:24,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=249520.0, ans=0.125 2023-11-18 13:33:28,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=249520.0, ans=0.125 2023-11-18 13:33:35,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=249586.66666666666, ans=0.125 2023-11-18 13:33:41,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=249653.33333333334, ans=0.1 2023-11-18 13:33:47,412 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:33:52,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=249720.0, ans=0.0 2023-11-18 13:33:54,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0 2023-11-18 13:34:04,458 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1400, loss[loss=0.1213, simple_loss=0.1352, pruned_loss=0.04457, audio_tagging_loss=0.00915, over 15951.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1228, pruned_loss=0.03709, audio_tagging_loss=0.01171, over 3034020.00 frames. 
], batch size: 59, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:34:13,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=249786.66666666666, ans=0.125 2023-11-18 13:34:17,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=249853.33333333334, ans=0.125 2023-11-18 13:34:20,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2023-11-18 13:34:28,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=249920.0, ans=0.0 2023-11-18 13:35:00,079 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1450, loss[loss=0.1199, simple_loss=0.1356, pruned_loss=0.03971, audio_tagging_loss=0.01243, over 15563.00 frames. ], tot_loss[loss=0.1107, simple_loss=0.123, pruned_loss=0.03744, audio_tagging_loss=0.0118, over 3033826.51 frames. ], batch size: 57, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:35:09,004 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 9.516e+01 1.029e+02 1.105e+02 1.571e+02, threshold=2.057e+02, percent-clipped=0.0 2023-11-18 13:35:11,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.44 vs. limit=15.0 2023-11-18 13:35:14,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=250186.66666666666, ans=0.2 2023-11-18 13:35:56,372 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1500, loss[loss=0.1193, simple_loss=0.1336, pruned_loss=0.0403, audio_tagging_loss=0.0122, over 14976.00 frames. ], tot_loss[loss=0.1126, simple_loss=0.1251, pruned_loss=0.03815, audio_tagging_loss=0.01187, over 3037868.71 frames. ], batch size: 55, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:36:06,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.69 vs. limit=15.0 2023-11-18 13:36:13,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=250520.0, ans=0.125 2023-11-18 13:36:16,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=250520.0, ans=0.0 2023-11-18 13:36:29,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2023-11-18 13:36:33,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=250653.33333333334, ans=0.0 2023-11-18 13:36:36,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=250653.33333333334, ans=0.125 2023-11-18 13:36:44,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=250720.0, ans=0.1 2023-11-18 13:36:52,282 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1550, loss[loss=0.1072, simple_loss=0.1086, pruned_loss=0.03629, audio_tagging_loss=0.01659, over 14198.00 frames. ], tot_loss[loss=0.1119, simple_loss=0.1244, pruned_loss=0.03773, audio_tagging_loss=0.01193, over 3035659.30 frames. 
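One last pattern worth decoding: the reported batch size drifts between roughly 53 and 64 across these records without any manual resizing. That is the signature of duration-based batching, where the sampler packs cuts until a total-seconds budget is reached, so the per-batch count varies with utterance lengths while the acoustic workload stays roughly constant. A toy packer (the budget value is illustrative):

    import random

    def pack_by_duration(durations, max_seconds=1000.0):
        """Greedily group utterances so each batch stays under a duration
        cap; batch sizes vary while total duration is roughly constant."""
        batches, cur, cur_dur = [], [], 0.0
        for i, d in enumerate(durations):
            if cur and cur_dur + d > max_seconds:
                batches.append(cur)
                cur, cur_dur = [], 0.0
            cur.append(i)
            cur_dur += d
        if cur:
            batches.append(cur)
        return batches

    random.seed(0)
    durs = [random.uniform(12.0, 20.0) for _ in range(500)]
    print([len(b) for b in pack_by_duration(durs)][:5])  # sizes near 60, not fixed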
], batch size: 54, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:36:59,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.73 vs. limit=15.0 2023-11-18 13:37:01,191 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 9.375e+01 1.072e+02 1.254e+02 1.823e+02, threshold=2.144e+02, percent-clipped=0.0 2023-11-18 13:37:06,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=250853.33333333334, ans=0.125 2023-11-18 13:37:09,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=250853.33333333334, ans=0.07 2023-11-18 13:37:10,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=250853.33333333334, ans=0.125 2023-11-18 13:37:13,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2023-11-18 13:37:34,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=250986.66666666666, ans=0.0 2023-11-18 13:37:47,454 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1600, loss[loss=0.164, simple_loss=0.1923, pruned_loss=0.05746, audio_tagging_loss=0.0104, over 15364.00 frames. ], tot_loss[loss=0.1119, simple_loss=0.1238, pruned_loss=0.03781, audio_tagging_loss=0.01221, over 3038801.89 frames. ], batch size: 55, lr: 1.73e-02, grad_scale: 32.0 2023-11-18 13:37:48,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.08 vs. limit=6.0 2023-11-18 13:37:48,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=251120.0, ans=0.0 2023-11-18 13:37:52,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=251120.0, ans=0.0 2023-11-18 13:37:52,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=251120.0, ans=0.0 2023-11-18 13:37:56,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=251120.0, ans=0.95 2023-11-18 13:37:59,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=251186.66666666666, ans=0.125 2023-11-18 13:38:12,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=251253.33333333334, ans=0.125 2023-11-18 13:38:34,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=251386.66666666666, ans=0.125 2023-11-18 13:38:43,340 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1650, loss[loss=0.102, simple_loss=0.1142, pruned_loss=0.03448, audio_tagging_loss=0.01043, over 14748.00 frames. ], tot_loss[loss=0.1117, simple_loss=0.1234, pruned_loss=0.0378, audio_tagging_loss=0.01218, over 3039983.93 frames. ], batch size: 54, lr: 1.73e-02, grad_scale: 32.0 2023-11-18 13:38:51,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.16 vs. 
limit=10.0 2023-11-18 13:38:52,826 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 9.946e+01 1.090e+02 1.261e+02 1.677e+02, threshold=2.181e+02, percent-clipped=0.0 2023-11-18 13:38:58,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=251520.0, ans=0.2 2023-11-18 13:39:35,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=251720.0, ans=0.2 2023-11-18 13:39:39,287 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1700, loss[loss=0.1192, simple_loss=0.1406, pruned_loss=0.0386, audio_tagging_loss=0.01027, over 16020.00 frames. ], tot_loss[loss=0.1108, simple_loss=0.1225, pruned_loss=0.03725, audio_tagging_loss=0.01226, over 3037728.38 frames. ], batch size: 60, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:39:39,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0 2023-11-18 13:39:56,121 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.51 vs. limit=22.5 2023-11-18 13:40:01,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=251920.0, ans=0.125 2023-11-18 13:40:15,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=251986.66666666666, ans=0.0 2023-11-18 13:40:23,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=252053.33333333334, ans=0.125 2023-11-18 13:40:24,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=252053.33333333334, ans=0.2 2023-11-18 13:40:35,220 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1750, loss[loss=0.1066, simple_loss=0.1143, pruned_loss=0.04059, audio_tagging_loss=0.008894, over 15564.00 frames. ], tot_loss[loss=0.1104, simple_loss=0.1222, pruned_loss=0.03715, audio_tagging_loss=0.01217, over 3036128.57 frames. ], batch size: 57, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:40:38,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=252120.0, ans=0.125 2023-11-18 13:40:45,328 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 9.260e+01 1.013e+02 1.177e+02 1.598e+02, threshold=2.026e+02, percent-clipped=0.0 2023-11-18 13:41:07,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=252320.0, ans=0.125 2023-11-18 13:41:14,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=252320.0, ans=0.125 2023-11-18 13:41:31,151 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1800, loss[loss=0.1052, simple_loss=0.1207, pruned_loss=0.03537, audio_tagging_loss=0.009538, over 15097.00 frames. ], tot_loss[loss=0.1109, simple_loss=0.1233, pruned_loss=0.03736, audio_tagging_loss=0.01188, over 3040516.29 frames. 
], batch size: 58, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:41:46,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=252520.0, ans=0.125 2023-11-18 13:41:50,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2023-11-18 13:41:54,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=252586.66666666666, ans=0.2 2023-11-18 13:42:19,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.64 vs. limit=10.0 2023-11-18 13:42:27,626 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1850, loss[loss=0.09898, simple_loss=0.1067, pruned_loss=0.03289, audio_tagging_loss=0.01275, over 15076.00 frames. ], tot_loss[loss=0.1117, simple_loss=0.1242, pruned_loss=0.03783, audio_tagging_loss=0.01172, over 3045680.07 frames. ], batch size: 56, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:42:37,088 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 9.907e+01 1.064e+02 1.171e+02 1.741e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-18 13:43:04,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=252986.66666666666, ans=0.1 2023-11-18 13:43:22,182 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1900, loss[loss=0.1102, simple_loss=0.1284, pruned_loss=0.03496, audio_tagging_loss=0.011, over 15380.00 frames. ], tot_loss[loss=0.1097, simple_loss=0.122, pruned_loss=0.03706, audio_tagging_loss=0.01161, over 3044441.34 frames. ], batch size: 57, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:43:35,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=253186.66666666666, ans=0.07 2023-11-18 13:43:37,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.45 vs. limit=15.0 2023-11-18 13:43:48,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=253253.33333333334, ans=0.125 2023-11-18 13:44:03,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=253320.0, ans=0.1 2023-11-18 13:44:07,357 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:44:18,708 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 1950, loss[loss=0.08151, simple_loss=0.08745, pruned_loss=0.0234, audio_tagging_loss=0.01438, over 15113.00 frames. ], tot_loss[loss=0.1097, simple_loss=0.1222, pruned_loss=0.03695, audio_tagging_loss=0.01165, over 3046184.31 frames. 
], batch size: 58, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:44:21,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=253453.33333333334, ans=10.0 2023-11-18 13:44:28,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=253453.33333333334, ans=0.125 2023-11-18 13:44:29,457 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 9.223e+01 1.021e+02 1.142e+02 1.490e+02, threshold=2.042e+02, percent-clipped=0.0 2023-11-18 13:45:15,429 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2000, loss[loss=0.1027, simple_loss=0.1085, pruned_loss=0.03549, audio_tagging_loss=0.0129, over 15649.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1213, pruned_loss=0.0368, audio_tagging_loss=0.01172, over 3035575.93 frames. ], batch size: 59, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:45:55,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=253986.66666666666, ans=0.0 2023-11-18 13:45:59,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=254053.33333333334, ans=0.025 2023-11-18 13:46:10,804 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2050, loss[loss=0.09321, simple_loss=0.1018, pruned_loss=0.02834, audio_tagging_loss=0.01398, over 16025.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1222, pruned_loss=0.037, audio_tagging_loss=0.0117, over 3031788.90 frames. ], batch size: 61, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:46:21,821 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 9.338e+01 1.033e+02 1.135e+02 2.200e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 13:46:27,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=254186.66666666666, ans=0.035 2023-11-18 13:46:27,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=254186.66666666666, ans=0.125 2023-11-18 13:46:32,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=254253.33333333334, ans=0.1 2023-11-18 13:46:35,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=254253.33333333334, ans=0.125 2023-11-18 13:47:01,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=254386.66666666666, ans=0.125 2023-11-18 13:47:04,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=12.0 2023-11-18 13:47:06,234 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2100, loss[loss=0.09301, simple_loss=0.1047, pruned_loss=0.02761, audio_tagging_loss=0.01303, over 15334.00 frames. ], tot_loss[loss=0.1096, simple_loss=0.1217, pruned_loss=0.03696, audio_tagging_loss=0.01176, over 3033557.34 frames. 
], batch size: 59, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:47:11,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=254453.33333333334, ans=0.125 2023-11-18 13:47:11,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=254453.33333333334, ans=0.0 2023-11-18 13:47:21,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=254520.0, ans=0.04949747468305833 2023-11-18 13:47:24,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=254520.0, ans=0.1 2023-11-18 13:47:31,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=254586.66666666666, ans=0.125 2023-11-18 13:47:42,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=254653.33333333334, ans=0.125 2023-11-18 13:47:56,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=254720.0, ans=0.2 2023-11-18 13:48:03,509 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2150, loss[loss=0.1152, simple_loss=0.1244, pruned_loss=0.03912, audio_tagging_loss=0.01382, over 15050.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1215, pruned_loss=0.03691, audio_tagging_loss=0.01171, over 3034855.25 frames. ], batch size: 55, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:48:13,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=254853.33333333334, ans=0.125 2023-11-18 13:48:13,405 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:48:14,130 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 9.557e+01 1.080e+02 1.239e+02 1.582e+02, threshold=2.161e+02, percent-clipped=1.0 2023-11-18 13:48:26,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=254920.0, ans=0.1 2023-11-18 13:48:36,064 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:48:57,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2023-11-18 13:48:58,276 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2200, loss[loss=0.1248, simple_loss=0.144, pruned_loss=0.04078, audio_tagging_loss=0.01203, over 15957.00 frames. ], tot_loss[loss=0.1095, simple_loss=0.1216, pruned_loss=0.03707, audio_tagging_loss=0.01167, over 3034431.31 frames. 
], batch size: 57, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:49:01,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=255120.0, ans=0.125 2023-11-18 13:49:21,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=255253.33333333334, ans=0.2 2023-11-18 13:49:26,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=255253.33333333334, ans=0.125 2023-11-18 13:49:53,795 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2250, loss[loss=0.1178, simple_loss=0.1188, pruned_loss=0.04459, audio_tagging_loss=0.01378, over 15597.00 frames. ], tot_loss[loss=0.1115, simple_loss=0.124, pruned_loss=0.03787, audio_tagging_loss=0.0116, over 3032982.59 frames. ], batch size: 59, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:50:05,599 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 9.450e+01 1.063e+02 1.205e+02 1.681e+02, threshold=2.126e+02, percent-clipped=0.0 2023-11-18 13:50:09,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=255520.0, ans=0.125 2023-11-18 13:50:32,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=255653.33333333334, ans=0.1 2023-11-18 13:50:32,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=255653.33333333334, ans=0.0 2023-11-18 13:50:38,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=255720.0, ans=0.125 2023-11-18 13:50:41,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=255720.0, ans=0.0 2023-11-18 13:50:42,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=255720.0, ans=0.125 2023-11-18 13:50:50,930 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2300, loss[loss=0.1053, simple_loss=0.1167, pruned_loss=0.03445, audio_tagging_loss=0.01249, over 14601.00 frames. ], tot_loss[loss=0.1106, simple_loss=0.1229, pruned_loss=0.03752, audio_tagging_loss=0.01162, over 3028147.86 frames. 
], batch size: 56, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:50:56,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=255786.66666666666, ans=0.04949747468305833 2023-11-18 13:51:01,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=255853.33333333334, ans=0.0 2023-11-18 13:51:20,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=255920.0, ans=0.09899494936611666 2023-11-18 13:51:24,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=255986.66666666666, ans=0.125 2023-11-18 13:51:24,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=255986.66666666666, ans=0.125 2023-11-18 13:51:24,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=255986.66666666666, ans=0.125 2023-11-18 13:51:39,954 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:51:40,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=256053.33333333334, ans=0.2 2023-11-18 13:51:43,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=256053.33333333334, ans=0.0 2023-11-18 13:51:46,285 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2350, loss[loss=0.1037, simple_loss=0.1166, pruned_loss=0.03722, audio_tagging_loss=0.008217, over 15657.00 frames. ], tot_loss[loss=0.1107, simple_loss=0.1227, pruned_loss=0.03761, audio_tagging_loss=0.01172, over 3028180.70 frames. ], batch size: 59, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:51:57,479 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.372e+01 1.028e+02 1.162e+02 1.776e+02, threshold=2.057e+02, percent-clipped=0.0 2023-11-18 13:52:06,178 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:52:14,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=256253.33333333334, ans=0.0 2023-11-18 13:52:34,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=256386.66666666666, ans=0.0 2023-11-18 13:52:42,247 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2400, loss[loss=0.08182, simple_loss=0.08733, pruned_loss=0.02497, audio_tagging_loss=0.01318, over 16435.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1206, pruned_loss=0.03684, audio_tagging_loss=0.01195, over 3034516.32 frames. 
], batch size: 63, lr: 1.72e-02, grad_scale: 32.0 2023-11-18 13:52:46,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256453.33333333334, ans=0.1 2023-11-18 13:52:47,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=256453.33333333334, ans=0.125 2023-11-18 13:52:50,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=256453.33333333334, ans=0.2 2023-11-18 13:52:54,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=256520.0, ans=0.05 2023-11-18 13:52:58,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=256520.0, ans=0.2 2023-11-18 13:53:02,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=256520.0, ans=0.2 2023-11-18 13:53:13,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=256586.66666666666, ans=0.125 2023-11-18 13:53:17,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256653.33333333334, ans=0.1 2023-11-18 13:53:20,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=256653.33333333334, ans=0.0 2023-11-18 13:53:38,562 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2450, loss[loss=0.07482, simple_loss=0.07689, pruned_loss=0.02116, audio_tagging_loss=0.01521, over 15571.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1216, pruned_loss=0.03699, audio_tagging_loss=0.01199, over 3039251.59 frames. 
], batch size: 59, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:53:49,525 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 9.544e+01 1.043e+02 1.156e+02 1.781e+02, threshold=2.086e+02, percent-clipped=0.0 2023-11-18 13:54:02,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=256920.0, ans=0.125 2023-11-18 13:54:04,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=256920.0, ans=0.0 2023-11-18 13:54:08,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=256920.0, ans=0.0 2023-11-18 13:54:09,250 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.209e+00 2023-11-18 13:54:14,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=256986.66666666666, ans=0.2 2023-11-18 13:54:20,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=256986.66666666666, ans=0.2 2023-11-18 13:54:25,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=257053.33333333334, ans=0.1 2023-11-18 13:54:27,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=257053.33333333334, ans=0.0 2023-11-18 13:54:31,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=257053.33333333334, ans=0.125 2023-11-18 13:54:33,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=257120.0, ans=0.2 2023-11-18 13:54:33,908 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2500, loss[loss=0.1166, simple_loss=0.1271, pruned_loss=0.04191, audio_tagging_loss=0.01121, over 15092.00 frames. ], tot_loss[loss=0.1111, simple_loss=0.1232, pruned_loss=0.03757, audio_tagging_loss=0.01197, over 3036775.33 frames. ], batch size: 57, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:54:39,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.52 vs. limit=15.0 2023-11-18 13:54:42,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=257120.0, ans=0.0 2023-11-18 13:54:54,139 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.18 vs. limit=22.5 2023-11-18 13:54:55,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=257253.33333333334, ans=0.125 2023-11-18 13:55:12,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=257320.0, ans=0.125 2023-11-18 13:55:15,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=257320.0, ans=0.125 2023-11-18 13:55:29,860 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2550, loss[loss=0.1291, simple_loss=0.1447, pruned_loss=0.04768, audio_tagging_loss=0.00905, over 15600.00 frames. 
], tot_loss[loss=0.1111, simple_loss=0.1234, pruned_loss=0.03762, audio_tagging_loss=0.01181, over 3039482.44 frames. ], batch size: 55, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:55:32,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=257453.33333333334, ans=0.1 2023-11-18 13:55:40,569 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.163e+01 9.922e+01 1.114e+02 1.302e+02 1.822e+02, threshold=2.229e+02, percent-clipped=0.0 2023-11-18 13:56:04,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.15 vs. limit=12.0 2023-11-18 13:56:11,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=257653.33333333334, ans=0.125 2023-11-18 13:56:25,577 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2600, loss[loss=0.08461, simple_loss=0.09428, pruned_loss=0.02665, audio_tagging_loss=0.01082, over 14268.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.122, pruned_loss=0.03709, audio_tagging_loss=0.01173, over 3034720.23 frames. ], batch size: 56, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:56:29,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=257786.66666666666, ans=0.1 2023-11-18 13:56:39,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.77 vs. limit=15.0 2023-11-18 13:56:40,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=257853.33333333334, ans=10.0 2023-11-18 13:57:09,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=257986.66666666666, ans=0.125 2023-11-18 13:57:14,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=258053.33333333334, ans=0.0 2023-11-18 13:57:21,395 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2650, loss[loss=0.09903, simple_loss=0.1095, pruned_loss=0.03284, audio_tagging_loss=0.01143, over 15154.00 frames. ], tot_loss[loss=0.1104, simple_loss=0.1227, pruned_loss=0.03735, audio_tagging_loss=0.01166, over 3035961.16 frames. ], batch size: 58, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:57:24,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2023-11-18 13:57:32,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.000e+01 9.528e+01 1.033e+02 1.143e+02 1.471e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 13:58:13,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=258386.66666666666, ans=0.02 2023-11-18 13:58:17,142 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2700, loss[loss=0.09496, simple_loss=0.1033, pruned_loss=0.0316, audio_tagging_loss=0.01171, over 16073.00 frames. ], tot_loss[loss=0.1103, simple_loss=0.1228, pruned_loss=0.03733, audio_tagging_loss=0.01156, over 3045996.83 frames. 
], batch size: 62, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:58:22,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=258453.33333333334, ans=0.125 2023-11-18 13:58:25,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=258453.33333333334, ans=0.125 2023-11-18 13:58:41,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=258586.66666666666, ans=0.125 2023-11-18 13:58:57,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=258653.33333333334, ans=0.0 2023-11-18 13:59:03,927 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:59:13,213 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2750, loss[loss=0.109, simple_loss=0.1219, pruned_loss=0.03818, audio_tagging_loss=0.009911, over 14878.00 frames. ], tot_loss[loss=0.1113, simple_loss=0.1241, pruned_loss=0.03768, audio_tagging_loss=0.01155, over 3039285.33 frames. ], batch size: 57, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:59:24,866 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.704e+01 9.301e+01 1.031e+02 1.106e+02 1.514e+02, threshold=2.061e+02, percent-clipped=0.0 2023-11-18 13:59:30,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=258853.33333333334, ans=0.125 2023-11-18 14:00:00,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=259053.33333333334, ans=0.125 2023-11-18 14:00:01,472 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:00:08,824 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2800, loss[loss=0.09961, simple_loss=0.1011, pruned_loss=0.03548, audio_tagging_loss=0.0136, over 14177.00 frames. ], tot_loss[loss=0.1105, simple_loss=0.1232, pruned_loss=0.03714, audio_tagging_loss=0.01174, over 3036391.05 frames. ], batch size: 55, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 14:00:18,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.45 vs. limit=10.0 2023-11-18 14:00:35,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=259253.33333333334, ans=0.1 2023-11-18 14:00:50,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.67 vs. limit=22.5 2023-11-18 14:01:04,443 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2850, loss[loss=0.08494, simple_loss=0.09063, pruned_loss=0.02806, audio_tagging_loss=0.01156, over 15202.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1224, pruned_loss=0.03692, audio_tagging_loss=0.01171, over 3033532.59 frames. 
], batch size: 57, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 14:01:15,601 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 9.766e+01 1.049e+02 1.164e+02 1.614e+02, threshold=2.099e+02, percent-clipped=0.0 2023-11-18 14:01:29,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.89 vs. limit=10.0 2023-11-18 14:01:38,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=259653.33333333334, ans=0.0 2023-11-18 14:01:48,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=259720.0, ans=0.2 2023-11-18 14:01:55,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=259720.0, ans=0.125 2023-11-18 14:02:00,182 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2900, loss[loss=0.09839, simple_loss=0.1025, pruned_loss=0.02838, audio_tagging_loss=0.01874, over 16187.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.1224, pruned_loss=0.03696, audio_tagging_loss=0.01179, over 3031613.36 frames. ], batch size: 63, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:02:00,469 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:02:56,636 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 2950, loss[loss=0.08779, simple_loss=0.09649, pruned_loss=0.02481, audio_tagging_loss=0.01473, over 14871.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.122, pruned_loss=0.0366, audio_tagging_loss=0.0118, over 3037621.61 frames. ], batch size: 58, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:03:02,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=260120.0, ans=0.2 2023-11-18 14:03:06,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=260186.66666666666, ans=0.2 2023-11-18 14:03:07,250 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 9.370e+01 1.013e+02 1.101e+02 1.808e+02, threshold=2.027e+02, percent-clipped=0.0 2023-11-18 14:03:38,192 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:03:40,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=260386.66666666666, ans=0.125 2023-11-18 14:03:51,841 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3000, loss[loss=0.08972, simple_loss=0.09924, pruned_loss=0.02631, audio_tagging_loss=0.01379, over 15104.00 frames. ], tot_loss[loss=0.1096, simple_loss=0.1226, pruned_loss=0.0365, audio_tagging_loss=0.01183, over 3044185.63 frames. ], batch size: 58, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:03:51,842 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 14:04:05,938 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.7113, 5.4928, 5.4841, 5.3682], device='cuda:1') 2023-11-18 14:04:25,234 INFO [train_asr.py:1147] (1/4) Epoch 4, validation: loss=0.07718, simple_loss=0.06278, pruned_loss=0.01045, audio_tagging_loss=0.03534, over 4681554.00 frames. 
2023-11-18 14:04:25,235 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 14:04:52,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=260586.66666666666, ans=0.125 2023-11-18 14:04:57,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=260653.33333333334, ans=0.0 2023-11-18 14:05:03,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.65 vs. limit=12.0 2023-11-18 14:05:07,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=260653.33333333334, ans=0.125 2023-11-18 14:05:20,215 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3050, loss[loss=0.1311, simple_loss=0.1449, pruned_loss=0.04626, audio_tagging_loss=0.01237, over 16096.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1211, pruned_loss=0.03613, audio_tagging_loss=0.01195, over 3047512.04 frames. ], batch size: 61, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:05:23,575 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:05:30,850 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 9.501e+01 1.094e+02 1.227e+02 1.890e+02, threshold=2.188e+02, percent-clipped=0.0 2023-11-18 14:05:32,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=260853.33333333334, ans=0.0 2023-11-18 14:05:44,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=260920.0, ans=0.125 2023-11-18 14:05:53,406 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:05:55,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.57 vs. limit=15.0 2023-11-18 14:06:09,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=261053.33333333334, ans=0.2 2023-11-18 14:06:15,724 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3100, loss[loss=0.07494, simple_loss=0.08749, pruned_loss=0.0185, audio_tagging_loss=0.0127, over 14065.00 frames. ], tot_loss[loss=0.1087, simple_loss=0.121, pruned_loss=0.03612, audio_tagging_loss=0.01203, over 3053075.32 frames. 
], batch size: 56, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:06:17,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=261120.0, ans=0.0 2023-11-18 14:06:25,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=261186.66666666666, ans=0.125 2023-11-18 14:06:38,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=261253.33333333334, ans=0.1 2023-11-18 14:06:52,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=261320.0, ans=0.025 2023-11-18 14:06:54,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0 2023-11-18 14:06:58,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=261320.0, ans=0.04949747468305833 2023-11-18 14:07:12,445 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3150, loss[loss=0.0776, simple_loss=0.08486, pruned_loss=0.02481, audio_tagging_loss=0.01036, over 14739.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1225, pruned_loss=0.03674, audio_tagging_loss=0.01202, over 3054269.71 frames. ], batch size: 56, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:07:13,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=261453.33333333334, ans=0.125 2023-11-18 14:07:15,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=261453.33333333334, ans=0.125 2023-11-18 14:07:24,261 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 9.617e+01 1.054e+02 1.142e+02 1.769e+02, threshold=2.109e+02, percent-clipped=0.0 2023-11-18 14:07:34,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=261586.66666666666, ans=0.125 2023-11-18 14:07:42,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=261586.66666666666, ans=0.2 2023-11-18 14:07:43,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.82 vs. limit=15.0 2023-11-18 14:08:02,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=261720.0, ans=0.125 2023-11-18 14:08:05,360 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0 2023-11-18 14:08:09,110 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3200, loss[loss=0.1201, simple_loss=0.1401, pruned_loss=0.03999, audio_tagging_loss=0.01006, over 14372.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1214, pruned_loss=0.03633, audio_tagging_loss=0.0121, over 3053091.97 frames. 
], batch size: 54, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:08:27,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=261853.33333333334, ans=0.1 2023-11-18 14:08:31,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=261920.0, ans=0.0 2023-11-18 14:08:51,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=261986.66666666666, ans=0.125 2023-11-18 14:09:04,136 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3250, loss[loss=0.1396, simple_loss=0.1589, pruned_loss=0.05057, audio_tagging_loss=0.009531, over 16257.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1207, pruned_loss=0.03596, audio_tagging_loss=0.01217, over 3046408.08 frames. ], batch size: 61, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:09:12,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=262120.0, ans=0.125 2023-11-18 14:09:15,324 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.287e+01 9.218e+01 1.067e+02 1.190e+02 1.746e+02, threshold=2.133e+02, percent-clipped=0.0 2023-11-18 14:09:21,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=262186.6666666667, ans=0.0 2023-11-18 14:09:27,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=262253.3333333333, ans=0.125 2023-11-18 14:09:29,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=262253.3333333333, ans=0.125 2023-11-18 14:09:37,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=262320.0, ans=0.2 2023-11-18 14:09:50,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=262386.6666666667, ans=0.2 2023-11-18 14:09:59,326 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3300, loss[loss=0.09966, simple_loss=0.1057, pruned_loss=0.03568, audio_tagging_loss=0.01115, over 14972.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1198, pruned_loss=0.0358, audio_tagging_loss=0.01223, over 3043102.30 frames. ], batch size: 57, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:10:07,085 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.173e+00 2023-11-18 14:10:10,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=262520.0, ans=0.125 2023-11-18 14:10:12,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=262520.0, ans=0.0 2023-11-18 14:10:30,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=262586.6666666667, ans=0.0 2023-11-18 14:10:37,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=262653.3333333333, ans=0.1 2023-11-18 14:10:56,640 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3350, loss[loss=0.1003, simple_loss=0.1072, pruned_loss=0.03132, audio_tagging_loss=0.01538, over 16654.00 frames. 
], tot_loss[loss=0.1086, simple_loss=0.1208, pruned_loss=0.03616, audio_tagging_loss=0.01201, over 3045633.33 frames. ], batch size: 63, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:11:05,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=15.0 2023-11-18 14:11:07,052 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 9.463e+01 1.035e+02 1.183e+02 1.659e+02, threshold=2.070e+02, percent-clipped=0.0 2023-11-18 14:11:13,868 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2023-11-18 14:11:21,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=262920.0, ans=0.05 2023-11-18 14:11:31,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=262986.6666666667, ans=0.125 2023-11-18 14:11:37,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=262986.6666666667, ans=0.0 2023-11-18 14:11:49,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=263053.3333333333, ans=0.125 2023-11-18 14:11:51,603 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3400, loss[loss=0.132, simple_loss=0.1474, pruned_loss=0.0446, audio_tagging_loss=0.01372, over 14398.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1217, pruned_loss=0.03614, audio_tagging_loss=0.01195, over 3040005.42 frames. ], batch size: 53, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:11:57,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=263120.0, ans=0.125 2023-11-18 14:12:13,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=263253.3333333333, ans=0.0 2023-11-18 14:12:14,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=263253.3333333333, ans=0.125 2023-11-18 14:12:14,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=263253.3333333333, ans=0.125 2023-11-18 14:12:47,598 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3450, loss[loss=0.08985, simple_loss=0.09766, pruned_loss=0.03124, audio_tagging_loss=0.00978, over 14818.00 frames. ], tot_loss[loss=0.1101, simple_loss=0.1228, pruned_loss=0.03673, audio_tagging_loss=0.01191, over 3038961.93 frames. 
], batch size: 58, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:12:49,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=263453.3333333333, ans=0.125 2023-11-18 14:12:56,343 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:12:59,380 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 9.398e+01 1.018e+02 1.161e+02 1.639e+02, threshold=2.037e+02, percent-clipped=0.0 2023-11-18 14:13:01,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=263520.0, ans=0.125 2023-11-18 14:13:03,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=263520.0, ans=0.125 2023-11-18 14:13:20,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=263653.3333333333, ans=0.2 2023-11-18 14:13:28,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=263653.3333333333, ans=0.5 2023-11-18 14:13:37,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=263720.0, ans=0.125 2023-11-18 14:13:44,345 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3500, loss[loss=0.1192, simple_loss=0.139, pruned_loss=0.03985, audio_tagging_loss=0.009824, over 15328.00 frames. ], tot_loss[loss=0.109, simple_loss=0.1221, pruned_loss=0.03628, audio_tagging_loss=0.01166, over 3039592.50 frames. ], batch size: 57, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:13:44,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=263786.6666666667, ans=0.07 2023-11-18 14:13:55,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=263853.3333333333, ans=0.125 2023-11-18 14:14:04,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=263853.3333333333, ans=0.5 2023-11-18 14:14:10,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=263920.0, ans=0.2 2023-11-18 14:14:13,273 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 14:14:28,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=264053.3333333333, ans=0.2 2023-11-18 14:14:32,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=264053.3333333333, ans=0.125 2023-11-18 14:14:34,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=264053.3333333333, ans=0.125 2023-11-18 14:14:40,051 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3550, loss[loss=0.1029, simple_loss=0.1144, pruned_loss=0.03269, audio_tagging_loss=0.01296, over 15160.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1224, pruned_loss=0.03639, audio_tagging_loss=0.01151, over 3045454.02 frames. ], batch size: 57, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:14:51,084 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 9.400e+01 1.087e+02 1.239e+02 1.521e+02, threshold=2.174e+02, percent-clipped=0.0 2023-11-18 14:14:51,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=264186.6666666667, ans=0.125 2023-11-18 14:15:04,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=264253.3333333333, ans=0.125 2023-11-18 14:15:08,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=264253.3333333333, ans=0.0 2023-11-18 14:15:17,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=264320.0, ans=0.125 2023-11-18 14:15:19,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.32 vs. limit=15.0 2023-11-18 14:15:33,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=264386.6666666667, ans=0.0 2023-11-18 14:15:35,559 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3600, loss[loss=0.1132, simple_loss=0.1158, pruned_loss=0.04066, audio_tagging_loss=0.01466, over 14128.00 frames. ], tot_loss[loss=0.1101, simple_loss=0.1235, pruned_loss=0.03682, audio_tagging_loss=0.0115, over 3043509.89 frames. ], batch size: 55, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:15:39,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264453.3333333333, ans=0.1 2023-11-18 14:15:51,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=12.0 2023-11-18 14:15:55,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=264520.0, ans=0.2 2023-11-18 14:16:02,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=264586.6666666667, ans=0.0 2023-11-18 14:16:26,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264720.0, ans=0.1 2023-11-18 14:16:32,046 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3650, loss[loss=0.09005, simple_loss=0.1014, pruned_loss=0.0291, audio_tagging_loss=0.01024, over 15067.00 frames. 
], tot_loss[loss=0.11, simple_loss=0.1234, pruned_loss=0.03691, audio_tagging_loss=0.01142, over 3042140.95 frames. ], batch size: 57, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:16:33,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=22.5 2023-11-18 14:16:43,160 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 9.557e+01 1.072e+02 1.214e+02 1.788e+02, threshold=2.145e+02, percent-clipped=0.0 2023-11-18 14:16:45,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=264853.3333333333, ans=0.125 2023-11-18 14:17:27,643 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3700, loss[loss=0.09995, simple_loss=0.1051, pruned_loss=0.03599, audio_tagging_loss=0.01141, over 15313.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.123, pruned_loss=0.03682, audio_tagging_loss=0.01158, over 3050617.67 frames. ], batch size: 56, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:17:28,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=265120.0, ans=0.125 2023-11-18 14:17:32,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=265120.0, ans=0.1 2023-11-18 14:17:44,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=265186.6666666667, ans=0.125 2023-11-18 14:17:48,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2023-11-18 14:18:14,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=265386.6666666667, ans=0.125 2023-11-18 14:18:23,560 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3750, loss[loss=0.08471, simple_loss=0.08975, pruned_loss=0.02249, audio_tagging_loss=0.01734, over 16110.00 frames. ], tot_loss[loss=0.1105, simple_loss=0.1233, pruned_loss=0.03721, audio_tagging_loss=0.01168, over 3046156.68 frames. ], batch size: 60, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:18:26,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=265453.3333333333, ans=0.2 2023-11-18 14:18:34,675 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 1.034e+02 1.153e+02 1.284e+02 1.931e+02, threshold=2.306e+02, percent-clipped=0.0 2023-11-18 14:18:50,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=265586.6666666667, ans=0.125 2023-11-18 14:18:59,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265653.3333333333, ans=0.1 2023-11-18 14:19:02,321 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 14:19:07,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=265720.0, ans=10.0 2023-11-18 14:19:08,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=265720.0, ans=0.0 2023-11-18 14:19:19,902 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3800, loss[loss=0.1145, simple_loss=0.119, pruned_loss=0.03864, audio_tagging_loss=0.01633, over 15368.00 frames. ], tot_loss[loss=0.1105, simple_loss=0.1232, pruned_loss=0.03704, audio_tagging_loss=0.01181, over 3054507.81 frames. ], batch size: 57, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:19:34,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265853.3333333333, ans=0.1 2023-11-18 14:19:51,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=265986.6666666667, ans=0.125 2023-11-18 14:20:00,766 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2023-11-18 14:20:15,029 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3850, loss[loss=0.102, simple_loss=0.1127, pruned_loss=0.02818, audio_tagging_loss=0.01752, over 15653.00 frames. ], tot_loss[loss=0.1104, simple_loss=0.1235, pruned_loss=0.03697, audio_tagging_loss=0.01174, over 3052901.43 frames. ], batch size: 57, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:20:16,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2023-11-18 14:20:26,234 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 9.423e+01 1.054e+02 1.147e+02 1.619e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 14:20:30,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.11 vs. limit=22.5 2023-11-18 14:20:45,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=266253.3333333333, ans=0.2 2023-11-18 14:20:53,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=266320.0, ans=0.0 2023-11-18 14:21:04,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=266386.6666666667, ans=0.125 2023-11-18 14:21:04,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=266386.6666666667, ans=0.125 2023-11-18 14:21:10,653 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3900, loss[loss=0.1131, simple_loss=0.1233, pruned_loss=0.03659, audio_tagging_loss=0.01486, over 15532.00 frames. ], tot_loss[loss=0.1097, simple_loss=0.1224, pruned_loss=0.03667, audio_tagging_loss=0.01183, over 3053720.49 frames. ], batch size: 58, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:21:27,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. 
limit=15.0 2023-11-18 14:21:32,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=266586.6666666667, ans=0.2 2023-11-18 14:21:51,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=266653.3333333333, ans=0.125 2023-11-18 14:22:02,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0 2023-11-18 14:22:10,120 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 3950, loss[loss=0.06613, simple_loss=0.07265, pruned_loss=0.01759, audio_tagging_loss=0.01222, over 15953.00 frames. ], tot_loss[loss=0.1095, simple_loss=0.122, pruned_loss=0.03654, audio_tagging_loss=0.01197, over 3054708.35 frames. ], batch size: 64, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:22:20,673 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 9.384e+01 1.022e+02 1.131e+02 1.477e+02, threshold=2.044e+02, percent-clipped=0.0 2023-11-18 14:22:26,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=266853.3333333333, ans=0.125 2023-11-18 14:22:28,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=266853.3333333333, ans=0.125 2023-11-18 14:22:28,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=266853.3333333333, ans=0.0 2023-11-18 14:22:28,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.08 vs. limit=15.0 2023-11-18 14:22:31,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=266920.0, ans=0.09899494936611666 2023-11-18 14:22:48,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=266986.6666666667, ans=0.125 2023-11-18 14:22:58,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=267053.3333333333, ans=0.125 2023-11-18 14:23:02,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=267053.3333333333, ans=0.125 2023-11-18 14:23:05,088 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4000, loss[loss=0.144, simple_loss=0.168, pruned_loss=0.05077, audio_tagging_loss=0.009174, over 17258.00 frames. ], tot_loss[loss=0.1097, simple_loss=0.122, pruned_loss=0.0366, audio_tagging_loss=0.01211, over 3064533.54 frames. 
], batch size: 62, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:23:20,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=267186.6666666667, ans=0.0 2023-11-18 14:23:32,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=267253.3333333333, ans=0.125 2023-11-18 14:23:33,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=267253.3333333333, ans=0.125 2023-11-18 14:23:38,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=267320.0, ans=0.125 2023-11-18 14:24:01,219 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4050, loss[loss=0.1133, simple_loss=0.1211, pruned_loss=0.0394, audio_tagging_loss=0.01333, over 14603.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1208, pruned_loss=0.03603, audio_tagging_loss=0.01215, over 3057180.72 frames. ], batch size: 55, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:24:03,455 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:24:12,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 9.604e+01 1.092e+02 1.269e+02 1.663e+02, threshold=2.185e+02, percent-clipped=0.0 2023-11-18 14:24:21,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=267520.0, ans=0.0 2023-11-18 14:24:32,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.13 vs. limit=15.0 2023-11-18 14:24:57,453 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4100, loss[loss=0.0814, simple_loss=0.08963, pruned_loss=0.02506, audio_tagging_loss=0.01152, over 14924.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.1223, pruned_loss=0.03665, audio_tagging_loss=0.01206, over 3053299.42 frames. ], batch size: 58, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:25:03,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=267786.6666666667, ans=0.125 2023-11-18 14:25:06,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=267786.6666666667, ans=0.05 2023-11-18 14:25:22,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=267920.0, ans=0.125 2023-11-18 14:25:31,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. 
limit=6.0 2023-11-18 14:25:35,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=267986.6666666667, ans=0.0 2023-11-18 14:25:43,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=268053.3333333333, ans=0.0 2023-11-18 14:25:47,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=268053.3333333333, ans=0.0 2023-11-18 14:25:53,587 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4150, loss[loss=0.1183, simple_loss=0.1341, pruned_loss=0.03899, audio_tagging_loss=0.01228, over 14826.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.1227, pruned_loss=0.03665, audio_tagging_loss=0.01189, over 3055167.60 frames. ], batch size: 56, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:26:04,173 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.914e+01 9.487e+01 1.055e+02 1.166e+02 1.501e+02, threshold=2.109e+02, percent-clipped=0.0 2023-11-18 14:26:04,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=268186.6666666667, ans=0.125 2023-11-18 14:26:08,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=268186.6666666667, ans=0.125 2023-11-18 14:26:18,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=268253.3333333333, ans=0.125 2023-11-18 14:26:19,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=268253.3333333333, ans=0.2 2023-11-18 14:26:21,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=268253.3333333333, ans=0.125 2023-11-18 14:26:31,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=268320.0, ans=0.0 2023-11-18 14:26:33,425 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:26:40,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.67 vs. limit=10.0 2023-11-18 14:26:42,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.97 vs. limit=15.0 2023-11-18 14:26:42,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.12 vs. limit=15.0 2023-11-18 14:26:47,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=268453.3333333333, ans=0.125 2023-11-18 14:26:48,317 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4200, loss[loss=0.1095, simple_loss=0.1257, pruned_loss=0.03627, audio_tagging_loss=0.01038, over 15166.00 frames. 
], tot_loss[loss=0.1087, simple_loss=0.1216, pruned_loss=0.03616, audio_tagging_loss=0.01177, over 3045499.21 frames. ], batch size: 56, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:26:58,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=268453.3333333333, ans=0.125 2023-11-18 14:26:59,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.47 vs. limit=10.0 2023-11-18 14:27:02,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=268520.0, ans=0.0 2023-11-18 14:27:05,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=268520.0, ans=0.2 2023-11-18 14:27:14,679 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.47 vs. limit=22.5 2023-11-18 14:27:19,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=268586.6666666667, ans=0.2 2023-11-18 14:27:20,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=268586.6666666667, ans=0.0 2023-11-18 14:27:29,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2023-11-18 14:27:35,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=268720.0, ans=0.125 2023-11-18 14:27:41,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=268720.0, ans=0.125 2023-11-18 14:27:44,857 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4250, loss[loss=0.1228, simple_loss=0.135, pruned_loss=0.04476, audio_tagging_loss=0.01056, over 15344.00 frames. ], tot_loss[loss=0.1097, simple_loss=0.1233, pruned_loss=0.0365, audio_tagging_loss=0.01159, over 3048088.25 frames. ], batch size: 58, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:27:51,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=268786.6666666667, ans=0.125 2023-11-18 14:27:57,593 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 9.513e+01 1.037e+02 1.128e+02 1.811e+02, threshold=2.074e+02, percent-clipped=0.0 2023-11-18 14:28:28,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=268986.6666666667, ans=0.0 2023-11-18 14:28:37,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=269053.3333333333, ans=0.0 2023-11-18 14:28:41,083 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4300, loss[loss=0.105, simple_loss=0.1205, pruned_loss=0.03444, audio_tagging_loss=0.01035, over 15320.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1238, pruned_loss=0.03663, audio_tagging_loss=0.01149, over 3056794.71 frames. 
], batch size: 56, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:28:51,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=269186.6666666667, ans=0.125 2023-11-18 14:29:11,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=269253.3333333333, ans=0.0 2023-11-18 14:29:19,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=269320.0, ans=0.125 2023-11-18 14:29:22,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=269320.0, ans=0.05 2023-11-18 14:29:36,919 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4350, loss[loss=0.1273, simple_loss=0.1543, pruned_loss=0.04034, audio_tagging_loss=0.009801, over 14921.00 frames. ], tot_loss[loss=0.1105, simple_loss=0.1245, pruned_loss=0.03683, audio_tagging_loss=0.01142, over 3054641.65 frames. ], batch size: 55, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:29:38,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.76 vs. limit=10.0 2023-11-18 14:29:40,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.81 vs. limit=10.0 2023-11-18 14:29:48,991 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.972e+01 1.042e+02 1.123e+02 1.311e+02 1.927e+02, threshold=2.246e+02, percent-clipped=0.0 2023-11-18 14:30:11,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=269653.3333333333, ans=0.1 2023-11-18 14:30:14,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.83 vs. limit=15.0 2023-11-18 14:30:15,656 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:30:23,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.37 vs. limit=22.5 2023-11-18 14:30:30,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=269720.0, ans=0.0 2023-11-18 14:30:31,877 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4400, loss[loss=0.09444, simple_loss=0.1083, pruned_loss=0.02761, audio_tagging_loss=0.01267, over 14399.00 frames. ], tot_loss[loss=0.1112, simple_loss=0.1248, pruned_loss=0.03719, audio_tagging_loss=0.01161, over 3057412.09 frames. 
], batch size: 55, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:30:32,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=269786.6666666667, ans=0.2 2023-11-18 14:30:33,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=269786.6666666667, ans=0.0 2023-11-18 14:30:34,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=269786.6666666667, ans=0.125 2023-11-18 14:30:45,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=269853.3333333333, ans=0.125 2023-11-18 14:31:28,576 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4450, loss[loss=0.103, simple_loss=0.1046, pruned_loss=0.0366, audio_tagging_loss=0.01411, over 14702.00 frames. ], tot_loss[loss=0.1104, simple_loss=0.1236, pruned_loss=0.03683, audio_tagging_loss=0.01176, over 3061971.81 frames. ], batch size: 57, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:31:29,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=270120.0, ans=0.125 2023-11-18 14:31:40,207 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 9.826e+01 1.064e+02 1.191e+02 1.732e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-18 14:31:56,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270253.3333333333, ans=0.1 2023-11-18 14:32:19,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.32 vs. limit=15.0 2023-11-18 14:32:23,821 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4500, loss[loss=0.1143, simple_loss=0.1269, pruned_loss=0.04064, audio_tagging_loss=0.0102, over 15062.00 frames. ], tot_loss[loss=0.1105, simple_loss=0.1236, pruned_loss=0.03703, audio_tagging_loss=0.01163, over 3053862.48 frames. ], batch size: 56, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:32:38,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=270520.0, ans=0.0 2023-11-18 14:32:51,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=270586.6666666667, ans=0.1 2023-11-18 14:33:05,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.27 vs. limit=15.0 2023-11-18 14:33:07,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.65 vs. limit=12.0 2023-11-18 14:33:12,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=270720.0, ans=0.0 2023-11-18 14:33:20,080 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4550, loss[loss=0.1044, simple_loss=0.1178, pruned_loss=0.0305, audio_tagging_loss=0.01495, over 15647.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1227, pruned_loss=0.03678, audio_tagging_loss=0.01164, over 3052730.28 frames. 
], batch size: 59, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:33:33,252 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.057e+01 9.598e+01 1.091e+02 1.194e+02 2.832e+02, threshold=2.183e+02, percent-clipped=1.0 2023-11-18 14:33:34,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=270853.3333333333, ans=0.125 2023-11-18 14:33:37,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=270853.3333333333, ans=0.125 2023-11-18 14:33:48,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=270920.0, ans=0.125 2023-11-18 14:33:59,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=270986.6666666667, ans=0.125 2023-11-18 14:34:02,581 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:34:05,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=271053.3333333333, ans=0.125 2023-11-18 14:34:11,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=271053.3333333333, ans=0.125 2023-11-18 14:34:17,071 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4600, loss[loss=0.1296, simple_loss=0.1483, pruned_loss=0.04632, audio_tagging_loss=0.009159, over 15278.00 frames. ], tot_loss[loss=0.1095, simple_loss=0.1224, pruned_loss=0.0366, audio_tagging_loss=0.01174, over 3044310.11 frames. ], batch size: 56, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:34:17,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=271120.0, ans=0.0 2023-11-18 14:34:20,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=271120.0, ans=0.0 2023-11-18 14:34:28,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=271186.6666666667, ans=0.1 2023-11-18 14:35:01,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=271386.6666666667, ans=0.1 2023-11-18 14:35:11,224 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=7.991e-01 2023-11-18 14:35:12,026 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4650, loss[loss=0.1349, simple_loss=0.158, pruned_loss=0.0463, audio_tagging_loss=0.009635, over 15527.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1228, pruned_loss=0.03679, audio_tagging_loss=0.01183, over 3045180.34 frames. 
], batch size: 54, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:35:12,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=271453.3333333333, ans=0.125 2023-11-18 14:35:24,106 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.025e+01 1.020e+02 1.140e+02 1.306e+02 2.124e+02, threshold=2.280e+02, percent-clipped=0.0 2023-11-18 14:35:29,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=271520.0, ans=0.0 2023-11-18 14:36:07,758 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4700, loss[loss=0.1089, simple_loss=0.1134, pruned_loss=0.03908, audio_tagging_loss=0.01306, over 14662.00 frames. ], tot_loss[loss=0.1095, simple_loss=0.1216, pruned_loss=0.03664, audio_tagging_loss=0.01206, over 3044820.02 frames. ], batch size: 56, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:36:10,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=271786.6666666667, ans=0.2 2023-11-18 14:36:22,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=271853.3333333333, ans=0.125 2023-11-18 14:36:24,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.13 vs. limit=15.0 2023-11-18 14:36:28,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=271853.3333333333, ans=0.125 2023-11-18 14:36:38,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=271920.0, ans=0.125 2023-11-18 14:36:49,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=271986.6666666667, ans=0.0 2023-11-18 14:36:49,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=15.0 2023-11-18 14:36:52,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=272053.3333333333, ans=0.125 2023-11-18 14:37:01,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=272053.3333333333, ans=0.125 2023-11-18 14:37:04,177 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4750, loss[loss=0.11, simple_loss=0.1194, pruned_loss=0.03859, audio_tagging_loss=0.01176, over 14923.00 frames. ], tot_loss[loss=0.1097, simple_loss=0.1219, pruned_loss=0.03659, audio_tagging_loss=0.01217, over 3045225.94 frames. 
], batch size: 57, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:37:06,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=272120.0, ans=0.125 2023-11-18 14:37:07,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=272120.0, ans=0.1 2023-11-18 14:37:16,363 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 9.592e+01 1.080e+02 1.195e+02 1.652e+02, threshold=2.159e+02, percent-clipped=0.0 2023-11-18 14:37:27,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=272253.3333333333, ans=0.125 2023-11-18 14:37:32,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=272253.3333333333, ans=0.0 2023-11-18 14:37:36,644 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.98 vs. limit=15.0 2023-11-18 14:37:42,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=272320.0, ans=0.125 2023-11-18 14:37:45,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2023-11-18 14:37:48,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=272386.6666666667, ans=0.125 2023-11-18 14:37:59,705 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4800, loss[loss=0.112, simple_loss=0.1262, pruned_loss=0.03517, audio_tagging_loss=0.01371, over 15838.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1212, pruned_loss=0.03639, audio_tagging_loss=0.01229, over 3043941.08 frames. ], batch size: 57, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:38:05,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=272453.3333333333, ans=0.125 2023-11-18 14:38:08,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2023-11-18 14:38:14,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=272520.0, ans=0.025 2023-11-18 14:38:19,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=272520.0, ans=0.1 2023-11-18 14:38:32,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=272653.3333333333, ans=0.0 2023-11-18 14:38:34,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. 
limit=15.0 2023-11-18 14:38:37,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=272653.3333333333, ans=0.125 2023-11-18 14:38:40,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=272653.3333333333, ans=0.125 2023-11-18 14:38:42,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.94 vs. limit=15.0 2023-11-18 14:38:55,229 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4850, loss[loss=0.117, simple_loss=0.1305, pruned_loss=0.03873, audio_tagging_loss=0.01304, over 14651.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.1208, pruned_loss=0.03611, audio_tagging_loss=0.01233, over 3045141.38 frames. ], batch size: 54, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:39:07,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 9.439e+01 1.075e+02 1.233e+02 2.240e+02, threshold=2.150e+02, percent-clipped=1.0 2023-11-18 14:39:11,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=272853.3333333333, ans=0.2 2023-11-18 14:39:13,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=272853.3333333333, ans=0.0 2023-11-18 14:39:20,892 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:39:23,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=272920.0, ans=0.0 2023-11-18 14:39:29,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=272986.6666666667, ans=0.125 2023-11-18 14:39:33,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=272986.6666666667, ans=0.125 2023-11-18 14:39:34,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=272986.6666666667, ans=0.1 2023-11-18 14:39:48,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0 2023-11-18 14:39:51,328 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4900, loss[loss=0.07853, simple_loss=0.07434, pruned_loss=0.02776, audio_tagging_loss=0.0136, over 15081.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1211, pruned_loss=0.03617, audio_tagging_loss=0.01225, over 3046057.58 frames. ], batch size: 59, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:40:36,590 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0 2023-11-18 14:40:46,585 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 4950, loss[loss=0.09272, simple_loss=0.1062, pruned_loss=0.03168, audio_tagging_loss=0.007946, over 13612.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1205, pruned_loss=0.03585, audio_tagging_loss=0.01207, over 3046374.29 frames. ], batch size: 52, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:40:52,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.31 vs. 
limit=22.5 2023-11-18 14:40:58,628 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.761e+01 9.503e+01 1.074e+02 1.226e+02 1.825e+02, threshold=2.148e+02, percent-clipped=0.0 2023-11-18 14:41:00,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=273520.0, ans=12.0 2023-11-18 14:41:02,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=273520.0, ans=0.0 2023-11-18 14:41:07,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.29 vs. limit=15.0 2023-11-18 14:41:15,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=273586.6666666667, ans=0.125 2023-11-18 14:41:18,866 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.74 vs. limit=22.5 2023-11-18 14:41:41,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=273786.6666666667, ans=0.0 2023-11-18 14:41:42,363 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5000, loss[loss=0.1019, simple_loss=0.1147, pruned_loss=0.03114, audio_tagging_loss=0.0134, over 14326.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.121, pruned_loss=0.03583, audio_tagging_loss=0.01192, over 3048481.50 frames. ], batch size: 57, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:41:49,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=273786.6666666667, ans=0.0 2023-11-18 14:42:31,562 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:42:32,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.28 vs. limit=15.0 2023-11-18 14:42:38,355 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5050, loss[loss=0.1267, simple_loss=0.1529, pruned_loss=0.0419, audio_tagging_loss=0.008394, over 15549.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1205, pruned_loss=0.03555, audio_tagging_loss=0.01169, over 3051815.66 frames. ], batch size: 55, lr: 1.66e-02, grad_scale: 16.0 2023-11-18 14:42:43,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-18 14:42:47,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=274120.0, ans=0.025 2023-11-18 14:42:51,026 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 9.577e+01 1.097e+02 1.238e+02 1.791e+02, threshold=2.193e+02, percent-clipped=0.0 2023-11-18 14:43:05,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.70 vs. limit=22.5 2023-11-18 14:43:11,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=274320.0, ans=0.0 2023-11-18 14:43:19,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.47 vs. 
limit=15.0 2023-11-18 14:43:22,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.30 vs. limit=10.0 2023-11-18 14:43:32,742 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5100, loss[loss=0.1232, simple_loss=0.1475, pruned_loss=0.03996, audio_tagging_loss=0.009533, over 16576.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1214, pruned_loss=0.03585, audio_tagging_loss=0.01155, over 3050817.18 frames. ], batch size: 59, lr: 1.66e-02, grad_scale: 16.0 2023-11-18 14:44:17,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=274720.0, ans=0.125 2023-11-18 14:44:27,772 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5150, loss[loss=0.1227, simple_loss=0.1444, pruned_loss=0.04006, audio_tagging_loss=0.01048, over 15510.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1213, pruned_loss=0.03589, audio_tagging_loss=0.01158, over 3051939.65 frames. ], batch size: 55, lr: 1.66e-02, grad_scale: 16.0 2023-11-18 14:44:41,459 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.141e+01 9.655e+01 1.078e+02 1.221e+02 1.622e+02, threshold=2.156e+02, percent-clipped=0.0 2023-11-18 14:44:44,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=274853.3333333333, ans=0.1 2023-11-18 14:45:19,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=275053.3333333333, ans=0.05 2023-11-18 14:45:22,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=275120.0, ans=0.1 2023-11-18 14:45:23,352 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5200, loss[loss=0.0785, simple_loss=0.0818, pruned_loss=0.0234, audio_tagging_loss=0.0142, over 14406.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1221, pruned_loss=0.03608, audio_tagging_loss=0.01149, over 3050365.76 frames. ], batch size: 57, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:45:25,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=275120.0, ans=0.1 2023-11-18 14:45:32,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=275120.0, ans=0.125 2023-11-18 14:45:41,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=275186.6666666667, ans=0.1 2023-11-18 14:45:58,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=275320.0, ans=0.0 2023-11-18 14:46:10,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.08 vs. limit=15.0 2023-11-18 14:46:18,245 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5250, loss[loss=0.1065, simple_loss=0.1145, pruned_loss=0.03758, audio_tagging_loss=0.01164, over 15136.00 frames. ], tot_loss[loss=0.1096, simple_loss=0.1232, pruned_loss=0.03652, audio_tagging_loss=0.0115, over 3049684.70 frames. 
], batch size: 57, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:46:24,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=275453.3333333333, ans=0.0 2023-11-18 14:46:30,876 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.969e+01 9.429e+01 1.029e+02 1.136e+02 1.567e+02, threshold=2.057e+02, percent-clipped=0.0 2023-11-18 14:46:35,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=275520.0, ans=0.125 2023-11-18 14:46:47,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0 2023-11-18 14:47:09,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=275720.0, ans=0.125 2023-11-18 14:47:12,129 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5300, loss[loss=0.07342, simple_loss=0.08228, pruned_loss=0.02287, audio_tagging_loss=0.009404, over 16336.00 frames. ], tot_loss[loss=0.1106, simple_loss=0.1241, pruned_loss=0.0371, audio_tagging_loss=0.01148, over 3047925.20 frames. ], batch size: 61, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:47:14,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=275786.6666666667, ans=0.0 2023-11-18 14:47:21,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=275786.6666666667, ans=0.5 2023-11-18 14:47:31,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=275853.3333333333, ans=0.0 2023-11-18 14:47:36,681 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.69 vs. limit=15.0 2023-11-18 14:47:43,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=275920.0, ans=0.0 2023-11-18 14:47:51,778 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.27 vs. limit=22.5 2023-11-18 14:47:54,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.18 vs. limit=10.0 2023-11-18 14:48:07,983 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5350, loss[loss=0.1083, simple_loss=0.1242, pruned_loss=0.03511, audio_tagging_loss=0.01111, over 14208.00 frames. ], tot_loss[loss=0.1112, simple_loss=0.1249, pruned_loss=0.03725, audio_tagging_loss=0.01146, over 3048390.51 frames. ], batch size: 52, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:48:08,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.66 vs. 
limit=22.5 2023-11-18 14:48:16,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=276120.0, ans=0.125 2023-11-18 14:48:17,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=276120.0, ans=0.0 2023-11-18 14:48:21,236 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 9.718e+01 1.034e+02 1.191e+02 1.805e+02, threshold=2.068e+02, percent-clipped=0.0 2023-11-18 14:48:32,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=276253.3333333333, ans=0.125 2023-11-18 14:48:33,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=276253.3333333333, ans=0.0 2023-11-18 14:48:55,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=276386.6666666667, ans=10.0 2023-11-18 14:49:03,117 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5400, loss[loss=0.1009, simple_loss=0.1132, pruned_loss=0.03182, audio_tagging_loss=0.01247, over 14522.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1237, pruned_loss=0.03672, audio_tagging_loss=0.01159, over 3048236.43 frames. ], batch size: 56, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:49:06,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=276453.3333333333, ans=0.1 2023-11-18 14:49:11,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=276453.3333333333, ans=0.125 2023-11-18 14:49:22,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.52 vs. limit=10.0 2023-11-18 14:49:39,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.85 vs. limit=10.0 2023-11-18 14:49:52,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=276720.0, ans=0.125 2023-11-18 14:49:57,656 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5450, loss[loss=0.1098, simple_loss=0.1245, pruned_loss=0.03547, audio_tagging_loss=0.01206, over 14896.00 frames. ], tot_loss[loss=0.1095, simple_loss=0.1226, pruned_loss=0.03654, audio_tagging_loss=0.01167, over 3042220.78 frames. 
], batch size: 56, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:50:05,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=276786.6666666667, ans=0.125 2023-11-18 14:50:06,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=276786.6666666667, ans=0.125 2023-11-18 14:50:10,737 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 9.674e+01 1.094e+02 1.267e+02 1.723e+02, threshold=2.188e+02, percent-clipped=0.0 2023-11-18 14:50:14,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=276853.3333333333, ans=0.125 2023-11-18 14:50:18,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=276853.3333333333, ans=0.1 2023-11-18 14:50:20,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. limit=6.0 2023-11-18 14:50:22,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=276920.0, ans=0.125 2023-11-18 14:50:30,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=276986.6666666667, ans=0.125 2023-11-18 14:50:31,031 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.382e-02 2023-11-18 14:50:32,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=15.0 2023-11-18 14:50:33,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=276986.6666666667, ans=0.125 2023-11-18 14:50:49,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=12.0 2023-11-18 14:50:52,382 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5500, loss[loss=0.07144, simple_loss=0.0791, pruned_loss=0.01927, audio_tagging_loss=0.01262, over 15715.00 frames. ], tot_loss[loss=0.109, simple_loss=0.122, pruned_loss=0.03633, audio_tagging_loss=0.01162, over 3047311.39 frames. ], batch size: 60, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:51:05,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=277186.6666666667, ans=0.125 2023-11-18 14:51:40,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.48 vs. limit=15.0 2023-11-18 14:51:43,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=277386.6666666667, ans=0.0 2023-11-18 14:51:47,654 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5550, loss[loss=0.09207, simple_loss=0.09269, pruned_loss=0.03097, audio_tagging_loss=0.01476, over 14478.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1214, pruned_loss=0.03598, audio_tagging_loss=0.01184, over 3047054.49 frames. 
], batch size: 56, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:51:57,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=277520.0, ans=0.0 2023-11-18 14:52:00,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.658e+01 9.567e+01 1.041e+02 1.171e+02 1.468e+02, threshold=2.082e+02, percent-clipped=0.0 2023-11-18 14:52:04,347 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.12 vs. limit=15.0 2023-11-18 14:52:20,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=277653.3333333333, ans=0.125 2023-11-18 14:52:41,984 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5600, loss[loss=0.09558, simple_loss=0.1038, pruned_loss=0.0311, audio_tagging_loss=0.01261, over 15424.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1218, pruned_loss=0.03607, audio_tagging_loss=0.01195, over 3046592.21 frames. ], batch size: 60, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:52:54,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=277853.3333333333, ans=0.0 2023-11-18 14:53:04,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=277920.0, ans=0.95 2023-11-18 14:53:08,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0 2023-11-18 14:53:11,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.25 vs. limit=6.0 2023-11-18 14:53:12,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=277920.0, ans=0.0 2023-11-18 14:53:20,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=277986.6666666667, ans=0.0 2023-11-18 14:53:21,534 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:53:22,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277986.6666666667, ans=0.1 2023-11-18 14:53:25,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=278053.3333333333, ans=0.1 2023-11-18 14:53:34,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=278053.3333333333, ans=0.125 2023-11-18 14:53:36,756 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5650, loss[loss=0.1039, simple_loss=0.1183, pruned_loss=0.03276, audio_tagging_loss=0.01196, over 15332.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1216, pruned_loss=0.03581, audio_tagging_loss=0.0119, over 3048080.24 frames. 
], batch size: 57, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:53:39,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=278120.0, ans=0.0 2023-11-18 14:53:50,487 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 9.354e+01 1.022e+02 1.173e+02 1.530e+02, threshold=2.043e+02, percent-clipped=0.0 2023-11-18 14:53:51,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=278186.6666666667, ans=0.125 2023-11-18 14:54:06,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=278253.3333333333, ans=0.1 2023-11-18 14:54:17,820 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.50 vs. limit=15.0 2023-11-18 14:54:32,133 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5700, loss[loss=0.1363, simple_loss=0.1486, pruned_loss=0.0509, audio_tagging_loss=0.01113, over 15049.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1227, pruned_loss=0.03625, audio_tagging_loss=0.01179, over 3048477.77 frames. ], batch size: 55, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:54:40,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=278453.3333333333, ans=0.125 2023-11-18 14:54:44,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=278520.0, ans=0.1 2023-11-18 14:54:49,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.03 vs. limit=6.0 2023-11-18 14:54:56,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=278586.6666666667, ans=0.0 2023-11-18 14:55:15,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=278720.0, ans=0.2 2023-11-18 14:55:23,550 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.93 vs. limit=22.5 2023-11-18 14:55:24,018 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.312e-01 2023-11-18 14:55:27,010 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5750, loss[loss=0.1085, simple_loss=0.1138, pruned_loss=0.03761, audio_tagging_loss=0.01396, over 14839.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1213, pruned_loss=0.03603, audio_tagging_loss=0.01171, over 3048273.03 frames. ], batch size: 56, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:55:34,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=278786.6666666667, ans=0.0 2023-11-18 14:55:36,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=278853.3333333333, ans=0.2 2023-11-18 14:55:40,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.668e+01 1.031e+02 1.141e+02 1.503e+02, threshold=2.062e+02, percent-clipped=0.0 2023-11-18 14:55:47,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.94 vs. 
limit=15.0 2023-11-18 14:56:01,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2023-11-18 14:56:05,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=278986.6666666667, ans=0.125 2023-11-18 14:56:06,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=278986.6666666667, ans=0.1 2023-11-18 14:56:11,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.24 vs. limit=10.0 2023-11-18 14:56:22,402 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5800, loss[loss=0.08315, simple_loss=0.09851, pruned_loss=0.02349, audio_tagging_loss=0.01041, over 14657.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1211, pruned_loss=0.03578, audio_tagging_loss=0.01167, over 3044744.31 frames. ], batch size: 59, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:56:23,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=279120.0, ans=0.0 2023-11-18 14:57:03,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0 2023-11-18 14:57:18,277 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5850, loss[loss=0.09667, simple_loss=0.0963, pruned_loss=0.03153, audio_tagging_loss=0.01699, over 14277.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1221, pruned_loss=0.03599, audio_tagging_loss=0.01156, over 3037855.23 frames. ], batch size: 56, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:57:25,044 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0 2023-11-18 14:57:27,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=12.0 2023-11-18 14:57:31,471 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.015e+01 9.648e+01 1.054e+02 1.215e+02 1.872e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 14:57:42,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=279586.6666666667, ans=0.0 2023-11-18 14:58:01,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=279720.0, ans=0.0 2023-11-18 14:58:13,667 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5900, loss[loss=0.1107, simple_loss=0.1299, pruned_loss=0.0362, audio_tagging_loss=0.009564, over 16862.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.122, pruned_loss=0.03584, audio_tagging_loss=0.01149, over 3035299.30 frames. ], batch size: 62, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 14:58:17,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=279786.6666666667, ans=0.2 2023-11-18 14:58:19,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.18 vs. 
limit=15.0 2023-11-18 14:58:35,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279920.0, ans=0.1 2023-11-18 14:58:35,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=279920.0, ans=0.025 2023-11-18 14:58:40,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. limit=6.0 2023-11-18 14:58:58,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.36 vs. limit=15.0 2023-11-18 14:59:08,890 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 5950, loss[loss=0.08995, simple_loss=0.1029, pruned_loss=0.02498, audio_tagging_loss=0.01349, over 14590.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1218, pruned_loss=0.03566, audio_tagging_loss=0.01153, over 3041430.60 frames. ], batch size: 54, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 14:59:09,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=280120.0, ans=0.125 2023-11-18 14:59:23,199 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 1.041e+02 1.163e+02 1.306e+02 1.742e+02, threshold=2.325e+02, percent-clipped=0.0 2023-11-18 14:59:48,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.21 vs. limit=22.5 2023-11-18 14:59:56,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2023-11-18 15:00:00,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=280386.6666666667, ans=0.0 2023-11-18 15:00:05,302 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6000, loss[loss=0.08748, simple_loss=0.09539, pruned_loss=0.02691, audio_tagging_loss=0.01288, over 14824.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1221, pruned_loss=0.0357, audio_tagging_loss=0.01155, over 3054589.11 frames. ], batch size: 60, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:00:05,302 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 15:00:38,402 INFO [train_asr.py:1147] (1/4) Epoch 4, validation: loss=0.07584, simple_loss=0.06235, pruned_loss=0.0102, audio_tagging_loss=0.03446, over 4681554.00 frames. 2023-11-18 15:00:38,403 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 15:00:40,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.89 vs. limit=15.0 2023-11-18 15:00:50,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=280520.0, ans=0.1 2023-11-18 15:00:51,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0 2023-11-18 15:00:52,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.20 vs. 
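[Note] Around batch 6000 above, the loop pauses to compute a validation loss over the full dev set and then reports peak GPU memory. A hedged sketch of that pattern follows: compute_loss is a hypothetical helper returning (loss, num_frames), and the valid_interval default is an assumption; torch.cuda.max_memory_allocated is the standard PyTorch API behind the "Maximum memory allocated" line.

import torch

def maybe_validate(model, valid_loader, compute_loss, batch_idx,
                   valid_interval=3000, device="cuda:1"):
    # Sketch of the periodic validation pass seen in the log.
    if batch_idx % valid_interval != 0:
        return
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch)
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    print(f"validation: loss={tot_loss / tot_frames:.4}, over {tot_frames} frames")
    # Peak memory line, as in "Maximum memory allocated so far is 25225MB":
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")
    model.train()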
limit=15.0 2023-11-18 15:00:57,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=280520.0, ans=0.2 2023-11-18 15:01:03,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.65 vs. limit=15.0 2023-11-18 15:01:10,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2023-11-18 15:01:18,960 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 15:01:33,798 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6050, loss[loss=0.1159, simple_loss=0.1403, pruned_loss=0.0353, audio_tagging_loss=0.01042, over 16343.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1213, pruned_loss=0.03551, audio_tagging_loss=0.01156, over 3055214.16 frames. ], batch size: 58, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:01:47,523 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.202e+01 9.320e+01 1.035e+02 1.195e+02 1.658e+02, threshold=2.071e+02, percent-clipped=0.0 2023-11-18 15:02:02,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=280920.0, ans=0.125 2023-11-18 15:02:06,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=280986.6666666667, ans=0.125 2023-11-18 15:02:08,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=280986.6666666667, ans=0.125 2023-11-18 15:02:11,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=280986.6666666667, ans=0.1 2023-11-18 15:02:29,913 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6100, loss[loss=0.1044, simple_loss=0.1039, pruned_loss=0.03803, audio_tagging_loss=0.01438, over 14642.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1212, pruned_loss=0.03555, audio_tagging_loss=0.01164, over 3055499.37 frames. ], batch size: 57, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:02:40,761 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:03:09,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=281320.0, ans=0.0 2023-11-18 15:03:11,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=281320.0, ans=0.1 2023-11-18 15:03:24,781 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6150, loss[loss=0.1026, simple_loss=0.1159, pruned_loss=0.03317, audio_tagging_loss=0.01146, over 14787.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1211, pruned_loss=0.03559, audio_tagging_loss=0.01172, over 3059217.89 frames. 
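[Note] The WARNING lines above exclude one-second AudioSet cuts whose encoder output would be shorter than their token sequence: 100 input frames subsample to 23, which a transducer cannot align with 24 BPE tokens. The 100 -> 23 mapping matches the convolutional-subsampling formula T' = ((T - 7) // 2 + 1) // 2; a sketch of the filter predicate as I read it from the log (function names are mine):

def frames_after_subsampling(t: int) -> int:
    # 100 -> ((100 - 7) // 2 + 1) // 2 = 23, matching the warnings above.
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, tokens: list) -> bool:
    # A transducer cannot emit more tokens than it has encoder frames, so
    # cuts like the ones above (23 frames vs. 24 tokens) are dropped.
    return frames_after_subsampling(num_frames) >= len(tokens)

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, ["tok"] * 24)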
], batch size: 57, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:03:38,018 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 9.712e+01 1.096e+02 1.258e+02 1.781e+02, threshold=2.192e+02, percent-clipped=0.0 2023-11-18 15:03:45,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=281520.0, ans=0.125 2023-11-18 15:03:54,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-18 15:03:56,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=281586.6666666667, ans=0.125 2023-11-18 15:04:12,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=281720.0, ans=0.125 2023-11-18 15:04:20,403 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6200, loss[loss=0.102, simple_loss=0.117, pruned_loss=0.02746, audio_tagging_loss=0.01605, over 15273.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.12, pruned_loss=0.03526, audio_tagging_loss=0.01194, over 3047035.72 frames. ], batch size: 55, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:04:24,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=281786.6666666667, ans=0.125 2023-11-18 15:04:56,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=281986.6666666667, ans=0.2 2023-11-18 15:05:14,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=282053.3333333333, ans=0.0 2023-11-18 15:05:14,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.83 vs. limit=15.0 2023-11-18 15:05:16,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2023-11-18 15:05:17,030 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6250, loss[loss=0.1069, simple_loss=0.1029, pruned_loss=0.0382, audio_tagging_loss=0.01731, over 16506.00 frames. ], tot_loss[loss=0.1079, simple_loss=0.1205, pruned_loss=0.03559, audio_tagging_loss=0.01206, over 3050243.28 frames. ], batch size: 61, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:05:29,636 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 9.446e+01 1.080e+02 1.226e+02 1.932e+02, threshold=2.161e+02, percent-clipped=0.0 2023-11-18 15:05:45,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2023-11-18 15:06:11,969 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6300, loss[loss=0.1246, simple_loss=0.1421, pruned_loss=0.04375, audio_tagging_loss=0.009763, over 15729.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1199, pruned_loss=0.03532, audio_tagging_loss=0.01216, over 3049737.81 frames. 
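[Note] The scaling.py "ScheduledFloat" entries print regularization hyperparameters (skip rates, dropout_p, balancer probs, bypass scale_min) whose current value ("ans") depends on batch_count. A minimal stand-in is a piecewise-linear schedule over batch count; the knots below are invented for illustration, not the ones used in this run.

class ScheduledFloat:
    # Sketch: a float that interpolates linearly between (batch_count, value) knots.
    def __init__(self, *points, default=0.0):
        self.points = sorted(points)  # e.g. (0.0, 0.5), (20000.0, 0.0)
        self.default = default

    def value(self, batch_count: float) -> float:
        pts = self.points
        if not pts:
            return self.default
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Illustrative only: a skip rate annealed from 0.5 to 0.0 by batch 20k would
# print ans=0.0 at batch_count=278253.3 like the conv_skip_rate entries above.
conv_skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
print(conv_skip_rate.value(278253.3))  # -> 0.0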
], batch size: 58, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:06:24,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=282520.0, ans=10.0 2023-11-18 15:06:28,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2023-11-18 15:06:42,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=282586.6666666667, ans=0.125 2023-11-18 15:06:45,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.58 vs. limit=15.0 2023-11-18 15:06:51,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=282653.3333333333, ans=0.0 2023-11-18 15:07:07,481 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6350, loss[loss=0.09963, simple_loss=0.1056, pruned_loss=0.03547, audio_tagging_loss=0.01137, over 14025.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1194, pruned_loss=0.03494, audio_tagging_loss=0.01215, over 3052337.54 frames. ], batch size: 55, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:07:12,380 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:07:19,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=282853.3333333333, ans=0.0 2023-11-18 15:07:21,673 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 9.648e+01 1.090e+02 1.229e+02 1.753e+02, threshold=2.179e+02, percent-clipped=0.0 2023-11-18 15:07:26,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=282853.3333333333, ans=0.125 2023-11-18 15:07:30,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.94 vs. limit=6.0 2023-11-18 15:07:46,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=282986.6666666667, ans=0.125 2023-11-18 15:07:49,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=282986.6666666667, ans=0.125 2023-11-18 15:07:49,205 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=282986.6666666667, ans=0.2 2023-11-18 15:07:54,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=283053.3333333333, ans=0.09899494936611666 2023-11-18 15:07:59,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=283053.3333333333, ans=0.125 2023-11-18 15:08:03,794 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6400, loss[loss=0.1006, simple_loss=0.1092, pruned_loss=0.03243, audio_tagging_loss=0.0136, over 15062.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1199, pruned_loss=0.03528, audio_tagging_loss=0.01226, over 3047409.20 frames. 
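[Note] The "Whitening" entries fire when a module's feature covariance looks too anisotropic: a metric is compared against a limit (e.g. metric=14.58 vs. limit=15.0), and a whitening-style penalty nudges the covariance toward a multiple of the identity when the limit is exceeded. I do not know the exact metric used; the sketch below is an assumed stand-in that is 1.0 for perfectly white features and grows with eigenvalue spread, which matches how the logged values behave against their limits.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # Assumed anisotropy measure over (frames, channels) activations:
    # per channel group, ratio of mean squared covariance eigenvalue to the
    # squared mean eigenvalue; equals 1.0 when the covariance is already white.
    n, c = x.shape
    g = c // num_groups
    metrics = []
    for i in range(num_groups):
        xg = x[:, i * g:(i + 1) * g]
        xg = xg - xg.mean(dim=0)
        cov = (xg.T @ xg) / n
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append(((eigs ** 2).mean() / eigs.mean() ** 2).item())
    return max(metrics)

# An entry like "metric=5.81 vs. limit=6.0" would then mean: anisotropic,
# but still under the limit, so no extra penalty is applied this step.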
], batch size: 55, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:08:04,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=283120.0, ans=0.2 2023-11-18 15:08:33,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=283253.3333333333, ans=0.125 2023-11-18 15:08:53,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=283386.6666666667, ans=0.125 2023-11-18 15:08:58,500 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6450, loss[loss=0.08863, simple_loss=0.1086, pruned_loss=0.0272, audio_tagging_loss=0.007118, over 14337.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1207, pruned_loss=0.03567, audio_tagging_loss=0.01224, over 3049179.43 frames. ], batch size: 54, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:09:11,021 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.683e+01 9.197e+01 1.014e+02 1.179e+02 1.440e+02, threshold=2.029e+02, percent-clipped=0.0 2023-11-18 15:09:12,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283520.0, ans=0.1 2023-11-18 15:09:12,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2023-11-18 15:09:14,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=283520.0, ans=0.125 2023-11-18 15:09:40,929 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:09:53,325 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6500, loss[loss=0.1142, simple_loss=0.1179, pruned_loss=0.0396, audio_tagging_loss=0.01563, over 14047.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1201, pruned_loss=0.03562, audio_tagging_loss=0.01212, over 3046837.91 frames. ], batch size: 54, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:09:58,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=283786.6666666667, ans=0.125 2023-11-18 15:10:04,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=283853.3333333333, ans=0.1 2023-11-18 15:10:20,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=283920.0, ans=0.125 2023-11-18 15:10:36,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2023-11-18 15:10:44,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=284053.3333333333, ans=0.0 2023-11-18 15:10:49,937 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6550, loss[loss=0.1313, simple_loss=0.1386, pruned_loss=0.05083, audio_tagging_loss=0.0111, over 14756.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1207, pruned_loss=0.03576, audio_tagging_loss=0.01186, over 3047528.11 frames. 
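[Note] The per-batch loss components in these progress lines are consistent with a fixed weighted sum, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, with the 0.5 and implicit 1.0 weights inferred from the logged numbers rather than read from code. A worked check against the batch 6550 totals just above:

simple_loss, pruned_loss, audio_tagging_loss = 0.1207, 0.03576, 0.01186
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(round(loss, 4))  # 0.108, matching tot_loss[loss=0.108, ...] above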
], batch size: 57, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:10:52,765 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:11:03,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 9.628e+01 1.072e+02 1.195e+02 1.710e+02, threshold=2.144e+02, percent-clipped=0.0 2023-11-18 15:11:14,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=284253.3333333333, ans=0.125 2023-11-18 15:11:15,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=284253.3333333333, ans=0.125 2023-11-18 15:11:18,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=284253.3333333333, ans=0.07 2023-11-18 15:11:45,580 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6600, loss[loss=0.1212, simple_loss=0.1369, pruned_loss=0.04369, audio_tagging_loss=0.009062, over 15573.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1214, pruned_loss=0.0359, audio_tagging_loss=0.0116, over 3045061.10 frames. ], batch size: 57, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:11:48,142 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.84 vs. limit=15.0 2023-11-18 15:11:57,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=284520.0, ans=0.1 2023-11-18 15:12:14,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=284586.6666666667, ans=0.2 2023-11-18 15:12:32,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=284720.0, ans=0.125 2023-11-18 15:12:40,469 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6650, loss[loss=0.09524, simple_loss=0.1162, pruned_loss=0.02913, audio_tagging_loss=0.008034, over 15299.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1212, pruned_loss=0.03589, audio_tagging_loss=0.01153, over 3039119.30 frames. ], batch size: 55, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:12:51,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=284853.3333333333, ans=0.125 2023-11-18 15:12:54,222 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.970e+01 9.511e+01 1.065e+02 1.198e+02 1.619e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-18 15:13:01,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.67 vs. 
limit=22.5 2023-11-18 15:13:08,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=284920.0, ans=0.0 2023-11-18 15:13:19,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=284986.6666666667, ans=0.1 2023-11-18 15:13:22,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=284986.6666666667, ans=0.1 2023-11-18 15:13:34,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=285120.0, ans=0.125 2023-11-18 15:13:36,272 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6700, loss[loss=0.09648, simple_loss=0.1001, pruned_loss=0.03381, audio_tagging_loss=0.01263, over 14782.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1206, pruned_loss=0.03568, audio_tagging_loss=0.01147, over 3040954.73 frames. ], batch size: 58, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:13:44,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=285120.0, ans=0.125 2023-11-18 15:13:51,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=285186.6666666667, ans=0.015 2023-11-18 15:14:03,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.93 vs. limit=22.5 2023-11-18 15:14:32,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=285453.3333333333, ans=0.0 2023-11-18 15:14:32,996 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6750, loss[loss=0.1283, simple_loss=0.147, pruned_loss=0.04185, audio_tagging_loss=0.01295, over 16224.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1205, pruned_loss=0.03567, audio_tagging_loss=0.01145, over 3044538.69 frames. ], batch size: 58, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:14:38,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=285453.3333333333, ans=0.5 2023-11-18 15:14:38,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=285453.3333333333, ans=0.0 2023-11-18 15:14:45,690 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 9.541e+01 1.044e+02 1.172e+02 1.686e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 15:14:46,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=285520.0, ans=0.2 2023-11-18 15:14:59,658 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.11 vs. limit=10.0 2023-11-18 15:15:05,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=285653.3333333333, ans=0.125 2023-11-18 15:15:15,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=285653.3333333333, ans=0.1 2023-11-18 15:15:28,138 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6800, loss[loss=0.1665, simple_loss=0.194, pruned_loss=0.0593, audio_tagging_loss=0.01017, over 16157.00 frames. 
], tot_loss[loss=0.1084, simple_loss=0.1216, pruned_loss=0.03607, audio_tagging_loss=0.01157, over 3043046.77 frames. ], batch size: 57, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:15:44,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=285853.3333333333, ans=0.125 2023-11-18 15:15:47,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=285853.3333333333, ans=0.125 2023-11-18 15:15:48,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=285853.3333333333, ans=0.09899494936611666 2023-11-18 15:15:55,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=285920.0, ans=0.0 2023-11-18 15:16:05,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2023-11-18 15:16:13,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=286053.3333333333, ans=0.05 2023-11-18 15:16:15,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=286053.3333333333, ans=0.0 2023-11-18 15:16:19,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=286053.3333333333, ans=0.0 2023-11-18 15:16:23,775 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6850, loss[loss=0.1027, simple_loss=0.1158, pruned_loss=0.03087, audio_tagging_loss=0.01398, over 17056.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1215, pruned_loss=0.03601, audio_tagging_loss=0.01152, over 3038406.74 frames. ], batch size: 61, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:16:26,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=286120.0, ans=0.125 2023-11-18 15:16:37,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=286186.6666666667, ans=0.0 2023-11-18 15:16:37,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 9.571e+01 1.055e+02 1.193e+02 1.601e+02, threshold=2.111e+02, percent-clipped=0.0 2023-11-18 15:16:47,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=286253.3333333333, ans=0.0 2023-11-18 15:16:48,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.17 vs. limit=22.5 2023-11-18 15:17:16,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. limit=10.0 2023-11-18 15:17:20,141 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6900, loss[loss=0.0952, simple_loss=0.1081, pruned_loss=0.02896, audio_tagging_loss=0.01218, over 14507.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1209, pruned_loss=0.03573, audio_tagging_loss=0.01157, over 3042117.38 frames. 
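[Note] Each progress line carries both the current batch's loss[... over ~15k frames] and a tot_loss[... over ~3.0e6 frames]; the latter moves slowly, like a frame-weighted running average over recent batches. One way to get that behavior is an exponentially decayed, frame-weighted accumulator; the decay value below is an assumption chosen so the steady-state frame mass (~15.5k / (1 - 0.995) ≈ 3.1e6) lands near the logged frame counts.

class RunningLoss:
    # Sketch: exponentially decayed, frame-weighted loss average.
    def __init__(self, decay=0.995):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float):
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)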
], batch size: 54, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:17:28,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=286453.3333333333, ans=0.125 2023-11-18 15:17:43,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.89 vs. limit=22.5 2023-11-18 15:18:04,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=286720.0, ans=0.125 2023-11-18 15:18:04,994 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 15:18:05,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=286720.0, ans=0.0 2023-11-18 15:18:07,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=286720.0, ans=0.0 2023-11-18 15:18:15,636 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 6950, loss[loss=0.09826, simple_loss=0.1144, pruned_loss=0.03063, audio_tagging_loss=0.01044, over 15229.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1223, pruned_loss=0.03598, audio_tagging_loss=0.0114, over 3044582.81 frames. ], batch size: 57, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:18:17,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.86 vs. limit=22.5 2023-11-18 15:18:17,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=286786.6666666667, ans=0.125 2023-11-18 15:18:22,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=286786.6666666667, ans=0.0 2023-11-18 15:18:26,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.68 vs. limit=10.0 2023-11-18 15:18:28,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=286853.3333333333, ans=0.125 2023-11-18 15:18:28,743 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 9.398e+01 1.033e+02 1.158e+02 1.660e+02, threshold=2.066e+02, percent-clipped=0.0 2023-11-18 15:18:33,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=12.0 2023-11-18 15:19:01,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.89 vs. limit=22.5 2023-11-18 15:19:05,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=287053.3333333333, ans=0.125 2023-11-18 15:19:11,042 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7000, loss[loss=0.09725, simple_loss=0.1067, pruned_loss=0.03231, audio_tagging_loss=0.01157, over 16192.00 frames. 
], tot_loss[loss=0.1088, simple_loss=0.1225, pruned_loss=0.03607, audio_tagging_loss=0.01145, over 3045895.72 frames. ], batch size: 60, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:19:27,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=287186.6666666667, ans=0.2 2023-11-18 15:19:35,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=287253.3333333333, ans=0.2 2023-11-18 15:19:47,185 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.18 vs. limit=15.0 2023-11-18 15:19:47,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=287320.0, ans=0.1 2023-11-18 15:19:55,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=287386.6666666667, ans=0.0 2023-11-18 15:19:56,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=287386.6666666667, ans=0.125 2023-11-18 15:20:05,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0 2023-11-18 15:20:07,124 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7050, loss[loss=0.08724, simple_loss=0.08881, pruned_loss=0.027, audio_tagging_loss=0.01584, over 14164.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1237, pruned_loss=0.03649, audio_tagging_loss=0.0115, over 3044190.37 frames. ], batch size: 55, lr: 1.62e-02, grad_scale: 64.0 2023-11-18 15:20:07,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=287453.3333333333, ans=0.125 2023-11-18 15:20:08,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=287453.3333333333, ans=0.0 2023-11-18 15:20:14,562 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=12.0 2023-11-18 15:20:20,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.175e+01 9.557e+01 1.044e+02 1.189e+02 1.971e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 15:20:49,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=287653.3333333333, ans=0.0 2023-11-18 15:21:02,547 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7100, loss[loss=0.1131, simple_loss=0.1306, pruned_loss=0.03646, audio_tagging_loss=0.01139, over 15150.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.123, pruned_loss=0.03623, audio_tagging_loss=0.01163, over 3051133.31 frames. ], batch size: 60, lr: 1.62e-02, grad_scale: 64.0 2023-11-18 15:21:10,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.06 vs. 
limit=10.0 2023-11-18 15:21:13,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=287853.3333333333, ans=0.125 2023-11-18 15:21:27,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=287920.0, ans=0.125 2023-11-18 15:21:33,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=287920.0, ans=0.0 2023-11-18 15:21:47,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=288053.3333333333, ans=0.125 2023-11-18 15:21:48,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=288053.3333333333, ans=0.125 2023-11-18 15:21:49,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=288053.3333333333, ans=0.125 2023-11-18 15:21:58,403 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7150, loss[loss=0.09086, simple_loss=0.08776, pruned_loss=0.03357, audio_tagging_loss=0.01341, over 15863.00 frames. ], tot_loss[loss=0.109, simple_loss=0.1224, pruned_loss=0.03609, audio_tagging_loss=0.01174, over 3053491.42 frames. ], batch size: 63, lr: 1.62e-02, grad_scale: 64.0 2023-11-18 15:22:12,085 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 9.651e+01 1.094e+02 1.204e+02 1.585e+02, threshold=2.188e+02, percent-clipped=0.0 2023-11-18 15:22:19,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=288253.3333333333, ans=0.0 2023-11-18 15:22:27,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=288253.3333333333, ans=0.125 2023-11-18 15:22:35,030 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:22:54,520 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7200, loss[loss=0.08915, simple_loss=0.09604, pruned_loss=0.02678, audio_tagging_loss=0.01434, over 14734.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1211, pruned_loss=0.03576, audio_tagging_loss=0.01186, over 3048613.65 frames. ], batch size: 57, lr: 1.62e-02, grad_scale: 64.0 2023-11-18 15:23:10,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288520.0, ans=0.1 2023-11-18 15:23:14,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=288520.0, ans=0.125 2023-11-18 15:23:21,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=288586.6666666667, ans=0.125 2023-11-18 15:23:32,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=288653.3333333333, ans=0.125 2023-11-18 15:23:49,854 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7250, loss[loss=0.1354, simple_loss=0.1576, pruned_loss=0.0464, audio_tagging_loss=0.01018, over 15795.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1207, pruned_loss=0.03549, audio_tagging_loss=0.01199, over 3046424.22 frames. 
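[Note] The scaling.py "WithLoss" entries (loss-sum=0.000e+00 here, occasionally nonzero elsewhere in this log) report an auxiliary penalty attached to attention weights. The sketch below shows only the logging pattern, not the actual penalty used in this run: the module passes activations through unchanged, computes some regularizer on them (the entropy term here is my invention), logs its sum, and exposes it for the training loop to add to the main loss.

import torch

class WithLoss(torch.nn.Module):
    # Sketch of the pattern behind "WithLoss: name=..., loss-sum=...".
    def __init__(self, name: str):
        super().__init__()
        self.name = name
        self.last_loss = torch.tensor(0.0)

    def forward(self, attn_weights: torch.Tensor) -> torch.Tensor:
        # Example penalty (an assumption): penalize near-uniform attention
        # via high entropy of the attention distribution.
        p = attn_weights.clamp(min=1e-20)
        entropy = -(p * p.log()).sum(dim=-1).mean()
        self.last_loss = entropy  # training loop may add this to the loss
        print(f"WithLoss: name={self.name}, loss-sum={entropy.item():.3e}")
        return attn_weights  # activations pass through unchanged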
], batch size: 59, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:23:59,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.08 vs. limit=22.5 2023-11-18 15:24:03,639 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 9.776e+01 1.072e+02 1.209e+02 1.575e+02, threshold=2.144e+02, percent-clipped=0.0 2023-11-18 15:24:14,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=288920.0, ans=0.1 2023-11-18 15:24:40,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=289053.3333333333, ans=0.07 2023-11-18 15:24:44,975 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7300, loss[loss=0.09645, simple_loss=0.1032, pruned_loss=0.03144, audio_tagging_loss=0.01341, over 14466.00 frames. ], tot_loss[loss=0.1073, simple_loss=0.1203, pruned_loss=0.03532, audio_tagging_loss=0.01183, over 3052902.68 frames. ], batch size: 56, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:24:59,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=289186.6666666667, ans=0.07 2023-11-18 15:25:40,810 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7350, loss[loss=0.1463, simple_loss=0.1751, pruned_loss=0.05146, audio_tagging_loss=0.007298, over 15788.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1213, pruned_loss=0.03558, audio_tagging_loss=0.01153, over 3050826.08 frames. ], batch size: 56, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:25:46,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=289453.3333333333, ans=0.125 2023-11-18 15:25:54,546 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.880e+01 9.633e+01 1.075e+02 1.263e+02 1.928e+02, threshold=2.150e+02, percent-clipped=0.0 2023-11-18 15:26:00,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=289520.0, ans=0.07 2023-11-18 15:26:10,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=289586.6666666667, ans=0.1 2023-11-18 15:26:16,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=289653.3333333333, ans=0.0 2023-11-18 15:26:30,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=289720.0, ans=0.125 2023-11-18 15:26:30,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=289720.0, ans=0.125 2023-11-18 15:26:35,456 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7400, loss[loss=0.1112, simple_loss=0.13, pruned_loss=0.03449, audio_tagging_loss=0.01168, over 14932.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1213, pruned_loss=0.03555, audio_tagging_loss=0.01144, over 3045046.56 frames. ], batch size: 53, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:26:39,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.90 vs. 
limit=15.0 2023-11-18 15:26:57,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=289920.0, ans=0.125 2023-11-18 15:27:05,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=12.0 2023-11-18 15:27:07,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=289920.0, ans=0.125 2023-11-18 15:27:19,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=290053.3333333333, ans=0.0 2023-11-18 15:27:30,959 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7450, loss[loss=0.09714, simple_loss=0.1185, pruned_loss=0.02733, audio_tagging_loss=0.01053, over 15197.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1212, pruned_loss=0.03553, audio_tagging_loss=0.01137, over 3043420.68 frames. ], batch size: 58, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:27:31,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=290120.0, ans=0.2 2023-11-18 15:27:41,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=290186.6666666667, ans=0.125 2023-11-18 15:27:44,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=290186.6666666667, ans=0.09899494936611666 2023-11-18 15:27:46,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 9.437e+01 1.026e+02 1.201e+02 2.000e+02, threshold=2.053e+02, percent-clipped=0.0 2023-11-18 15:27:51,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2023-11-18 15:27:52,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=12.0 2023-11-18 15:28:22,406 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2023-11-18 15:28:27,308 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7500, loss[loss=0.08917, simple_loss=0.1013, pruned_loss=0.02418, audio_tagging_loss=0.01434, over 14834.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1208, pruned_loss=0.03545, audio_tagging_loss=0.01139, over 3038958.30 frames. ], batch size: 57, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:28:35,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.44 vs. limit=15.0 2023-11-18 15:28:36,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=290520.0, ans=0.0 2023-11-18 15:28:37,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0 2023-11-18 15:28:37,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.96 vs. 
limit=12.0 2023-11-18 15:28:44,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=290520.0, ans=0.0 2023-11-18 15:28:46,465 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=290520.0, ans=0.125 2023-11-18 15:28:55,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=290586.6666666667, ans=0.125 2023-11-18 15:28:57,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2023-11-18 15:29:13,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=290720.0, ans=0.0 2023-11-18 15:29:22,435 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7550, loss[loss=0.1438, simple_loss=0.1749, pruned_loss=0.04782, audio_tagging_loss=0.008522, over 15603.00 frames. ], tot_loss[loss=0.107, simple_loss=0.1207, pruned_loss=0.0352, audio_tagging_loss=0.01144, over 3038279.63 frames. ], batch size: 58, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:29:28,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=290786.6666666667, ans=0.125 2023-11-18 15:29:34,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=290853.3333333333, ans=0.125 2023-11-18 15:29:36,113 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.781e+01 9.490e+01 1.043e+02 1.208e+02 1.931e+02, threshold=2.087e+02, percent-clipped=0.0 2023-11-18 15:29:51,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=290920.0, ans=0.125 2023-11-18 15:29:56,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=290986.6666666667, ans=0.0 2023-11-18 15:29:59,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=290986.6666666667, ans=0.0 2023-11-18 15:30:05,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=291053.3333333333, ans=0.125 2023-11-18 15:30:09,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=291053.3333333333, ans=0.0 2023-11-18 15:30:11,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=291053.3333333333, ans=0.125 2023-11-18 15:30:17,206 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7600, loss[loss=0.1007, simple_loss=0.1126, pruned_loss=0.03059, audio_tagging_loss=0.01378, over 14922.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1208, pruned_loss=0.03535, audio_tagging_loss=0.0115, over 3050664.79 frames. 
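[Note] The learning rate in these progress lines decays very slowly (1.65e-02 at batch 5600, 1.61e-02 by batch 7550), which is characteristic of an Eden-style schedule that decays as an inverse quarter power of both batch count and epoch. A sketch of that family follows; the constants and the batch/epoch counter conventions are illustrative, and I have not reproduced the exact logged values from them.

def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Inverse-fourth-root decay in both batch and epoch: nearly flat early,
    # then a slow slide, consistent with lr creeping from 1.65e-02 toward
    # 1.60e-02 across this whole section.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor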
], batch size: 61, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:30:22,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=291120.0, ans=0.0 2023-11-18 15:30:33,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=291186.6666666667, ans=0.035 2023-11-18 15:30:42,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=291253.3333333333, ans=0.1 2023-11-18 15:30:56,421 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:30:57,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=291320.0, ans=0.125 2023-11-18 15:31:09,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=291386.6666666667, ans=0.125 2023-11-18 15:31:13,065 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7650, loss[loss=0.07709, simple_loss=0.07739, pruned_loss=0.02232, audio_tagging_loss=0.01607, over 14364.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1205, pruned_loss=0.03539, audio_tagging_loss=0.01146, over 3044516.11 frames. ], batch size: 57, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:31:14,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=291453.3333333333, ans=0.125 2023-11-18 15:31:27,117 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 9.408e+01 1.037e+02 1.133e+02 1.442e+02, threshold=2.074e+02, percent-clipped=0.0 2023-11-18 15:31:44,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=291653.3333333333, ans=0.125 2023-11-18 15:31:53,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.74 vs. limit=22.5 2023-11-18 15:32:03,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=291720.0, ans=0.125 2023-11-18 15:32:05,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=291720.0, ans=0.0 2023-11-18 15:32:08,459 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7700, loss[loss=0.0982, simple_loss=0.115, pruned_loss=0.02924, audio_tagging_loss=0.01147, over 15003.00 frames. ], tot_loss[loss=0.1063, simple_loss=0.1195, pruned_loss=0.03507, audio_tagging_loss=0.0115, over 3043186.89 frames. 
], batch size: 57, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:32:08,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=291786.6666666667, ans=0.1 2023-11-18 15:32:18,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=291853.3333333333, ans=0.125 2023-11-18 15:32:23,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=291853.3333333333, ans=0.2 2023-11-18 15:32:25,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=291853.3333333333, ans=0.2 2023-11-18 15:32:46,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=291986.6666666667, ans=0.125 2023-11-18 15:32:51,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=291986.6666666667, ans=0.1 2023-11-18 15:32:55,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=292053.3333333333, ans=0.125 2023-11-18 15:32:56,524 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:33:03,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.36 vs. limit=6.0 2023-11-18 15:33:03,730 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7750, loss[loss=0.1233, simple_loss=0.145, pruned_loss=0.04187, audio_tagging_loss=0.008952, over 15538.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1205, pruned_loss=0.03559, audio_tagging_loss=0.01154, over 3045964.53 frames. ], batch size: 57, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:33:18,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.257e+01 9.507e+01 1.083e+02 1.273e+02 2.415e+02, threshold=2.165e+02, percent-clipped=1.0 2023-11-18 15:33:20,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=292186.6666666667, ans=0.0 2023-11-18 15:33:20,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=292186.6666666667, ans=0.0 2023-11-18 15:33:51,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=292386.6666666667, ans=0.1 2023-11-18 15:33:59,633 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7800, loss[loss=0.1123, simple_loss=0.125, pruned_loss=0.03862, audio_tagging_loss=0.01122, over 15461.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1201, pruned_loss=0.0355, audio_tagging_loss=0.01157, over 3040969.59 frames. ], batch size: 58, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:34:27,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=292586.6666666667, ans=0.2 2023-11-18 15:34:29,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.11 vs. 
limit=22.5 2023-11-18 15:34:49,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=292720.0, ans=0.125 2023-11-18 15:34:50,987 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.93 vs. limit=15.0 2023-11-18 15:34:55,493 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7850, loss[loss=0.09823, simple_loss=0.1019, pruned_loss=0.0344, audio_tagging_loss=0.01288, over 15480.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.121, pruned_loss=0.03597, audio_tagging_loss=0.01166, over 3038417.71 frames. ], batch size: 57, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:35:03,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=292786.6666666667, ans=0.1 2023-11-18 15:35:05,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=292853.3333333333, ans=0.125 2023-11-18 15:35:09,098 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.474e+01 9.851e+01 1.052e+02 1.175e+02 1.725e+02, threshold=2.105e+02, percent-clipped=0.0 2023-11-18 15:35:22,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=292920.0, ans=0.0 2023-11-18 15:35:27,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2023-11-18 15:35:41,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=293053.3333333333, ans=0.035 2023-11-18 15:35:50,171 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7900, loss[loss=0.08051, simple_loss=0.08054, pruned_loss=0.02572, audio_tagging_loss=0.01453, over 15125.00 frames. ], tot_loss[loss=0.109, simple_loss=0.1223, pruned_loss=0.03621, audio_tagging_loss=0.01164, over 3034695.28 frames. ], batch size: 57, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:35:50,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=293120.0, ans=0.125 2023-11-18 15:36:18,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=293253.3333333333, ans=0.125 2023-11-18 15:36:30,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.85 vs. limit=12.0 2023-11-18 15:36:34,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293320.0, ans=0.1 2023-11-18 15:36:36,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=293386.6666666667, ans=0.0 2023-11-18 15:36:43,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=293386.6666666667, ans=0.125 2023-11-18 15:36:47,574 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 7950, loss[loss=0.1253, simple_loss=0.1401, pruned_loss=0.04369, audio_tagging_loss=0.01158, over 14616.00 frames. ], tot_loss[loss=0.109, simple_loss=0.1221, pruned_loss=0.03618, audio_tagging_loss=0.01176, over 3041968.25 frames. 
], batch size: 54, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:36:48,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=293453.3333333333, ans=0.125 2023-11-18 15:36:55,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=293453.3333333333, ans=0.125 2023-11-18 15:37:02,816 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.429e+01 9.694e+01 1.093e+02 1.229e+02 1.791e+02, threshold=2.186e+02, percent-clipped=0.0 2023-11-18 15:37:02,875 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 15:37:43,991 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8000, loss[loss=0.1087, simple_loss=0.1199, pruned_loss=0.03482, audio_tagging_loss=0.01393, over 14763.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1222, pruned_loss=0.03625, audio_tagging_loss=0.01189, over 3041857.89 frames. ], batch size: 55, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:37:56,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0 2023-11-18 15:37:56,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=293853.3333333333, ans=0.09899494936611666 2023-11-18 15:38:00,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=293853.3333333333, ans=0.1 2023-11-18 15:38:17,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=293986.6666666667, ans=0.0 2023-11-18 15:38:38,559 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8050, loss[loss=0.1148, simple_loss=0.1351, pruned_loss=0.03714, audio_tagging_loss=0.01017, over 15309.00 frames. ], tot_loss[loss=0.109, simple_loss=0.1222, pruned_loss=0.03614, audio_tagging_loss=0.01178, over 3038219.63 frames. 
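
Note on the optim.py "Clipping_scale ... grad-norm quartiles" lines: the five numbers read naturally as min/25%/50%/75%/max of recently observed gradient norms, and in every such entry the printed threshold is exactly clipping_scale times the median (here 2.0 * 1.093e+02 = 2.186e+02); percent-clipped is the fraction of recent steps whose norm exceeded it. A minimal sketch of that bookkeeping, with invented names rather than icefall's actual optimizer internals:

import torch

class GradNormClipper:
    """Clip gradients to clipping_scale * median of recent grad norms."""
    def __init__(self, clipping_scale=2.0, history=200):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms = []

    def clip_(self, params):
        params = [p for p in params if p.grad is not None]
        norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params))
        self.norms = (self.norms + [norm.item()])[-self.history:]
        q = torch.tensor(self.norms).quantile(
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        if norm.item() > threshold:  # counts toward "percent-clipped"
            for p in params:
                p.grad.mul_(threshold / norm)
        return q, threshold
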
], batch size: 56, lr: 1.61e-02, grad_scale: 16.0 2023-11-18 15:38:38,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=294120.0, ans=0.07 2023-11-18 15:38:45,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=294120.0, ans=0.0 2023-11-18 15:38:53,841 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.548e+01 1.018e+02 1.096e+02 1.204e+02 1.820e+02, threshold=2.193e+02, percent-clipped=0.0 2023-11-18 15:39:15,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=294320.0, ans=0.0 2023-11-18 15:39:22,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=294386.6666666667, ans=0.125 2023-11-18 15:39:24,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=294386.6666666667, ans=0.125 2023-11-18 15:39:31,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=294386.6666666667, ans=0.125 2023-11-18 15:39:33,370 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8100, loss[loss=0.1263, simple_loss=0.1546, pruned_loss=0.0387, audio_tagging_loss=0.01023, over 16289.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1217, pruned_loss=0.03578, audio_tagging_loss=0.01173, over 3044762.36 frames. ], batch size: 57, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:39:33,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=294453.3333333333, ans=0.025 2023-11-18 15:39:35,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2023-11-18 15:39:36,958 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.47 vs. limit=15.0 2023-11-18 15:39:37,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=294453.3333333333, ans=0.125 2023-11-18 15:39:41,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.49 vs. limit=10.0 2023-11-18 15:40:22,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=294720.0, ans=0.0 2023-11-18 15:40:29,785 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8150, loss[loss=0.09018, simple_loss=0.108, pruned_loss=0.02719, audio_tagging_loss=0.00901, over 14515.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1212, pruned_loss=0.03549, audio_tagging_loss=0.01157, over 3047806.58 frames. ], batch size: 53, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:40:34,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.54 vs. 
limit=15.0 2023-11-18 15:40:37,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=294786.6666666667, ans=0.125 2023-11-18 15:40:44,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 9.329e+01 1.045e+02 1.150e+02 1.655e+02, threshold=2.090e+02, percent-clipped=0.0 2023-11-18 15:40:49,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=294853.3333333333, ans=0.0 2023-11-18 15:40:51,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=294920.0, ans=0.125 2023-11-18 15:40:53,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0 2023-11-18 15:40:56,306 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.84 vs. limit=15.0 2023-11-18 15:41:24,204 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8200, loss[loss=0.1349, simple_loss=0.1473, pruned_loss=0.05181, audio_tagging_loss=0.009487, over 16388.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1211, pruned_loss=0.03539, audio_tagging_loss=0.01147, over 3041346.40 frames. ], batch size: 60, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:41:25,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=295120.0, ans=0.0 2023-11-18 15:41:26,334 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 15:41:57,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=295320.0, ans=0.0 2023-11-18 15:42:11,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=295386.6666666667, ans=0.0 2023-11-18 15:42:19,560 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8250, loss[loss=0.1154, simple_loss=0.134, pruned_loss=0.03727, audio_tagging_loss=0.01114, over 14890.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1218, pruned_loss=0.03567, audio_tagging_loss=0.01142, over 3044179.52 frames. 
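
Note on the "Exclude cut ... from training" warnings: these are the transducer length check tripping on 1-second AudioSet clips. The clip's 100 feature frames pass through the convolutional front-end (subsampling factor 4) and come out as 23 encoder frames, while the placeholder transcript tokenizes to 24 BPE tokens; transducer loss needs at least as many encoder frames as tokens, so the cut is dropped. A sketch of the filter; the exact subsampling formula is an assumption chosen to reproduce the logged counts:

def encoder_frames(num_frames: int) -> int:
    # Assumed Conv2d front-end arithmetic; maps 100 -> 23 as in the warnings.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Transducer alignment requires T >= U.
    return encoder_frames(num_frames) >= num_tokens

print(encoder_frames(100))  # 23
print(keep_cut(100, 24))    # False -> excluded, as logged
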
], batch size: 54, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:42:34,806 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.584e+01 9.274e+01 1.030e+02 1.127e+02 2.119e+02, threshold=2.060e+02, percent-clipped=1.0 2023-11-18 15:42:44,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=295586.6666666667, ans=0.07 2023-11-18 15:42:49,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=295586.6666666667, ans=0.07 2023-11-18 15:42:51,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=295653.3333333333, ans=0.125 2023-11-18 15:42:54,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=295653.3333333333, ans=0.07 2023-11-18 15:42:57,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=295653.3333333333, ans=0.125 2023-11-18 15:42:57,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=295653.3333333333, ans=0.05 2023-11-18 15:43:15,125 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8300, loss[loss=0.1263, simple_loss=0.1451, pruned_loss=0.04411, audio_tagging_loss=0.009655, over 15420.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1217, pruned_loss=0.03551, audio_tagging_loss=0.01135, over 3051244.05 frames. ], batch size: 58, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:43:22,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=295786.6666666667, ans=0.125 2023-11-18 15:43:25,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2023-11-18 15:43:34,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=295853.3333333333, ans=0.125 2023-11-18 15:43:46,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=295920.0, ans=0.125 2023-11-18 15:44:08,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=296053.3333333333, ans=0.2 2023-11-18 15:44:09,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=296053.3333333333, ans=0.04949747468305833 2023-11-18 15:44:11,153 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8350, loss[loss=0.0842, simple_loss=0.09594, pruned_loss=0.027, audio_tagging_loss=0.00922, over 14634.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1207, pruned_loss=0.03515, audio_tagging_loss=0.0114, over 3053505.05 frames. 
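
Note on the scaling.py:213 ScheduledFloat lines: the dropout_p, balancer prob, skip_rate and scale_min values they print are not constants but schedules evaluated at the current batch_count (values such as ans=0.04949747468305833 are just 0.07/sqrt(2)). A minimal sketch in the spirit of a piecewise-linear schedule over batch_count; the breakpoints below are illustrative, not the recipe's:

class PiecewiseLinear:
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return pts[-1][1]

dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(296053.33))  # 0.1, i.e. the "ans=0.1" plateau seen above
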
], batch size: 57, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:44:18,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=296120.0, ans=0.0 2023-11-18 15:44:19,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=296120.0, ans=0.0 2023-11-18 15:44:26,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.804e+01 9.554e+01 1.077e+02 1.196e+02 1.483e+02, threshold=2.155e+02, percent-clipped=0.0 2023-11-18 15:44:44,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=296320.0, ans=0.0 2023-11-18 15:44:51,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=296320.0, ans=0.0 2023-11-18 15:44:54,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=296386.6666666667, ans=0.2 2023-11-18 15:45:06,081 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8400, loss[loss=0.1135, simple_loss=0.1314, pruned_loss=0.03634, audio_tagging_loss=0.01149, over 15391.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.1196, pruned_loss=0.03458, audio_tagging_loss=0.01132, over 3052882.60 frames. ], batch size: 56, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:45:30,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.52 vs. limit=12.0 2023-11-18 15:45:43,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=296653.3333333333, ans=0.1 2023-11-18 15:45:56,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=296720.0, ans=0.0 2023-11-18 15:46:02,606 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8450, loss[loss=0.09945, simple_loss=0.1161, pruned_loss=0.0319, audio_tagging_loss=0.009522, over 15208.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1196, pruned_loss=0.0348, audio_tagging_loss=0.01146, over 3050399.63 frames. ], batch size: 56, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:46:17,856 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.027e+01 9.338e+01 1.042e+02 1.138e+02 1.608e+02, threshold=2.084e+02, percent-clipped=0.0 2023-11-18 15:46:24,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.89 vs. limit=22.5 2023-11-18 15:46:26,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=296920.0, ans=0.125 2023-11-18 15:46:38,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=296986.6666666667, ans=0.125 2023-11-18 15:46:44,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=296986.6666666667, ans=0.125 2023-11-18 15:46:53,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=297053.3333333333, ans=0.0 2023-11-18 15:46:57,412 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8500, loss[loss=0.1226, simple_loss=0.1452, pruned_loss=0.03854, audio_tagging_loss=0.01148, over 15658.00 frames. 
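
Note on grad_scale: this is the dynamic fp16 loss scale. It sits at 32.0 through most of these batches, halves to 16.0 around batch 8050 (an overflow was detected and that step skipped), and is back at 32.0 by batch 8400 after a run of finite gradients. A standard torch.cuda.amp pattern that produces exactly this behaviour (illustrative, not the actual train_asr.py code):

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def training_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skips the step and halves the scale on inf/nan
    scaler.update()         # doubles the scale after enough clean steps
    return loss.detach(), scaler.get_scale()  # -> the logged grad_scale
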
], tot_loss[loss=0.1058, simple_loss=0.1192, pruned_loss=0.03469, audio_tagging_loss=0.01155, over 3049592.14 frames. ], batch size: 57, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:47:09,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.66 vs. limit=15.0 2023-11-18 15:47:13,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=297186.6666666667, ans=0.09899494936611666 2023-11-18 15:47:17,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=297186.6666666667, ans=0.125 2023-11-18 15:47:22,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=297253.3333333333, ans=0.0 2023-11-18 15:47:33,011 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.60 vs. limit=15.0 2023-11-18 15:47:45,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=297386.6666666667, ans=0.125 2023-11-18 15:47:45,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297386.6666666667, ans=0.1 2023-11-18 15:47:47,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=297386.6666666667, ans=0.125 2023-11-18 15:47:53,006 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8550, loss[loss=0.08399, simple_loss=0.08941, pruned_loss=0.02851, audio_tagging_loss=0.01078, over 15033.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.119, pruned_loss=0.03464, audio_tagging_loss=0.01163, over 3041293.51 frames. ], batch size: 58, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:48:02,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=297453.3333333333, ans=0.125 2023-11-18 15:48:09,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.282e+01 9.983e+01 1.095e+02 1.210e+02 1.627e+02, threshold=2.189e+02, percent-clipped=0.0 2023-11-18 15:48:09,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297520.0, ans=0.1 2023-11-18 15:48:09,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297520.0, ans=0.1 2023-11-18 15:48:18,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=297586.6666666667, ans=0.0 2023-11-18 15:48:28,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297653.3333333333, ans=0.1 2023-11-18 15:48:30,468 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=12.0 2023-11-18 15:48:41,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.39 vs. 
limit=10.0 2023-11-18 15:48:49,350 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8600, loss[loss=0.1304, simple_loss=0.1456, pruned_loss=0.04679, audio_tagging_loss=0.01078, over 14512.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1201, pruned_loss=0.03514, audio_tagging_loss=0.01158, over 3047282.41 frames. ], batch size: 53, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:49:15,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=297920.0, ans=0.0 2023-11-18 15:49:25,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=297986.6666666667, ans=0.125 2023-11-18 15:49:27,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=297986.6666666667, ans=0.0 2023-11-18 15:49:27,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.63 vs. limit=22.5 2023-11-18 15:49:43,350 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8650, loss[loss=0.08661, simple_loss=0.08845, pruned_loss=0.02656, audio_tagging_loss=0.01582, over 14635.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1206, pruned_loss=0.03538, audio_tagging_loss=0.0115, over 3047188.97 frames. ], batch size: 57, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:49:58,604 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.945e+01 9.623e+01 1.078e+02 1.210e+02 1.696e+02, threshold=2.155e+02, percent-clipped=0.0 2023-11-18 15:50:07,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=298253.3333333333, ans=0.2 2023-11-18 15:50:19,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=298320.0, ans=0.125 2023-11-18 15:50:25,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=298320.0, ans=0.09899494936611666 2023-11-18 15:50:31,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=298386.6666666667, ans=0.125 2023-11-18 15:50:31,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=298386.6666666667, ans=0.125 2023-11-18 15:50:38,378 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8700, loss[loss=0.1139, simple_loss=0.1263, pruned_loss=0.03908, audio_tagging_loss=0.01167, over 15706.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1226, pruned_loss=0.03606, audio_tagging_loss=0.01159, over 3051868.94 frames. 
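
Note on the scaling.py:1022 "Whitening" lines: a Whiten module logs when the channel covariance of some activation drifts too far from white. The metric behaves like E[lambda^2]/E[lambda]^2 over the covariance eigenvalues, which is 1.0 for perfectly whitened features, and the "vs. limit" comparison shows the constraint only engages above the limit (e.g. metric=22.63 vs. limit=22.5 just above). A hedged sketch of the statistic:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); ~1.0 when each group's channel
    # covariance is proportional to the identity, larger when a few
    # directions dominate. A sketch, not icefall's exact _whitening_metric.
    n, c = x.shape
    g = num_groups
    xg = x.reshape(n, g, c // g).transpose(0, 1)                 # (g, n, c/g)
    cov = xg.transpose(1, 2) @ xg / n                            # (g, c/g, c/g)
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean()               # E[lambda]
    mean_eig_sq = (cov ** 2).sum(dim=(1, 2)).mean() / (c // g)   # E[lambda^2]
    return (mean_eig_sq / mean_eig ** 2).item()

print(whitening_metric(torch.randn(1000, 256)))  # close to 1 for white noise
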
], batch size: 58, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:50:39,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=298453.3333333333, ans=0.125 2023-11-18 15:50:39,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=298453.3333333333, ans=0.0 2023-11-18 15:50:54,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=298520.0, ans=15.0 2023-11-18 15:51:04,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=298586.6666666667, ans=0.125 2023-11-18 15:51:32,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0 2023-11-18 15:51:33,502 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8750, loss[loss=0.1009, simple_loss=0.1185, pruned_loss=0.03113, audio_tagging_loss=0.01057, over 15437.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.1224, pruned_loss=0.03604, audio_tagging_loss=0.01157, over 3053360.23 frames. ], batch size: 57, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:51:48,793 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.159e+01 9.840e+01 1.091e+02 1.232e+02 1.815e+02, threshold=2.181e+02, percent-clipped=0.0 2023-11-18 15:52:28,372 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8800, loss[loss=0.1227, simple_loss=0.1443, pruned_loss=0.03968, audio_tagging_loss=0.01093, over 15682.00 frames. ], tot_loss[loss=0.1087, simple_loss=0.1223, pruned_loss=0.0359, audio_tagging_loss=0.0117, over 3057160.82 frames. ], batch size: 56, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:52:37,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=299120.0, ans=0.125 2023-11-18 15:52:39,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=299186.6666666667, ans=0.125 2023-11-18 15:52:41,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2023-11-18 15:53:09,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=299320.0, ans=0.0 2023-11-18 15:53:20,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=299386.6666666667, ans=0.0 2023-11-18 15:53:22,533 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8850, loss[loss=0.1033, simple_loss=0.1196, pruned_loss=0.02996, audio_tagging_loss=0.01356, over 15580.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1216, pruned_loss=0.03556, audio_tagging_loss=0.01179, over 3055428.98 frames. ], batch size: 56, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:53:35,244 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 15:53:38,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 9.407e+01 1.047e+02 1.181e+02 1.757e+02, threshold=2.094e+02, percent-clipped=0.0 2023-11-18 15:53:45,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=299586.6666666667, ans=0.0 2023-11-18 15:53:53,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=299586.6666666667, ans=0.0 2023-11-18 15:54:04,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=299653.3333333333, ans=0.125 2023-11-18 15:54:05,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=299720.0, ans=0.125 2023-11-18 15:54:15,659 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.32 vs. limit=5.0 2023-11-18 15:54:17,928 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8900, loss[loss=0.1253, simple_loss=0.1473, pruned_loss=0.0428, audio_tagging_loss=0.008859, over 15521.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.1228, pruned_loss=0.03584, audio_tagging_loss=0.01161, over 3058066.48 frames. ], batch size: 57, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:54:24,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=299786.6666666667, ans=0.125 2023-11-18 15:54:24,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=299786.6666666667, ans=0.1 2023-11-18 15:54:26,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=299786.6666666667, ans=0.1 2023-11-18 15:54:43,322 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2023-11-18 15:54:47,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=299920.0, ans=0.125 2023-11-18 15:54:51,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=299986.6666666667, ans=0.2 2023-11-18 15:54:59,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=299986.6666666667, ans=0.1 2023-11-18 15:55:12,595 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 8950, loss[loss=0.07328, simple_loss=0.08619, pruned_loss=0.01776, audio_tagging_loss=0.01243, over 14852.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1223, pruned_loss=0.03568, audio_tagging_loss=0.01154, over 3050474.36 frames. ], batch size: 57, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:55:27,259 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 9.233e+01 1.016e+02 1.150e+02 1.659e+02, threshold=2.033e+02, percent-clipped=0.0 2023-11-18 15:55:27,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.41 vs. 
limit=15.0 2023-11-18 15:55:43,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=300253.3333333333, ans=0.1 2023-11-18 15:55:54,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=300320.0, ans=0.125 2023-11-18 15:55:59,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=300386.6666666667, ans=0.2 2023-11-18 15:55:59,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=300386.6666666667, ans=0.125 2023-11-18 15:56:06,848 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9000, loss[loss=0.0996, simple_loss=0.1143, pruned_loss=0.0314, audio_tagging_loss=0.01106, over 14933.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1219, pruned_loss=0.03568, audio_tagging_loss=0.01152, over 3053921.68 frames. ], batch size: 58, lr: 1.59e-02, grad_scale: 16.0 2023-11-18 15:56:06,849 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 15:56:40,136 INFO [train_asr.py:1147] (1/4) Epoch 4, validation: loss=0.07668, simple_loss=0.06181, pruned_loss=0.009869, audio_tagging_loss=0.03591, over 4681554.00 frames. 2023-11-18 15:56:40,136 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 15:56:44,896 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.89 vs. limit=15.0 2023-11-18 15:56:59,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=300520.0, ans=0.035 2023-11-18 15:57:14,017 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=12.0 2023-11-18 15:57:14,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2023-11-18 15:57:28,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2023-11-18 15:57:34,507 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9050, loss[loss=0.1433, simple_loss=0.1536, pruned_loss=0.05514, audio_tagging_loss=0.01137, over 14337.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1215, pruned_loss=0.03562, audio_tagging_loss=0.01141, over 3057642.46 frames. ], batch size: 55, lr: 1.59e-02, grad_scale: 16.0 2023-11-18 15:57:46,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=300853.3333333333, ans=0.0 2023-11-18 15:57:50,189 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 9.255e+01 1.039e+02 1.147e+02 2.056e+02, threshold=2.078e+02, percent-clipped=1.0 2023-11-18 15:58:04,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2023-11-18 15:58:06,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=300986.6666666667, ans=0.2 2023-11-18 15:58:14,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.25 vs. 
limit=15.0 2023-11-18 15:58:24,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=301053.3333333333, ans=0.125 2023-11-18 15:58:28,441 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9100, loss[loss=0.07567, simple_loss=0.07094, pruned_loss=0.02438, audio_tagging_loss=0.01582, over 15828.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1205, pruned_loss=0.03521, audio_tagging_loss=0.01138, over 3052892.37 frames. ], batch size: 61, lr: 1.59e-02, grad_scale: 16.0 2023-11-18 15:58:29,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0 2023-11-18 15:58:29,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=301120.0, ans=0.125 2023-11-18 15:58:30,716 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:58:36,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=301120.0, ans=0.2 2023-11-18 15:58:39,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=301186.6666666667, ans=0.0 2023-11-18 15:58:55,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=301253.3333333333, ans=0.2 2023-11-18 15:59:00,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=301253.3333333333, ans=0.0 2023-11-18 15:59:02,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=301320.0, ans=0.0 2023-11-18 15:59:23,957 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9150, loss[loss=0.09129, simple_loss=0.092, pruned_loss=0.03206, audio_tagging_loss=0.01323, over 14852.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1205, pruned_loss=0.03515, audio_tagging_loss=0.0114, over 3045984.91 frames. ], batch size: 58, lr: 1.59e-02, grad_scale: 16.0 2023-11-18 15:59:24,510 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2023-11-18 15:59:29,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.91 vs. limit=22.5 2023-11-18 15:59:34,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=301453.3333333333, ans=0.125 2023-11-18 15:59:38,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.55 vs. 
limit=15.0 2023-11-18 15:59:41,557 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.692e+01 9.042e+01 1.024e+02 1.134e+02 1.471e+02, threshold=2.048e+02, percent-clipped=0.0 2023-11-18 15:59:41,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=301520.0, ans=0.125 2023-11-18 15:59:43,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=301520.0, ans=0.1 2023-11-18 15:59:45,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0 2023-11-18 15:59:46,625 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.85 vs. limit=22.5 2023-11-18 15:59:48,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=301586.6666666667, ans=0.125 2023-11-18 15:59:49,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=301586.6666666667, ans=0.125 2023-11-18 15:59:52,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=301586.6666666667, ans=0.125 2023-11-18 16:00:13,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=301720.0, ans=0.025 2023-11-18 16:00:21,039 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9200, loss[loss=0.07348, simple_loss=0.0844, pruned_loss=0.02007, audio_tagging_loss=0.01121, over 14220.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1211, pruned_loss=0.03524, audio_tagging_loss=0.01134, over 3043683.44 frames. ], batch size: 55, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 16:00:22,275 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:00:29,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=301786.6666666667, ans=10.0 2023-11-18 16:00:38,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=301853.3333333333, ans=0.04949747468305833 2023-11-18 16:00:40,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=301853.3333333333, ans=0.0 2023-11-18 16:00:45,281 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:01:12,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=302053.3333333333, ans=0.0 2023-11-18 16:01:16,224 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9250, loss[loss=0.1133, simple_loss=0.1286, pruned_loss=0.03799, audio_tagging_loss=0.01106, over 16087.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.121, pruned_loss=0.03542, audio_tagging_loss=0.01141, over 3039496.42 frames. 
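
Note on the lr column: it creeps down from 1.61e-02 to 1.58e-02 across these batches, consistent with icefall's Eden scheduler, which decays smoothly in both cumulative batch count and epoch. A sketch of the formula; the lr_batches/lr_epochs constants and the cumulative batch count below are assumptions for illustration:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Lands in the 1.6e-02 range during epoch 4 with base_lr=0.045:
print(eden_lr(0.045, batch=37000, epoch=4))
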
], batch size: 59, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:01:30,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=302186.6666666667, ans=0.125 2023-11-18 16:01:33,126 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 9.477e+01 1.067e+02 1.208e+02 1.657e+02, threshold=2.134e+02, percent-clipped=0.0 2023-11-18 16:01:49,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=302320.0, ans=0.125 2023-11-18 16:01:55,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=302320.0, ans=0.04949747468305833 2023-11-18 16:02:03,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=302386.6666666667, ans=10.0 2023-11-18 16:02:08,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=302386.6666666667, ans=0.125 2023-11-18 16:02:11,898 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9300, loss[loss=0.09913, simple_loss=0.1091, pruned_loss=0.03363, audio_tagging_loss=0.01097, over 15454.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1204, pruned_loss=0.0354, audio_tagging_loss=0.01147, over 3036332.08 frames. ], batch size: 57, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:02:21,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302453.3333333333, ans=0.1 2023-11-18 16:02:25,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=302520.0, ans=0.125 2023-11-18 16:02:31,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=302520.0, ans=0.0 2023-11-18 16:02:33,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=302586.6666666667, ans=0.125 2023-11-18 16:02:42,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.45 vs. limit=15.0 2023-11-18 16:02:46,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=302653.3333333333, ans=0.125 2023-11-18 16:02:46,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=302653.3333333333, ans=0.0 2023-11-18 16:03:05,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=302720.0, ans=0.125 2023-11-18 16:03:09,204 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9350, loss[loss=0.1445, simple_loss=0.1715, pruned_loss=0.04774, audio_tagging_loss=0.01104, over 15551.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1206, pruned_loss=0.03529, audio_tagging_loss=0.01153, over 3043165.70 frames. ], batch size: 56, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:03:14,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=302786.6666666667, ans=10.0 2023-11-18 16:03:18,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.17 vs. 
limit=10.0 2023-11-18 16:03:24,974 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 9.020e+01 1.030e+02 1.167e+02 1.548e+02, threshold=2.059e+02, percent-clipped=0.0 2023-11-18 16:03:29,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=302920.0, ans=0.125 2023-11-18 16:03:44,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0 2023-11-18 16:03:47,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=302986.6666666667, ans=0.2 2023-11-18 16:03:49,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302986.6666666667, ans=0.1 2023-11-18 16:04:04,197 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9400, loss[loss=0.08912, simple_loss=0.09245, pruned_loss=0.02501, audio_tagging_loss=0.01789, over 14252.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.121, pruned_loss=0.0355, audio_tagging_loss=0.01161, over 3044071.63 frames. ], batch size: 55, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:04:20,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=303186.6666666667, ans=0.125 2023-11-18 16:04:36,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2023-11-18 16:04:38,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=303320.0, ans=0.125 2023-11-18 16:04:40,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=303320.0, ans=0.125 2023-11-18 16:04:46,936 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:05:00,033 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9450, loss[loss=0.1176, simple_loss=0.1283, pruned_loss=0.04032, audio_tagging_loss=0.01312, over 15097.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1204, pruned_loss=0.03513, audio_tagging_loss=0.01173, over 3042278.14 frames. ], batch size: 55, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:05:00,069 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 16:05:17,120 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.847e+01 9.577e+01 1.061e+02 1.222e+02 1.461e+02, threshold=2.121e+02, percent-clipped=0.0 2023-11-18 16:05:21,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=303520.0, ans=0.0 2023-11-18 16:05:26,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=303586.6666666667, ans=0.125 2023-11-18 16:05:28,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=303586.6666666667, ans=0.1 2023-11-18 16:05:34,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=303653.3333333333, ans=0.1 2023-11-18 16:05:50,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=303720.0, ans=0.125 2023-11-18 16:05:56,428 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9500, loss[loss=0.1055, simple_loss=0.1172, pruned_loss=0.0358, audio_tagging_loss=0.01112, over 15597.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1205, pruned_loss=0.03529, audio_tagging_loss=0.01186, over 3043337.83 frames. ], batch size: 57, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:05:59,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=303786.6666666667, ans=0.125 2023-11-18 16:06:02,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=12.0 2023-11-18 16:06:13,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.27 vs. limit=22.5 2023-11-18 16:06:25,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.00 vs. limit=10.0 2023-11-18 16:06:33,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2023-11-18 16:06:52,140 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9550, loss[loss=0.09753, simple_loss=0.1052, pruned_loss=0.03133, audio_tagging_loss=0.01359, over 15231.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1206, pruned_loss=0.03526, audio_tagging_loss=0.01198, over 3039435.18 frames. ], batch size: 59, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:07:08,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.602e+01 1.044e+02 1.160e+02 1.697e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 16:07:08,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=304186.6666666667, ans=0.0 2023-11-18 16:07:09,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=304186.6666666667, ans=0.125 2023-11-18 16:07:29,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.66 vs. 
limit=15.0 2023-11-18 16:07:30,331 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.09 vs. limit=6.0 2023-11-18 16:07:41,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=304386.6666666667, ans=0.125 2023-11-18 16:07:48,092 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9600, loss[loss=0.1448, simple_loss=0.1602, pruned_loss=0.04969, audio_tagging_loss=0.015, over 15765.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1201, pruned_loss=0.03521, audio_tagging_loss=0.01211, over 3045297.34 frames. ], batch size: 57, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:07:52,757 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.85 vs. limit=22.5 2023-11-18 16:07:58,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=304520.0, ans=0.0 2023-11-18 16:08:04,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=304520.0, ans=0.0 2023-11-18 16:08:08,704 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=12.0 2023-11-18 16:08:39,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=304720.0, ans=0.125 2023-11-18 16:08:44,187 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9650, loss[loss=0.1166, simple_loss=0.1291, pruned_loss=0.04124, audio_tagging_loss=0.01088, over 16162.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.1189, pruned_loss=0.03494, audio_tagging_loss=0.01203, over 3048793.43 frames. ], batch size: 61, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:08:52,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=304786.6666666667, ans=0.125 2023-11-18 16:09:00,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.907e+01 9.312e+01 1.013e+02 1.091e+02 1.612e+02, threshold=2.027e+02, percent-clipped=0.0 2023-11-18 16:09:03,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=304853.3333333333, ans=0.0 2023-11-18 16:09:07,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=12.0 2023-11-18 16:09:09,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.40 vs. limit=22.5 2023-11-18 16:09:23,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=304986.6666666667, ans=0.0 2023-11-18 16:09:38,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=305120.0, ans=0.125 2023-11-18 16:09:38,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.63 vs. limit=10.0 2023-11-18 16:09:39,305 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9700, loss[loss=0.1065, simple_loss=0.115, pruned_loss=0.03777, audio_tagging_loss=0.01121, over 15778.00 frames. 
], tot_loss[loss=0.1065, simple_loss=0.1193, pruned_loss=0.0351, audio_tagging_loss=0.01178, over 3050461.37 frames. ], batch size: 59, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:09:41,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=305120.0, ans=0.0 2023-11-18 16:10:27,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=305386.6666666667, ans=0.125 2023-11-18 16:10:30,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.99 vs. limit=22.5 2023-11-18 16:10:35,412 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9750, loss[loss=0.09401, simple_loss=0.09842, pruned_loss=0.03204, audio_tagging_loss=0.01276, over 16684.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.1197, pruned_loss=0.03513, audio_tagging_loss=0.01159, over 3051420.99 frames. ], batch size: 62, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:10:53,088 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 9.337e+01 1.028e+02 1.130e+02 1.491e+02, threshold=2.056e+02, percent-clipped=0.0 2023-11-18 16:11:01,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=305586.6666666667, ans=0.0 2023-11-18 16:11:03,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=305586.6666666667, ans=0.125 2023-11-18 16:11:03,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=305586.6666666667, ans=0.07 2023-11-18 16:11:09,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.69 vs. limit=10.0 2023-11-18 16:11:32,500 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9800, loss[loss=0.1016, simple_loss=0.1214, pruned_loss=0.03117, audio_tagging_loss=0.009769, over 15611.00 frames. ], tot_loss[loss=0.1073, simple_loss=0.1211, pruned_loss=0.03541, audio_tagging_loss=0.01137, over 3051376.39 frames. ], batch size: 57, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:11:40,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=305786.6666666667, ans=0.125 2023-11-18 16:11:49,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=305853.3333333333, ans=0.025 2023-11-18 16:11:59,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=305920.0, ans=0.125 2023-11-18 16:12:05,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.47 vs. limit=15.0 2023-11-18 16:12:15,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.82 vs. limit=15.0 2023-11-18 16:12:20,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=306053.3333333333, ans=0.125 2023-11-18 16:12:23,799 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:12:28,115 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9850, loss[loss=0.1177, simple_loss=0.1448, pruned_loss=0.03591, audio_tagging_loss=0.009346, over 15054.00 frames. ], tot_loss[loss=0.109, simple_loss=0.1235, pruned_loss=0.03602, audio_tagging_loss=0.01119, over 3053972.76 frames. ], batch size: 55, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:12:34,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=306120.0, ans=0.025 2023-11-18 16:12:35,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.19 vs. limit=12.0 2023-11-18 16:12:45,060 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.089e+01 9.433e+01 1.029e+02 1.148e+02 1.487e+02, threshold=2.058e+02, percent-clipped=0.0 2023-11-18 16:12:58,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=306253.3333333333, ans=0.025 2023-11-18 16:12:59,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=306253.3333333333, ans=0.2 2023-11-18 16:13:15,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=306386.6666666667, ans=0.2 2023-11-18 16:13:23,947 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9900, loss[loss=0.1024, simple_loss=0.1227, pruned_loss=0.03165, audio_tagging_loss=0.009379, over 16046.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1221, pruned_loss=0.03549, audio_tagging_loss=0.01128, over 3050247.11 frames. ], batch size: 62, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:13:33,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=306453.3333333333, ans=0.0 2023-11-18 16:13:39,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306520.0, ans=0.1 2023-11-18 16:13:58,878 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:14:17,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=306720.0, ans=0.09899494936611666 2023-11-18 16:14:18,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=306720.0, ans=0.0 2023-11-18 16:14:20,556 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 9950, loss[loss=0.08505, simple_loss=0.1027, pruned_loss=0.02322, audio_tagging_loss=0.01048, over 14546.00 frames. ], tot_loss[loss=0.1079, simple_loss=0.1221, pruned_loss=0.03542, audio_tagging_loss=0.01137, over 3051394.45 frames. 
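
Note on audio_tagging_loss: it tracks the auxiliary audio-tagging branch trained on the AudioSet cuts muxed into the stream (the same cuts whose dummy transcripts trigger the exclusion warnings above). A plausible minimal shape for such a branch, pooling encoder frames into clip-level logits scored with BCE against multi-hot event labels; this is a sketch under assumptions, not the actual multi_KD model code:

import torch
import torch.nn as nn

class AudioTaggingHead(nn.Module):
    # Assumed head: mean-pool encoder output, project to 527 AudioSet classes.
    def __init__(self, encoder_dim: int = 512, num_events: int = 527):
        super().__init__()
        self.proj = nn.Linear(encoder_dim, num_events)

    def forward(self, encoder_out: torch.Tensor, targets: torch.Tensor):
        # encoder_out: (N, T, C); targets: (N, num_events), multi-hot floats
        logits = self.proj(encoder_out.mean(dim=1))
        return nn.functional.binary_cross_entropy_with_logits(logits, targets)

head = AudioTaggingHead()
loss = head(torch.randn(4, 100, 512), torch.zeros(4, 527))
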
], batch size: 54, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:14:23,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=306786.6666666667, ans=0.95 2023-11-18 16:14:25,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=306786.6666666667, ans=0.125 2023-11-18 16:14:36,439 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.909e+01 9.576e+01 1.088e+02 1.219e+02 1.506e+02, threshold=2.175e+02, percent-clipped=0.0 2023-11-18 16:14:41,229 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2023-11-18 16:14:46,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=306920.0, ans=0.2 2023-11-18 16:14:49,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306920.0, ans=0.1 2023-11-18 16:14:56,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.54 vs. limit=6.0 2023-11-18 16:15:10,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=307053.3333333333, ans=0.025 2023-11-18 16:15:13,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=307053.3333333333, ans=0.0 2023-11-18 16:15:15,737 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10000, loss[loss=0.1242, simple_loss=0.1469, pruned_loss=0.04084, audio_tagging_loss=0.009864, over 14540.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1207, pruned_loss=0.03496, audio_tagging_loss=0.01146, over 3049338.18 frames. ], batch size: 54, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:15:30,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2023-11-18 16:16:11,297 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10050, loss[loss=0.1319, simple_loss=0.1651, pruned_loss=0.0435, audio_tagging_loss=0.005851, over 15807.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1216, pruned_loss=0.03504, audio_tagging_loss=0.01132, over 3053292.54 frames. ], batch size: 56, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:16:12,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=307453.3333333333, ans=0.125 2023-11-18 16:16:29,453 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 9.427e+01 1.040e+02 1.141e+02 1.376e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-18 16:16:34,053 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=307586.6666666667, ans=0.125 2023-11-18 16:16:51,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=307653.3333333333, ans=0.2 2023-11-18 16:17:08,302 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10100, loss[loss=0.1085, simple_loss=0.1186, pruned_loss=0.03547, audio_tagging_loss=0.01372, over 16665.00 frames. 
], tot_loss[loss=0.107, simple_loss=0.1211, pruned_loss=0.03505, audio_tagging_loss=0.01143, over 3048057.45 frames. ], batch size: 64, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:17:09,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=307786.6666666667, ans=0.07 2023-11-18 16:17:20,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.81 vs. limit=10.0 2023-11-18 16:17:47,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=307986.6666666667, ans=0.04949747468305833 2023-11-18 16:17:53,391 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:17:55,289 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:18:03,729 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10150, loss[loss=0.1002, simple_loss=0.113, pruned_loss=0.03271, audio_tagging_loss=0.01096, over 14881.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1211, pruned_loss=0.03504, audio_tagging_loss=0.01147, over 3044238.93 frames. ], batch size: 55, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:18:09,125 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:18:15,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2023-11-18 16:18:19,766 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 9.614e+01 1.045e+02 1.146e+02 1.690e+02, threshold=2.090e+02, percent-clipped=0.0 2023-11-18 16:18:22,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=308186.6666666667, ans=0.1 2023-11-18 16:18:31,573 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 16:18:35,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=308253.3333333333, ans=0.125 2023-11-18 16:18:39,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=308320.0, ans=0.125 2023-11-18 16:18:43,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=308320.0, ans=0.0 2023-11-18 16:18:47,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=308386.6666666667, ans=0.0 2023-11-18 16:18:47,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=308386.6666666667, ans=0.125 2023-11-18 16:18:59,164 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10200, loss[loss=0.08961, simple_loss=0.09497, pruned_loss=0.02819, audio_tagging_loss=0.01394, over 15917.00 frames. ], tot_loss[loss=0.1055, simple_loss=0.1188, pruned_loss=0.03442, audio_tagging_loss=0.01171, over 3047774.05 frames. ], batch size: 62, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:18:59,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=308453.3333333333, ans=0.125 2023-11-18 16:19:22,776 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:19:31,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=308586.6666666667, ans=0.125 2023-11-18 16:19:33,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=308653.3333333333, ans=0.0 2023-11-18 16:19:39,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=308653.3333333333, ans=22.5 2023-11-18 16:19:55,122 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10250, loss[loss=0.1358, simple_loss=0.1508, pruned_loss=0.05013, audio_tagging_loss=0.01028, over 14709.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1193, pruned_loss=0.03474, audio_tagging_loss=0.01179, over 3043815.41 frames. ], batch size: 57, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:19:55,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-11-18 16:20:03,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.83 vs. 
limit=10.0 2023-11-18 16:20:12,687 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 9.476e+01 1.039e+02 1.199e+02 1.617e+02, threshold=2.078e+02, percent-clipped=0.0 2023-11-18 16:20:32,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=308986.6666666667, ans=0.0 2023-11-18 16:20:32,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=308986.6666666667, ans=0.125 2023-11-18 16:20:44,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=309053.3333333333, ans=0.0 2023-11-18 16:20:45,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=309053.3333333333, ans=0.05 2023-11-18 16:20:49,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.19 vs. limit=15.0 2023-11-18 16:20:51,932 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10300, loss[loss=0.1187, simple_loss=0.1447, pruned_loss=0.03804, audio_tagging_loss=0.008338, over 15631.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.1194, pruned_loss=0.0349, audio_tagging_loss=0.01182, over 3038256.78 frames. ], batch size: 57, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:21:05,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=309186.6666666667, ans=0.125 2023-11-18 16:21:08,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=309186.6666666667, ans=0.1 2023-11-18 16:21:22,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.90 vs. limit=15.0 2023-11-18 16:21:25,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=309320.0, ans=0.0 2023-11-18 16:21:47,525 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10350, loss[loss=0.09847, simple_loss=0.1157, pruned_loss=0.03079, audio_tagging_loss=0.009849, over 14562.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1198, pruned_loss=0.03502, audio_tagging_loss=0.01194, over 3038393.54 frames. ], batch size: 56, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:22:04,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 9.661e+01 1.063e+02 1.175e+02 1.992e+02, threshold=2.126e+02, percent-clipped=0.0 2023-11-18 16:22:23,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=309653.3333333333, ans=0.0 2023-11-18 16:22:27,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. 
limit=15.0 2023-11-18 16:22:41,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=309720.0, ans=10.0 2023-11-18 16:22:42,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=309786.6666666667, ans=0.2 2023-11-18 16:22:43,318 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10400, loss[loss=0.1115, simple_loss=0.1264, pruned_loss=0.03603, audio_tagging_loss=0.01224, over 14634.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.1191, pruned_loss=0.03482, audio_tagging_loss=0.01209, over 3041392.73 frames. ], batch size: 55, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:23:07,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.40 vs. limit=22.5 2023-11-18 16:23:21,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=309986.6666666667, ans=0.025 2023-11-18 16:23:22,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=309986.6666666667, ans=0.0 2023-11-18 16:23:23,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309986.6666666667, ans=0.1 2023-11-18 16:23:39,767 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10450, loss[loss=0.1177, simple_loss=0.1352, pruned_loss=0.04384, audio_tagging_loss=0.006269, over 15854.00 frames. ], tot_loss[loss=0.1049, simple_loss=0.1175, pruned_loss=0.0342, audio_tagging_loss=0.01195, over 3031776.87 frames. ], batch size: 60, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:23:47,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=310120.0, ans=0.0 2023-11-18 16:23:52,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=310186.6666666667, ans=0.04949747468305833 2023-11-18 16:23:56,251 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 9.086e+01 9.811e+01 1.148e+02 1.710e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-18 16:24:20,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=310320.0, ans=0.0 2023-11-18 16:24:27,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=310386.6666666667, ans=0.0 2023-11-18 16:24:28,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.71 vs. limit=15.0 2023-11-18 16:24:35,559 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10500, loss[loss=0.1367, simple_loss=0.1576, pruned_loss=0.04555, audio_tagging_loss=0.01238, over 16578.00 frames. ], tot_loss[loss=0.1054, simple_loss=0.1184, pruned_loss=0.03445, audio_tagging_loss=0.01178, over 3032582.62 frames. ], batch size: 59, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:24:56,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=310520.0, ans=0.125 2023-11-18 16:24:56,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.74 vs. 
limit=15.0 2023-11-18 16:25:18,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=310653.3333333333, ans=0.0 2023-11-18 16:25:22,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=310720.0, ans=0.125 2023-11-18 16:25:32,011 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10550, loss[loss=0.07265, simple_loss=0.08059, pruned_loss=0.02026, audio_tagging_loss=0.01209, over 14914.00 frames. ], tot_loss[loss=0.1047, simple_loss=0.1178, pruned_loss=0.03409, audio_tagging_loss=0.01175, over 3033158.52 frames. ], batch size: 57, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:25:49,165 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 9.120e+01 1.006e+02 1.112e+02 1.547e+02, threshold=2.011e+02, percent-clipped=0.0 2023-11-18 16:25:53,640 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0 2023-11-18 16:26:04,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=310986.6666666667, ans=0.09899494936611666 2023-11-18 16:26:26,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=311053.3333333333, ans=0.1 2023-11-18 16:26:28,620 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10600, loss[loss=0.1271, simple_loss=0.1538, pruned_loss=0.04179, audio_tagging_loss=0.008356, over 15820.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.1203, pruned_loss=0.03492, audio_tagging_loss=0.01147, over 3043666.29 frames. ], batch size: 55, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:27:24,518 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10650, loss[loss=0.09824, simple_loss=0.1156, pruned_loss=0.031, audio_tagging_loss=0.009437, over 14961.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1213, pruned_loss=0.0353, audio_tagging_loss=0.01146, over 3040322.42 frames. ], batch size: 59, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:27:31,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=311453.3333333333, ans=0.0 2023-11-18 16:27:31,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=311453.3333333333, ans=0.0 2023-11-18 16:27:38,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=311520.0, ans=0.1 2023-11-18 16:27:40,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 9.767e+01 1.078e+02 1.173e+02 1.612e+02, threshold=2.157e+02, percent-clipped=0.0 2023-11-18 16:27:41,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=311520.0, ans=0.0 2023-11-18 16:27:42,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=311520.0, ans=0.1 2023-11-18 16:27:59,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.56 vs. 
limit=15.0 2023-11-18 16:28:09,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=311720.0, ans=0.2 2023-11-18 16:28:20,379 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10700, loss[loss=0.07099, simple_loss=0.05851, pruned_loss=0.02625, audio_tagging_loss=0.01549, over 15010.00 frames. ], tot_loss[loss=0.107, simple_loss=0.1207, pruned_loss=0.03523, audio_tagging_loss=0.01143, over 3041849.61 frames. ], batch size: 60, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:28:24,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=311786.6666666667, ans=0.0 2023-11-18 16:28:58,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=311986.6666666667, ans=0.125 2023-11-18 16:29:12,430 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.80 vs. limit=22.5 2023-11-18 16:29:17,079 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10750, loss[loss=0.1326, simple_loss=0.1598, pruned_loss=0.04599, audio_tagging_loss=0.006733, over 14872.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1195, pruned_loss=0.0348, audio_tagging_loss=0.01156, over 3038894.58 frames. ], batch size: 53, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:29:24,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=312120.0, ans=0.07 2023-11-18 16:29:33,575 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 9.141e+01 9.911e+01 1.128e+02 1.714e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-18 16:29:51,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.01 vs. limit=15.0 2023-11-18 16:30:00,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=312386.6666666667, ans=0.2 2023-11-18 16:30:09,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=312386.6666666667, ans=0.1 2023-11-18 16:30:12,483 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10800, loss[loss=0.07599, simple_loss=0.08203, pruned_loss=0.0197, audio_tagging_loss=0.01528, over 16482.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1195, pruned_loss=0.03487, audio_tagging_loss=0.01146, over 3040874.98 frames. ], batch size: 63, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:30:28,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=312520.0, ans=0.0 2023-11-18 16:30:29,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2023-11-18 16:30:32,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=312520.0, ans=0.2 2023-11-18 16:30:39,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=312586.6666666667, ans=0.125 2023-11-18 16:30:48,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. 
limit=6.0 2023-11-18 16:30:54,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=312653.3333333333, ans=0.125 2023-11-18 16:31:07,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2023-11-18 16:31:08,845 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10850, loss[loss=0.1061, simple_loss=0.102, pruned_loss=0.04155, audio_tagging_loss=0.01355, over 15165.00 frames. ], tot_loss[loss=0.1059, simple_loss=0.1191, pruned_loss=0.03482, audio_tagging_loss=0.01153, over 3038617.15 frames. ], batch size: 57, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:31:14,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=312786.6666666667, ans=0.125 2023-11-18 16:31:15,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=312786.6666666667, ans=0.0 2023-11-18 16:31:25,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 9.301e+01 1.024e+02 1.166e+02 1.801e+02, threshold=2.048e+02, percent-clipped=0.0 2023-11-18 16:31:31,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=312920.0, ans=0.0 2023-11-18 16:31:37,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=312920.0, ans=0.0 2023-11-18 16:31:51,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=312986.6666666667, ans=0.05 2023-11-18 16:31:57,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=313053.3333333333, ans=0.125 2023-11-18 16:32:03,382 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:32:04,477 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10900, loss[loss=0.0944, simple_loss=0.112, pruned_loss=0.02903, audio_tagging_loss=0.009359, over 14534.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.1182, pruned_loss=0.03459, audio_tagging_loss=0.0116, over 3036126.04 frames. 
], batch size: 53, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:32:24,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=313253.3333333333, ans=0.95 2023-11-18 16:32:31,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=313253.3333333333, ans=0.125 2023-11-18 16:32:37,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=313320.0, ans=0.0 2023-11-18 16:32:38,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=313320.0, ans=0.0 2023-11-18 16:32:40,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=313320.0, ans=0.0 2023-11-18 16:32:45,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=313320.0, ans=0.1 2023-11-18 16:32:55,418 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:32:59,373 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 10950, loss[loss=0.1031, simple_loss=0.1176, pruned_loss=0.03337, audio_tagging_loss=0.01095, over 15237.00 frames. ], tot_loss[loss=0.1054, simple_loss=0.1185, pruned_loss=0.03456, audio_tagging_loss=0.01161, over 3032228.09 frames. ], batch size: 56, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:33:11,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=313520.0, ans=0.125 2023-11-18 16:33:16,523 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.810e+01 9.324e+01 1.025e+02 1.137e+02 1.491e+02, threshold=2.050e+02, percent-clipped=0.0 2023-11-18 16:33:28,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5 2023-11-18 16:33:30,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=313586.6666666667, ans=10.0 2023-11-18 16:33:54,815 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11000, loss[loss=0.1216, simple_loss=0.1355, pruned_loss=0.0421, audio_tagging_loss=0.01177, over 15155.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1193, pruned_loss=0.03474, audio_tagging_loss=0.0117, over 3036724.64 frames. ], batch size: 57, lr: 1.56e-02, grad_scale: 64.0 2023-11-18 16:34:02,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=313786.6666666667, ans=0.125 2023-11-18 16:34:05,948 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 16:34:18,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=313920.0, ans=0.2 2023-11-18 16:34:27,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=313986.6666666667, ans=0.0 2023-11-18 16:34:34,628 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=15.0 2023-11-18 16:34:35,979 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.33 vs. limit=15.0 2023-11-18 16:34:46,426 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.80 vs. limit=10.0 2023-11-18 16:34:50,184 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11050, loss[loss=0.1257, simple_loss=0.1365, pruned_loss=0.04753, audio_tagging_loss=0.009903, over 14890.00 frames. ], tot_loss[loss=0.1067, simple_loss=0.1198, pruned_loss=0.03497, audio_tagging_loss=0.01177, over 3034058.84 frames. ], batch size: 56, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:35:06,684 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.121e+01 9.418e+01 1.036e+02 1.168e+02 1.751e+02, threshold=2.073e+02, percent-clipped=0.0 2023-11-18 16:35:15,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=314253.3333333333, ans=0.0 2023-11-18 16:35:18,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=314253.3333333333, ans=0.0 2023-11-18 16:35:22,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=314320.0, ans=0.0 2023-11-18 16:35:24,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=314320.0, ans=22.5 2023-11-18 16:35:29,714 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.35 vs. limit=15.0 2023-11-18 16:35:30,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=314320.0, ans=10.0 2023-11-18 16:35:44,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=314453.3333333333, ans=0.2 2023-11-18 16:35:45,673 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11100, loss[loss=0.07381, simple_loss=0.07584, pruned_loss=0.02622, audio_tagging_loss=0.009671, over 15190.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1202, pruned_loss=0.03493, audio_tagging_loss=0.01184, over 3042791.54 frames. ], batch size: 61, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:35:48,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=314453.3333333333, ans=0.125 2023-11-18 16:36:01,161 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. 
limit=15.0 2023-11-18 16:36:05,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=314520.0, ans=0.125 2023-11-18 16:36:18,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=314653.3333333333, ans=0.125 2023-11-18 16:36:27,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=314653.3333333333, ans=0.125 2023-11-18 16:36:36,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.48 vs. limit=15.0 2023-11-18 16:36:38,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=314720.0, ans=0.125 2023-11-18 16:36:40,806 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11150, loss[loss=0.09512, simple_loss=0.111, pruned_loss=0.0307, audio_tagging_loss=0.008932, over 15592.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1192, pruned_loss=0.03471, audio_tagging_loss=0.01183, over 3040723.49 frames. ], batch size: 60, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:36:42,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=314786.6666666667, ans=0.0 2023-11-18 16:36:58,978 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.048e+01 9.570e+01 1.059e+02 1.181e+02 1.990e+02, threshold=2.118e+02, percent-clipped=0.0 2023-11-18 16:37:00,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=314853.3333333333, ans=15.0 2023-11-18 16:37:04,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=314920.0, ans=0.09899494936611666 2023-11-18 16:37:06,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=314920.0, ans=0.125 2023-11-18 16:37:09,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2023-11-18 16:37:19,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=15.0 2023-11-18 16:37:22,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=314986.6666666667, ans=0.95 2023-11-18 16:37:37,280 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11200, loss[loss=0.1057, simple_loss=0.1126, pruned_loss=0.03494, audio_tagging_loss=0.01447, over 14872.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.1195, pruned_loss=0.03467, audio_tagging_loss=0.01198, over 3036487.25 frames. 
], batch size: 57, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:37:37,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=315120.0, ans=0.95 2023-11-18 16:38:20,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=315386.6666666667, ans=0.125 2023-11-18 16:38:29,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=315386.6666666667, ans=0.125 2023-11-18 16:38:32,600 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11250, loss[loss=0.1023, simple_loss=0.1087, pruned_loss=0.03508, audio_tagging_loss=0.01289, over 14628.00 frames. ], tot_loss[loss=0.1046, simple_loss=0.1172, pruned_loss=0.03393, audio_tagging_loss=0.01212, over 3040556.67 frames. ], batch size: 54, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:38:38,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=315453.3333333333, ans=0.125 2023-11-18 16:38:44,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=315520.0, ans=0.0 2023-11-18 16:38:47,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=12.0 2023-11-18 16:38:48,490 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 9.211e+01 1.045e+02 1.164e+02 1.761e+02, threshold=2.090e+02, percent-clipped=0.0 2023-11-18 16:39:01,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=315586.6666666667, ans=0.0 2023-11-18 16:39:01,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.14 vs. limit=15.0 2023-11-18 16:39:01,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2023-11-18 16:39:25,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.13 vs. limit=22.5 2023-11-18 16:39:27,250 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11300, loss[loss=0.09968, simple_loss=0.1095, pruned_loss=0.03333, audio_tagging_loss=0.01161, over 13882.00 frames. ], tot_loss[loss=0.1046, simple_loss=0.1172, pruned_loss=0.03402, audio_tagging_loss=0.01194, over 3048318.92 frames. ], batch size: 56, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:39:28,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=315786.6666666667, ans=0.125 2023-11-18 16:39:36,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.33 vs. 
limit=22.5 2023-11-18 16:39:44,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=315853.3333333333, ans=0.05 2023-11-18 16:39:45,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=315853.3333333333, ans=0.125 2023-11-18 16:39:47,224 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.879e+00 2023-11-18 16:39:48,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=315853.3333333333, ans=0.1 2023-11-18 16:39:52,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=315920.0, ans=0.5 2023-11-18 16:39:56,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=315920.0, ans=0.1 2023-11-18 16:39:56,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=315920.0, ans=0.125 2023-11-18 16:40:06,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=315986.6666666667, ans=0.125 2023-11-18 16:40:16,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=316053.3333333333, ans=0.05 2023-11-18 16:40:22,796 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11350, loss[loss=0.1265, simple_loss=0.155, pruned_loss=0.0399, audio_tagging_loss=0.009062, over 15220.00 frames. ], tot_loss[loss=0.1052, simple_loss=0.1183, pruned_loss=0.03427, audio_tagging_loss=0.01178, over 3045630.28 frames. ], batch size: 56, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:40:34,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=316186.6666666667, ans=12.0 2023-11-18 16:40:39,331 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 9.469e+01 1.052e+02 1.138e+02 1.718e+02, threshold=2.104e+02, percent-clipped=0.0 2023-11-18 16:41:01,570 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.417e+00 2023-11-18 16:41:15,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=316386.6666666667, ans=0.0 2023-11-18 16:41:18,411 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11400, loss[loss=0.09154, simple_loss=0.1063, pruned_loss=0.02498, audio_tagging_loss=0.01341, over 15337.00 frames. ], tot_loss[loss=0.1054, simple_loss=0.1189, pruned_loss=0.03432, audio_tagging_loss=0.01162, over 3040751.44 frames. ], batch size: 57, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:41:25,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=316453.3333333333, ans=0.125 2023-11-18 16:41:29,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=316520.0, ans=0.125 2023-11-18 16:41:33,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.13 vs. 
limit=22.5 2023-11-18 16:41:40,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=316586.6666666667, ans=0.125 2023-11-18 16:41:41,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=316586.6666666667, ans=0.125 2023-11-18 16:42:13,244 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11450, loss[loss=0.1097, simple_loss=0.1183, pruned_loss=0.03971, audio_tagging_loss=0.01091, over 15293.00 frames. ], tot_loss[loss=0.1049, simple_loss=0.118, pruned_loss=0.03429, audio_tagging_loss=0.01156, over 3040372.14 frames. ], batch size: 57, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:42:30,666 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.176e+01 9.649e+01 1.077e+02 1.207e+02 1.681e+02, threshold=2.154e+02, percent-clipped=0.0 2023-11-18 16:43:03,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=317053.3333333333, ans=0.0 2023-11-18 16:43:08,871 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11500, loss[loss=0.08564, simple_loss=0.1002, pruned_loss=0.02281, audio_tagging_loss=0.01271, over 15239.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.1186, pruned_loss=0.03443, audio_tagging_loss=0.01159, over 3039951.82 frames. ], batch size: 57, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:43:11,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=317120.0, ans=0.5 2023-11-18 16:43:19,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=317186.6666666667, ans=0.05 2023-11-18 16:43:48,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=317320.0, ans=0.0 2023-11-18 16:43:59,839 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:44:05,433 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11550, loss[loss=0.11, simple_loss=0.1193, pruned_loss=0.03725, audio_tagging_loss=0.01315, over 16059.00 frames. ], tot_loss[loss=0.1052, simple_loss=0.1186, pruned_loss=0.03438, audio_tagging_loss=0.01152, over 3052718.33 frames. ], batch size: 60, lr: 1.55e-02, grad_scale: 16.0 2023-11-18 16:44:13,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=317453.3333333333, ans=0.125 2023-11-18 16:44:19,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=317520.0, ans=0.0 2023-11-18 16:44:19,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317520.0, ans=0.1 2023-11-18 16:44:23,384 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 9.289e+01 1.045e+02 1.175e+02 1.806e+02, threshold=2.091e+02, percent-clipped=0.0 2023-11-18 16:44:24,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=317520.0, ans=0.125 2023-11-18 16:44:27,787 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.88 vs. 
limit=15.0 2023-11-18 16:44:28,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=317586.6666666667, ans=0.125 2023-11-18 16:44:37,206 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.13 vs. limit=10.0 2023-11-18 16:44:41,087 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:44:52,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2023-11-18 16:45:00,898 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11600, loss[loss=0.1167, simple_loss=0.1251, pruned_loss=0.03967, audio_tagging_loss=0.0145, over 16487.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.1194, pruned_loss=0.03457, audio_tagging_loss=0.01148, over 3051980.05 frames. ], batch size: 62, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:45:01,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=317786.6666666667, ans=0.04949747468305833 2023-11-18 16:45:01,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=317786.6666666667, ans=0.0 2023-11-18 16:45:03,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=317786.6666666667, ans=0.125 2023-11-18 16:45:05,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.74 vs. limit=15.0 2023-11-18 16:45:31,360 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.283e-01 2023-11-18 16:45:44,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.91 vs. limit=22.5 2023-11-18 16:45:48,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=318053.3333333333, ans=0.0 2023-11-18 16:45:54,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=318053.3333333333, ans=0.0 2023-11-18 16:45:56,546 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11650, loss[loss=0.1134, simple_loss=0.1276, pruned_loss=0.03797, audio_tagging_loss=0.01165, over 15624.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.1196, pruned_loss=0.03458, audio_tagging_loss=0.01143, over 3053519.15 frames. 
], batch size: 57, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:45:57,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=318120.0, ans=0.125 2023-11-18 16:46:15,784 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 9.455e+01 1.056e+02 1.163e+02 1.452e+02, threshold=2.111e+02, percent-clipped=0.0 2023-11-18 16:46:29,080 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0 2023-11-18 16:46:42,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=12.0 2023-11-18 16:46:50,462 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.60 vs. limit=12.0 2023-11-18 16:46:51,860 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11700, loss[loss=0.143, simple_loss=0.1663, pruned_loss=0.04904, audio_tagging_loss=0.01084, over 17332.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.1193, pruned_loss=0.03455, audio_tagging_loss=0.01151, over 3058728.68 frames. ], batch size: 60, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:47:02,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=318520.0, ans=0.0 2023-11-18 16:47:20,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.90 vs. limit=15.0 2023-11-18 16:47:33,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=318653.3333333333, ans=0.125 2023-11-18 16:47:45,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=318720.0, ans=0.125 2023-11-18 16:47:46,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=318786.6666666667, ans=0.0 2023-11-18 16:47:47,747 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11750, loss[loss=0.08163, simple_loss=0.0917, pruned_loss=0.02284, audio_tagging_loss=0.01294, over 15445.00 frames. ], tot_loss[loss=0.1055, simple_loss=0.1189, pruned_loss=0.03449, audio_tagging_loss=0.01158, over 3051188.92 frames. ], batch size: 58, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:47:57,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=318853.3333333333, ans=0.0 2023-11-18 16:48:03,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=318853.3333333333, ans=0.0 2023-11-18 16:48:06,358 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.155e+01 9.788e+01 1.106e+02 1.226e+02 1.834e+02, threshold=2.212e+02, percent-clipped=0.0 2023-11-18 16:48:13,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=318920.0, ans=0.125 2023-11-18 16:48:36,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.92 vs. 
limit=15.0 2023-11-18 16:48:40,107 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.809e-02 2023-11-18 16:48:43,586 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11800, loss[loss=0.09847, simple_loss=0.1158, pruned_loss=0.029, audio_tagging_loss=0.01159, over 15961.00 frames. ], tot_loss[loss=0.1056, simple_loss=0.1189, pruned_loss=0.03452, audio_tagging_loss=0.01161, over 3052969.08 frames. ], batch size: 60, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:48:48,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=319120.0, ans=0.0 2023-11-18 16:49:27,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2023-11-18 16:49:31,544 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=9.775e-02 2023-11-18 16:49:33,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=319386.6666666667, ans=0.0 2023-11-18 16:49:36,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=319386.6666666667, ans=0.125 2023-11-18 16:49:39,792 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11850, loss[loss=0.1264, simple_loss=0.1331, pruned_loss=0.04505, audio_tagging_loss=0.01478, over 14968.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.1202, pruned_loss=0.03487, audio_tagging_loss=0.01158, over 3047927.34 frames. ], batch size: 57, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:49:47,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=319453.3333333333, ans=0.0 2023-11-18 16:49:50,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=319520.0, ans=0.5 2023-11-18 16:49:50,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=319520.0, ans=0.0 2023-11-18 16:49:52,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=319520.0, ans=0.0 2023-11-18 16:49:54,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=319520.0, ans=0.0 2023-11-18 16:49:58,406 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 9.740e+01 1.079e+02 1.230e+02 2.254e+02, threshold=2.157e+02, percent-clipped=1.0 2023-11-18 16:50:00,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. 
limit=6.0 2023-11-18 16:50:01,815 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=319586.6666666667, ans=0.05 2023-11-18 16:50:16,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=319653.3333333333, ans=0.125 2023-11-18 16:50:21,532 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.141e-02 2023-11-18 16:50:25,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=319720.0, ans=0.1 2023-11-18 16:50:33,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319720.0, ans=0.1 2023-11-18 16:50:34,990 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11900, loss[loss=0.08153, simple_loss=0.08686, pruned_loss=0.02495, audio_tagging_loss=0.01315, over 16977.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1208, pruned_loss=0.03518, audio_tagging_loss=0.01162, over 3046007.32 frames. ], batch size: 68, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:50:37,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=319786.6666666667, ans=0.0 2023-11-18 16:50:55,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=319853.3333333333, ans=0.04949747468305833 2023-11-18 16:50:55,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-11-18 16:51:02,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=319920.0, ans=0.1 2023-11-18 16:51:16,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=319986.6666666667, ans=0.125 2023-11-18 16:51:26,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=320053.3333333333, ans=0.125 2023-11-18 16:51:31,451 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.31 vs. limit=15.0 2023-11-18 16:51:32,914 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 11950, loss[loss=0.1101, simple_loss=0.1236, pruned_loss=0.03604, audio_tagging_loss=0.01221, over 15728.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1192, pruned_loss=0.03467, audio_tagging_loss=0.01181, over 3042293.94 frames. 
], batch size: 58, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:51:33,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=320120.0, ans=0.125 2023-11-18 16:51:43,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=320186.6666666667, ans=0.125 2023-11-18 16:51:46,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=320186.6666666667, ans=0.125 2023-11-18 16:51:52,478 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.985e+01 9.226e+01 1.013e+02 1.097e+02 1.681e+02, threshold=2.026e+02, percent-clipped=0.0 2023-11-18 16:52:25,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=320386.6666666667, ans=0.125 2023-11-18 16:52:27,181 INFO [train_asr.py:1115] (1/4) Epoch 4, batch 12000, loss[loss=0.1015, simple_loss=0.1237, pruned_loss=0.0303, audio_tagging_loss=0.009366, over 15456.00 frames. ], tot_loss[loss=0.1067, simple_loss=0.1199, pruned_loss=0.03473, audio_tagging_loss=0.01196, over 3040808.57 frames. ], batch size: 57, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:52:27,181 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 16:53:00,108 INFO [train_asr.py:1147] (1/4) Epoch 4, validation: loss=0.07553, simple_loss=0.06151, pruned_loss=0.009833, audio_tagging_loss=0.03495, over 4681554.00 frames. 2023-11-18 16:53:00,109 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 16:53:01,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=320453.3333333333, ans=0.0 2023-11-18 16:53:01,566 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2023-11-18 16:53:03,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=320453.3333333333, ans=0.125 2023-11-18 16:53:05,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.64 vs. limit=15.0 2023-11-18 16:54:03,927 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 0, loss[loss=0.1087, simple_loss=0.101, pruned_loss=0.02827, audio_tagging_loss=0.02989, over 16028.00 frames. ], tot_loss[loss=0.1087, simple_loss=0.101, pruned_loss=0.02827, audio_tagging_loss=0.02989, over 16028.00 frames. ], batch size: 60, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:54:03,928 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 16:54:35,508 INFO [train_asr.py:1147] (1/4) Epoch 5, validation: loss=0.07399, simple_loss=0.06162, pruned_loss=0.009934, audio_tagging_loss=0.03325, over 4681554.00 frames. 
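The loss[...] and tot_loss[...] fields that train_asr.py prints above look like a weighted sum of the three component losses. Below is a minimal sketch of that bookkeeping, assuming a weight of 0.5 on simple_loss and unit weights on pruned_loss and audio_tagging_loss; the weights and the helper name total_loss are illustrative assumptions, not taken from train_asr.py, but both the "Epoch 5, batch 0" and "Epoch 5, validation" entries above are numerically consistent with them:

# Hypothetical sketch, not icefall's actual code: combine the per-batch
# loss terms the way the printed figures suggest.
def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
               simple_scale=0.5, tagging_scale=1.0):
    # Assumed weighting; only checked against the numbers in this log.
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

# Epoch 5, batch 0:   0.5*0.101   + 0.02827  + 0.02989 = 0.10866  ~ loss=0.1087
print(round(total_loss(0.101, 0.02827, 0.02989), 4))     # 0.1087
# Epoch 5 validation: 0.5*0.06162 + 0.009934 + 0.03325 = 0.073994 ~ loss=0.07399
print(round(total_loss(0.06162, 0.009934, 0.03325), 5))  # 0.07399

On the same reading, the tot_loss[...] entries presumably track this combination averaged over the running frame counts shown alongside them.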
2023-11-18 16:54:35,508 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 16:54:39,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=320626.6666666667, ans=0.125 2023-11-18 16:54:44,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=320626.6666666667, ans=15.0 2023-11-18 16:54:52,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0 2023-11-18 16:54:53,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=320693.3333333333, ans=0.05 2023-11-18 16:54:54,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=320693.3333333333, ans=0.0 2023-11-18 16:55:10,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2023-11-18 16:55:18,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320826.6666666667, ans=0.1 2023-11-18 16:55:21,603 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 9.535e+01 1.056e+02 1.198e+02 1.542e+02, threshold=2.112e+02, percent-clipped=0.0 2023-11-18 16:55:31,307 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 50, loss[loss=0.09691, simple_loss=0.09216, pruned_loss=0.02527, audio_tagging_loss=0.02555, over 15439.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1179, pruned_loss=0.03438, audio_tagging_loss=0.02263, over 690398.72 frames. ], batch size: 59, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:55:39,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320960.0, ans=0.1 2023-11-18 16:55:52,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=321093.3333333333, ans=0.05 2023-11-18 16:55:55,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=321093.3333333333, ans=0.125 2023-11-18 16:56:00,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=321093.3333333333, ans=0.05 2023-11-18 16:56:13,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=321160.0, ans=0.0 2023-11-18 16:56:18,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2023-11-18 16:56:26,688 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 100, loss[loss=0.1148, simple_loss=0.1199, pruned_loss=0.03386, audio_tagging_loss=0.02097, over 15203.00 frames. ], tot_loss[loss=0.1139, simple_loss=0.1177, pruned_loss=0.03343, audio_tagging_loss=0.02165, over 1209066.98 frames. 
], batch size: 56, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:56:30,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=321293.3333333333, ans=0.125 2023-11-18 16:56:51,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=321426.6666666667, ans=0.125 2023-11-18 16:57:12,265 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 9.566e+01 1.064e+02 1.154e+02 1.620e+02, threshold=2.127e+02, percent-clipped=0.0 2023-11-18 16:57:16,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=321560.0, ans=0.025 2023-11-18 16:57:22,393 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 150, loss[loss=0.08969, simple_loss=0.09331, pruned_loss=0.02633, audio_tagging_loss=0.01671, over 15304.00 frames. ], tot_loss[loss=0.1114, simple_loss=0.1177, pruned_loss=0.03322, audio_tagging_loss=0.01932, over 1620400.10 frames. ], batch size: 59, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:57:23,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=321626.6666666667, ans=0.1 2023-11-18 16:57:38,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.77 vs. limit=10.0 2023-11-18 16:57:40,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=321693.3333333333, ans=0.125 2023-11-18 16:57:44,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=321760.0, ans=0.125 2023-11-18 16:57:48,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=321760.0, ans=0.125 2023-11-18 16:58:11,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=321893.3333333333, ans=0.125 2023-11-18 16:58:11,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=321893.3333333333, ans=0.125 2023-11-18 16:58:12,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=23.80 vs. limit=22.5 2023-11-18 16:58:12,599 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:58:13,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=321893.3333333333, ans=0.025 2023-11-18 16:58:17,754 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 200, loss[loss=0.08372, simple_loss=0.09679, pruned_loss=0.02367, audio_tagging_loss=0.01166, over 14430.00 frames. ], tot_loss[loss=0.1115, simple_loss=0.1208, pruned_loss=0.03436, audio_tagging_loss=0.01675, over 1936852.17 frames. 
], batch size: 56, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:58:19,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=321960.0, ans=0.125 2023-11-18 16:58:22,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=321960.0, ans=0.2 2023-11-18 16:58:56,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=8.0 2023-11-18 16:59:03,728 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.692e+01 9.231e+01 1.044e+02 1.147e+02 1.591e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 16:59:14,445 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 250, loss[loss=0.09099, simple_loss=0.1011, pruned_loss=0.02715, audio_tagging_loss=0.01327, over 14684.00 frames. ], tot_loss[loss=0.1107, simple_loss=0.1216, pruned_loss=0.03481, audio_tagging_loss=0.01509, over 2178060.09 frames. ], batch size: 55, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:59:28,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.85 vs. limit=15.0 2023-11-18 16:59:30,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=322360.0, ans=0.05 2023-11-18 16:59:42,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=322426.6666666667, ans=0.125 2023-11-18 16:59:44,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=322426.6666666667, ans=0.125 2023-11-18 17:00:09,719 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 300, loss[loss=0.1388, simple_loss=0.1568, pruned_loss=0.05268, audio_tagging_loss=0.007657, over 15655.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1222, pruned_loss=0.03479, audio_tagging_loss=0.01393, over 2375168.07 frames. ], batch size: 58, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:00:12,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322626.6666666667, ans=0.1 2023-11-18 17:00:13,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.03 vs. limit=15.0 2023-11-18 17:00:50,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=322826.6666666667, ans=0.1 2023-11-18 17:00:56,003 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 9.144e+01 1.032e+02 1.177e+02 1.892e+02, threshold=2.064e+02, percent-clipped=0.0 2023-11-18 17:00:56,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322893.3333333333, ans=0.1 2023-11-18 17:01:06,801 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 350, loss[loss=0.1366, simple_loss=0.1442, pruned_loss=0.05268, audio_tagging_loss=0.01184, over 15148.00 frames. ], tot_loss[loss=0.109, simple_loss=0.1222, pruned_loss=0.03478, audio_tagging_loss=0.01312, over 2525561.27 frames. 
], batch size: 57, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:01:07,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=322960.0, ans=0.125 2023-11-18 17:01:10,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=322960.0, ans=0.125 2023-11-18 17:01:17,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=323026.6666666667, ans=0.125 2023-11-18 17:01:18,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=323026.6666666667, ans=0.125 2023-11-18 17:01:21,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2023-11-18 17:01:36,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.35 vs. limit=22.5 2023-11-18 17:02:03,603 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 400, loss[loss=0.07945, simple_loss=0.08669, pruned_loss=0.02007, audio_tagging_loss=0.01603, over 15089.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1215, pruned_loss=0.03454, audio_tagging_loss=0.01268, over 2645608.89 frames. ], batch size: 59, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:02:04,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=323293.3333333333, ans=0.125 2023-11-18 17:02:10,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=323293.3333333333, ans=15.0 2023-11-18 17:02:27,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=323426.6666666667, ans=0.0 2023-11-18 17:02:34,824 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2023-11-18 17:02:35,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=323493.3333333333, ans=0.2 2023-11-18 17:02:37,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=323493.3333333333, ans=0.0 2023-11-18 17:02:39,581 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.42 vs. limit=15.0 2023-11-18 17:02:49,149 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.956e+01 1.111e+02 1.271e+02 1.658e+02, threshold=2.223e+02, percent-clipped=0.0 2023-11-18 17:02:51,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=323560.0, ans=0.07 2023-11-18 17:02:55,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=323560.0, ans=0.1 2023-11-18 17:02:58,839 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 450, loss[loss=0.09613, simple_loss=0.1079, pruned_loss=0.03126, audio_tagging_loss=0.01094, over 15329.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1212, pruned_loss=0.03475, audio_tagging_loss=0.01239, over 2736425.76 frames. 
], batch size: 58, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:03:14,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.79 vs. limit=10.0 2023-11-18 17:03:18,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=323693.3333333333, ans=0.2 2023-11-18 17:03:22,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.07 vs. limit=22.5 2023-11-18 17:03:43,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=323893.3333333333, ans=0.125 2023-11-18 17:03:54,628 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 500, loss[loss=0.107, simple_loss=0.1193, pruned_loss=0.03386, audio_tagging_loss=0.01347, over 14289.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.121, pruned_loss=0.03464, audio_tagging_loss=0.01203, over 2795886.42 frames. ], batch size: 56, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:04:14,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=324026.6666666667, ans=0.2 2023-11-18 17:04:28,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=324160.0, ans=0.2 2023-11-18 17:04:29,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=324160.0, ans=0.125 2023-11-18 17:04:30,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=324160.0, ans=0.0 2023-11-18 17:04:37,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=324160.0, ans=0.0 2023-11-18 17:04:40,508 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 9.068e+01 9.771e+01 1.090e+02 1.763e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-18 17:04:47,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=324226.6666666667, ans=0.1 2023-11-18 17:04:51,723 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 550, loss[loss=0.09005, simple_loss=0.1071, pruned_loss=0.02657, audio_tagging_loss=0.009932, over 15953.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1207, pruned_loss=0.0345, audio_tagging_loss=0.01193, over 2851180.45 frames. ], batch size: 59, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:04:56,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=324293.3333333333, ans=0.125 2023-11-18 17:05:05,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=12.0 2023-11-18 17:05:41,058 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.61 vs. limit=15.0 2023-11-18 17:05:41,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=324560.0, ans=0.125 2023-11-18 17:05:46,822 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 600, loss[loss=0.1404, simple_loss=0.163, pruned_loss=0.04991, audio_tagging_loss=0.009016, over 14779.00 frames. 
], tot_loss[loss=0.1064, simple_loss=0.1204, pruned_loss=0.03444, audio_tagging_loss=0.01175, over 2894607.82 frames. ], batch size: 55, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:05:50,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=324626.6666666667, ans=0.125 2023-11-18 17:06:01,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=324693.3333333333, ans=0.125 2023-11-18 17:06:18,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=324760.0, ans=0.0 2023-11-18 17:06:28,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2023-11-18 17:06:32,197 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.460e+01 9.095e+01 1.023e+02 1.155e+02 1.808e+02, threshold=2.046e+02, percent-clipped=0.0 2023-11-18 17:06:34,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=324893.3333333333, ans=0.2 2023-11-18 17:06:39,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=324893.3333333333, ans=0.2 2023-11-18 17:06:42,441 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 650, loss[loss=0.08376, simple_loss=0.09242, pruned_loss=0.02694, audio_tagging_loss=0.01061, over 14926.00 frames. ], tot_loss[loss=0.1067, simple_loss=0.1211, pruned_loss=0.03454, audio_tagging_loss=0.01163, over 2933295.51 frames. ], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:06:49,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=324960.0, ans=0.125 2023-11-18 17:06:51,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=324960.0, ans=0.125 2023-11-18 17:07:00,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. limit=6.0 2023-11-18 17:07:12,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=325093.3333333333, ans=10.0 2023-11-18 17:07:13,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=325093.3333333333, ans=0.125 2023-11-18 17:07:20,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0 2023-11-18 17:07:26,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=325226.6666666667, ans=0.0 2023-11-18 17:07:37,857 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 700, loss[loss=0.09419, simple_loss=0.112, pruned_loss=0.02594, audio_tagging_loss=0.01226, over 15446.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1222, pruned_loss=0.03478, audio_tagging_loss=0.01149, over 2961417.79 frames. 
], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:07:52,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=325360.0, ans=0.125 2023-11-18 17:07:55,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.00 vs. limit=15.0 2023-11-18 17:07:57,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=325360.0, ans=0.125 2023-11-18 17:08:04,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=325426.6666666667, ans=0.125 2023-11-18 17:08:24,350 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 9.264e+01 9.972e+01 1.138e+02 1.580e+02, threshold=1.994e+02, percent-clipped=0.0 2023-11-18 17:08:26,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=325560.0, ans=0.0 2023-11-18 17:08:29,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=325560.0, ans=0.125 2023-11-18 17:08:34,004 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 750, loss[loss=0.1429, simple_loss=0.1642, pruned_loss=0.05293, audio_tagging_loss=0.007924, over 16409.00 frames. ], tot_loss[loss=0.106, simple_loss=0.1206, pruned_loss=0.03412, audio_tagging_loss=0.01154, over 2983808.75 frames. ], batch size: 59, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:09:06,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=325826.6666666667, ans=0.125 2023-11-18 17:09:14,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=325826.6666666667, ans=0.125 2023-11-18 17:09:29,749 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 800, loss[loss=0.09293, simple_loss=0.1075, pruned_loss=0.0276, audio_tagging_loss=0.01159, over 15527.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.1205, pruned_loss=0.03388, audio_tagging_loss=0.0115, over 3008121.47 frames. ], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:09:29,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=325960.0, ans=0.1 2023-11-18 17:09:57,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=326093.3333333333, ans=0.125 2023-11-18 17:10:03,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.84 vs. limit=15.0 2023-11-18 17:10:15,469 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 9.696e+01 1.116e+02 1.261e+02 1.745e+02, threshold=2.231e+02, percent-clipped=0.0 2023-11-18 17:10:24,988 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 850, loss[loss=0.1052, simple_loss=0.1084, pruned_loss=0.03568, audio_tagging_loss=0.01532, over 13784.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.12, pruned_loss=0.03365, audio_tagging_loss=0.01165, over 3017486.11 frames. 
], batch size: 56, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:10:25,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=326293.3333333333, ans=0.125 2023-11-18 17:11:15,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=326560.0, ans=0.125 2023-11-18 17:11:20,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=326560.0, ans=0.0 2023-11-18 17:11:21,919 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 900, loss[loss=0.09698, simple_loss=0.1162, pruned_loss=0.02955, audio_tagging_loss=0.009317, over 15504.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.1213, pruned_loss=0.03421, audio_tagging_loss=0.01162, over 3026086.95 frames. ], batch size: 58, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:11:26,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=326626.6666666667, ans=0.125 2023-11-18 17:11:26,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.07 vs. limit=10.0 2023-11-18 17:11:33,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=326693.3333333333, ans=0.125 2023-11-18 17:11:56,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=326826.6666666667, ans=0.0 2023-11-18 17:12:06,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.04 vs. limit=15.0 2023-11-18 17:12:08,184 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 9.412e+01 1.033e+02 1.138e+02 1.840e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 17:12:17,652 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 950, loss[loss=0.1073, simple_loss=0.1321, pruned_loss=0.03253, audio_tagging_loss=0.008739, over 16617.00 frames. ], tot_loss[loss=0.1059, simple_loss=0.1207, pruned_loss=0.034, audio_tagging_loss=0.01155, over 3035662.81 frames. 
], batch size: 60, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:12:27,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=327026.6666666667, ans=0.0 2023-11-18 17:12:28,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=327026.6666666667, ans=0.0 2023-11-18 17:12:31,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=327026.6666666667, ans=0.0 2023-11-18 17:12:35,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=327026.6666666667, ans=0.07 2023-11-18 17:12:38,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=327026.6666666667, ans=0.0 2023-11-18 17:12:52,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=327160.0, ans=10.0 2023-11-18 17:12:54,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=327160.0, ans=0.1 2023-11-18 17:12:54,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=327160.0, ans=0.125 2023-11-18 17:13:03,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=327226.6666666667, ans=0.125 2023-11-18 17:13:06,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=327226.6666666667, ans=0.2 2023-11-18 17:13:13,609 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1000, loss[loss=0.1161, simple_loss=0.1276, pruned_loss=0.0424, audio_tagging_loss=0.009885, over 15862.00 frames. ], tot_loss[loss=0.1052, simple_loss=0.1199, pruned_loss=0.03381, audio_tagging_loss=0.01145, over 3039936.40 frames. ], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:13:13,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=327293.3333333333, ans=0.125 2023-11-18 17:13:24,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=327360.0, ans=0.125 2023-11-18 17:13:25,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2023-11-18 17:13:32,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=327360.0, ans=0.04949747468305833 2023-11-18 17:13:38,758 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:13:41,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.69 vs. 
limit=15.0 2023-11-18 17:13:44,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=327426.6666666667, ans=0.125 2023-11-18 17:14:00,045 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 9.245e+01 1.040e+02 1.129e+02 1.708e+02, threshold=2.081e+02, percent-clipped=0.0 2023-11-18 17:14:10,346 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1050, loss[loss=0.1389, simple_loss=0.1726, pruned_loss=0.04435, audio_tagging_loss=0.008294, over 15581.00 frames. ], tot_loss[loss=0.1051, simple_loss=0.1198, pruned_loss=0.03383, audio_tagging_loss=0.01137, over 3041079.02 frames. ], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:14:21,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=327693.3333333333, ans=0.125 2023-11-18 17:14:37,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=327760.0, ans=0.125 2023-11-18 17:14:56,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=327893.3333333333, ans=0.0 2023-11-18 17:15:05,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=327960.0, ans=0.125 2023-11-18 17:15:06,212 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1100, loss[loss=0.07697, simple_loss=0.07762, pruned_loss=0.02538, audio_tagging_loss=0.01277, over 14315.00 frames. ], tot_loss[loss=0.105, simple_loss=0.1201, pruned_loss=0.03372, audio_tagging_loss=0.0113, over 3046369.50 frames. ], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:15:09,917 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:15:11,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=327960.0, ans=0.0 2023-11-18 17:15:11,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.44 vs. limit=15.0 2023-11-18 17:15:14,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=327960.0, ans=0.2 2023-11-18 17:15:29,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=328093.3333333333, ans=0.1 2023-11-18 17:15:35,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=328093.3333333333, ans=0.125 2023-11-18 17:15:43,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.89 vs. 
limit=15.0 2023-11-18 17:15:52,625 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.982e+01 9.877e+01 1.122e+02 1.591e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-18 17:15:54,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=328226.6666666667, ans=0.1 2023-11-18 17:15:59,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=328226.6666666667, ans=0.0 2023-11-18 17:16:02,177 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1150, loss[loss=0.08158, simple_loss=0.09217, pruned_loss=0.02125, audio_tagging_loss=0.01424, over 14109.00 frames. ], tot_loss[loss=0.1042, simple_loss=0.119, pruned_loss=0.03341, audio_tagging_loss=0.01131, over 3033913.69 frames. ], batch size: 54, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:16:16,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=328360.0, ans=0.05 2023-11-18 17:16:52,853 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2023-11-18 17:16:58,662 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1200, loss[loss=0.1147, simple_loss=0.1293, pruned_loss=0.03897, audio_tagging_loss=0.01102, over 16396.00 frames. ], tot_loss[loss=0.1045, simple_loss=0.1191, pruned_loss=0.03359, audio_tagging_loss=0.01135, over 3043188.13 frames. ], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:17:06,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.27 vs. limit=10.0 2023-11-18 17:17:13,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=328693.3333333333, ans=0.125 2023-11-18 17:17:19,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=328693.3333333333, ans=0.125 2023-11-18 17:17:32,995 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-11-18 17:17:36,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=328826.6666666667, ans=0.025 2023-11-18 17:17:44,330 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.512e+01 1.070e+02 1.237e+02 2.001e+02, threshold=2.140e+02, percent-clipped=1.0 2023-11-18 17:17:46,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=328893.3333333333, ans=0.0 2023-11-18 17:17:54,557 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1250, loss[loss=0.08194, simple_loss=0.09201, pruned_loss=0.02366, audio_tagging_loss=0.01228, over 15398.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1185, pruned_loss=0.03348, audio_tagging_loss=0.01124, over 3046285.24 frames. 
], batch size: 62, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:18:15,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=329026.6666666667, ans=0.0 2023-11-18 17:18:30,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=15.0 2023-11-18 17:18:34,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=329160.0, ans=0.2 2023-11-18 17:18:38,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=329226.6666666667, ans=0.125 2023-11-18 17:18:49,794 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.36 vs. limit=10.0 2023-11-18 17:18:50,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.60 vs. limit=5.0 2023-11-18 17:18:50,346 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1300, loss[loss=0.1138, simple_loss=0.1333, pruned_loss=0.03684, audio_tagging_loss=0.01031, over 15318.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1182, pruned_loss=0.03332, audio_tagging_loss=0.01131, over 3041251.12 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:19:08,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=329360.0, ans=0.125 2023-11-18 17:19:11,892 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.525e-03 2023-11-18 17:19:17,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=329426.6666666667, ans=0.125 2023-11-18 17:19:21,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=329426.6666666667, ans=0.0 2023-11-18 17:19:29,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.54 vs. limit=15.0 2023-11-18 17:19:29,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=329493.3333333333, ans=0.125 2023-11-18 17:19:35,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=329560.0, ans=0.125 2023-11-18 17:19:36,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.287e+01 1.028e+02 1.154e+02 1.858e+02, threshold=2.056e+02, percent-clipped=0.0 2023-11-18 17:19:46,105 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.936e-01 2023-11-18 17:19:46,959 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1350, loss[loss=0.1084, simple_loss=0.1295, pruned_loss=0.03164, audio_tagging_loss=0.01203, over 15010.00 frames. ], tot_loss[loss=0.1042, simple_loss=0.1184, pruned_loss=0.03356, audio_tagging_loss=0.01142, over 3036131.11 frames. 
], batch size: 57, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:20:07,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=329693.3333333333, ans=0.125 2023-11-18 17:20:13,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=329760.0, ans=0.125 2023-11-18 17:20:19,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0 2023-11-18 17:20:25,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=329826.6666666667, ans=0.0 2023-11-18 17:20:25,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=329826.6666666667, ans=0.0 2023-11-18 17:20:27,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=329826.6666666667, ans=0.125 2023-11-18 17:20:28,585 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:20:34,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.88 vs. limit=15.0 2023-11-18 17:20:36,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=329893.3333333333, ans=0.125 2023-11-18 17:20:42,465 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1400, loss[loss=0.1125, simple_loss=0.12, pruned_loss=0.03831, audio_tagging_loss=0.01417, over 15463.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.1182, pruned_loss=0.03353, audio_tagging_loss=0.01143, over 3039575.01 frames. ], batch size: 61, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:20:44,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=329960.0, ans=0.04949747468305833 2023-11-18 17:20:56,666 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.44 vs. limit=22.5 2023-11-18 17:21:02,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2023-11-18 17:21:20,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=330160.0, ans=0.125 2023-11-18 17:21:27,665 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 9.034e+01 1.046e+02 1.162e+02 2.116e+02, threshold=2.092e+02, percent-clipped=1.0 2023-11-18 17:21:27,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=330226.6666666667, ans=0.1 2023-11-18 17:21:38,685 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1450, loss[loss=0.09804, simple_loss=0.1149, pruned_loss=0.02884, audio_tagging_loss=0.01176, over 15173.00 frames. 
], tot_loss[loss=0.1038, simple_loss=0.1177, pruned_loss=0.03339, audio_tagging_loss=0.0115, over 3038135.21 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:21:45,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.53 vs. limit=12.0 2023-11-18 17:21:52,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=330360.0, ans=0.0 2023-11-18 17:22:18,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=330493.3333333333, ans=0.015 2023-11-18 17:22:18,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=330493.3333333333, ans=0.0 2023-11-18 17:22:18,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=330493.3333333333, ans=0.125 2023-11-18 17:22:34,985 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1500, loss[loss=0.1018, simple_loss=0.1189, pruned_loss=0.03082, audio_tagging_loss=0.01151, over 16208.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.1188, pruned_loss=0.03345, audio_tagging_loss=0.01154, over 3047822.72 frames. ], batch size: 61, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:23:05,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=330760.0, ans=0.125 2023-11-18 17:23:11,060 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.49 vs. limit=12.0 2023-11-18 17:23:11,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.05 vs. limit=10.0 2023-11-18 17:23:17,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=330826.6666666667, ans=0.0 2023-11-18 17:23:18,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=330826.6666666667, ans=0.125 2023-11-18 17:23:20,979 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.810e+01 9.325e+01 1.048e+02 1.222e+02 1.829e+02, threshold=2.095e+02, percent-clipped=0.0 2023-11-18 17:23:30,606 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1550, loss[loss=0.09196, simple_loss=0.1084, pruned_loss=0.02763, audio_tagging_loss=0.01012, over 16633.00 frames. ], tot_loss[loss=0.1047, simple_loss=0.1193, pruned_loss=0.03345, audio_tagging_loss=0.01164, over 3044787.47 frames. ], batch size: 62, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:23:49,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331026.6666666667, ans=0.1 2023-11-18 17:24:00,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=331093.3333333333, ans=0.125 2023-11-18 17:24:06,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=331160.0, ans=0.125 2023-11-18 17:24:09,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.22 vs. 
limit=6.0 2023-11-18 17:24:14,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=331226.6666666667, ans=0.0 2023-11-18 17:24:15,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331226.6666666667, ans=0.1 2023-11-18 17:24:25,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=331293.3333333333, ans=0.125 2023-11-18 17:24:26,283 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1600, loss[loss=0.1154, simple_loss=0.1453, pruned_loss=0.03348, audio_tagging_loss=0.009285, over 15708.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1179, pruned_loss=0.03304, audio_tagging_loss=0.01183, over 3051072.42 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:24:44,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0 2023-11-18 17:25:01,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331493.3333333333, ans=0.1 2023-11-18 17:25:06,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=331493.3333333333, ans=0.0 2023-11-18 17:25:12,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 9.229e+01 1.020e+02 1.137e+02 1.712e+02, threshold=2.040e+02, percent-clipped=0.0 2023-11-18 17:25:18,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-18 17:25:23,399 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1650, loss[loss=0.1328, simple_loss=0.1616, pruned_loss=0.04207, audio_tagging_loss=0.009958, over 15375.00 frames. ], tot_loss[loss=0.1042, simple_loss=0.1182, pruned_loss=0.03318, audio_tagging_loss=0.01191, over 3054859.42 frames. ], batch size: 55, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:25:23,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2023-11-18 17:25:31,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=331626.6666666667, ans=0.05 2023-11-18 17:25:37,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=331693.3333333333, ans=0.125 2023-11-18 17:25:46,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=331760.0, ans=0.125 2023-11-18 17:26:10,537 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=331893.3333333333, ans=0.125 2023-11-18 17:26:18,874 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1700, loss[loss=0.1394, simple_loss=0.1572, pruned_loss=0.05016, audio_tagging_loss=0.01068, over 14621.00 frames. ], tot_loss[loss=0.1048, simple_loss=0.1187, pruned_loss=0.03356, audio_tagging_loss=0.01182, over 3052629.39 frames. 
], batch size: 55, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:26:19,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=331960.0, ans=0.1 2023-11-18 17:26:19,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2023-11-18 17:26:24,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=331960.0, ans=0.125 2023-11-18 17:26:26,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=331960.0, ans=0.2 2023-11-18 17:26:37,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332026.6666666667, ans=0.1 2023-11-18 17:27:03,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=332226.6666666667, ans=0.05 2023-11-18 17:27:06,290 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.601e+01 9.403e+01 1.012e+02 1.120e+02 1.645e+02, threshold=2.024e+02, percent-clipped=0.0 2023-11-18 17:27:12,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=332226.6666666667, ans=0.1 2023-11-18 17:27:15,297 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1750, loss[loss=0.07779, simple_loss=0.0884, pruned_loss=0.02096, audio_tagging_loss=0.01263, over 15151.00 frames. ], tot_loss[loss=0.1051, simple_loss=0.1193, pruned_loss=0.0337, audio_tagging_loss=0.0117, over 3047413.89 frames. ], batch size: 58, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:27:16,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=332293.3333333333, ans=0.125 2023-11-18 17:27:19,970 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=22.5 2023-11-18 17:27:23,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=332293.3333333333, ans=0.125 2023-11-18 17:27:37,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=332426.6666666667, ans=0.125 2023-11-18 17:27:50,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.14 vs. limit=22.5 2023-11-18 17:28:07,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.98 vs. limit=15.0 2023-11-18 17:28:11,844 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1800, loss[loss=0.1206, simple_loss=0.132, pruned_loss=0.04222, audio_tagging_loss=0.0124, over 15192.00 frames. ], tot_loss[loss=0.1049, simple_loss=0.1192, pruned_loss=0.03377, audio_tagging_loss=0.01158, over 3052411.05 frames. 
], batch size: 56, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:28:32,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=332760.0, ans=0.125 2023-11-18 17:28:52,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=15.0 2023-11-18 17:28:59,147 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 9.003e+01 9.997e+01 1.070e+02 1.437e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-18 17:29:07,660 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1850, loss[loss=0.1067, simple_loss=0.127, pruned_loss=0.03057, audio_tagging_loss=0.01263, over 15868.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.1183, pruned_loss=0.03334, audio_tagging_loss=0.01158, over 3044359.67 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:29:17,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=333026.6666666667, ans=10.0 2023-11-18 17:29:24,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=333026.6666666667, ans=0.0 2023-11-18 17:29:26,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=333026.6666666667, ans=0.125 2023-11-18 17:29:30,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=333093.3333333333, ans=0.125 2023-11-18 17:29:31,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=333093.3333333333, ans=0.05 2023-11-18 17:29:32,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=333093.3333333333, ans=0.1 2023-11-18 17:29:53,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.31 vs. limit=15.0 2023-11-18 17:29:55,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=333226.6666666667, ans=0.0 2023-11-18 17:30:00,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2023-11-18 17:30:04,070 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1900, loss[loss=0.08968, simple_loss=0.1055, pruned_loss=0.02469, audio_tagging_loss=0.01223, over 16259.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1168, pruned_loss=0.03282, audio_tagging_loss=0.01143, over 3036051.53 frames. ], batch size: 62, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:30:04,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=333293.3333333333, ans=0.1 2023-11-18 17:30:32,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.05 vs. 
limit=22.5 2023-11-18 17:30:36,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=333493.3333333333, ans=0.2 2023-11-18 17:30:50,670 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.811e+01 9.778e+01 1.068e+02 1.411e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-18 17:30:56,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=333560.0, ans=0.0 2023-11-18 17:30:59,231 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 1950, loss[loss=0.08537, simple_loss=0.09565, pruned_loss=0.02639, audio_tagging_loss=0.01116, over 15935.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1165, pruned_loss=0.03278, audio_tagging_loss=0.01134, over 3040916.67 frames. ], batch size: 63, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:31:20,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=333693.3333333333, ans=0.0 2023-11-18 17:31:22,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=333760.0, ans=0.125 2023-11-18 17:31:46,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=333893.3333333333, ans=0.0 2023-11-18 17:31:56,044 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2000, loss[loss=0.07849, simple_loss=0.08284, pruned_loss=0.02625, audio_tagging_loss=0.01082, over 15018.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1159, pruned_loss=0.03264, audio_tagging_loss=0.01136, over 3038117.79 frames. ], batch size: 58, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:32:03,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=333960.0, ans=0.125 2023-11-18 17:32:04,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=333960.0, ans=0.125 2023-11-18 17:32:06,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=334026.6666666667, ans=0.125 2023-11-18 17:32:14,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=334026.6666666667, ans=0.0 2023-11-18 17:32:36,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5 2023-11-18 17:32:42,816 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 9.028e+01 9.681e+01 1.107e+02 1.913e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-18 17:32:51,953 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2050, loss[loss=0.1123, simple_loss=0.1386, pruned_loss=0.03278, audio_tagging_loss=0.01021, over 15920.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1172, pruned_loss=0.03291, audio_tagging_loss=0.01133, over 3047672.75 frames. 
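
In each optim.py:476 line in this excerpt, the reported threshold is exactly Clipping_scale times the median of the grad-norm quartiles (here 2.0 * 9.681e+01 = 1.936e+02), so the clipping threshold tracks the median of recently observed gradient norms. A minimal sketch of that rule follows; the window size and bookkeeping are assumptions, only the threshold = scale * median relationship is taken from the log.

    import collections
    import statistics

    # Sketch: clip gradients to clipping_scale * median of recent grad norms.
    # The window length is an arbitrary illustrative choice.
    class MedianClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 400):
            self.clipping_scale = clipping_scale
            self.norms = collections.deque(maxlen=window)

        def threshold(self) -> float:
            return self.clipping_scale * statistics.median(self.norms)

        def observe_and_clip(self, grad_norm: float) -> float:
            self.norms.append(grad_norm)
            # Factor to multiply gradients by; 1.0 means no clipping.
            return min(1.0, self.threshold() / grad_norm)

    clipper = MedianClipper()
    for n in (70.58, 90.28, 96.81, 110.7, 191.3):  # the logged quartiles
        clipper.observe_and_clip(n)
    print(clipper.threshold())  # 193.62, matching threshold=1.936e+02
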
], batch size: 58, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:32:52,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=334293.3333333333, ans=0.0 2023-11-18 17:32:59,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=334293.3333333333, ans=0.125 2023-11-18 17:33:08,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=334360.0, ans=0.1 2023-11-18 17:33:33,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=334493.3333333333, ans=0.2 2023-11-18 17:33:36,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=334560.0, ans=0.0 2023-11-18 17:33:47,586 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2100, loss[loss=0.1297, simple_loss=0.1539, pruned_loss=0.04202, audio_tagging_loss=0.01079, over 15705.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1181, pruned_loss=0.03309, audio_tagging_loss=0.01138, over 3039191.97 frames. ], batch size: 56, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:33:50,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.11 vs. limit=10.0 2023-11-18 17:33:52,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=334626.6666666667, ans=0.125 2023-11-18 17:34:24,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=334826.6666666667, ans=0.0 2023-11-18 17:34:25,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2023-11-18 17:34:29,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=334826.6666666667, ans=0.125 2023-11-18 17:34:30,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=334826.6666666667, ans=0.1 2023-11-18 17:34:34,905 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.950e+01 9.682e+01 1.081e+02 1.226e+02 1.656e+02, threshold=2.162e+02, percent-clipped=0.0 2023-11-18 17:34:42,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=334893.3333333333, ans=0.125 2023-11-18 17:34:44,635 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2150, loss[loss=0.08153, simple_loss=0.09442, pruned_loss=0.02308, audio_tagging_loss=0.01124, over 14182.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1178, pruned_loss=0.03296, audio_tagging_loss=0.01142, over 3041301.65 frames. ], batch size: 58, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:34:49,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=334960.0, ans=0.0 2023-11-18 17:34:53,664 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.49 vs. 
limit=22.5 2023-11-18 17:34:55,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=335026.6666666667, ans=0.125 2023-11-18 17:35:10,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2023-11-18 17:35:11,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=335093.3333333333, ans=0.125 2023-11-18 17:35:11,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=335093.3333333333, ans=0.125 2023-11-18 17:35:11,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335093.3333333333, ans=0.1 2023-11-18 17:35:13,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=335093.3333333333, ans=10.0 2023-11-18 17:35:17,635 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:35:23,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=335160.0, ans=0.0 2023-11-18 17:35:24,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=335160.0, ans=0.125 2023-11-18 17:35:25,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-11-18 17:35:28,972 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.22 vs. limit=15.0 2023-11-18 17:35:30,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=335226.6666666667, ans=0.125 2023-11-18 17:35:32,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=335226.6666666667, ans=0.1 2023-11-18 17:35:39,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.71 vs. limit=22.5 2023-11-18 17:35:39,925 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2200, loss[loss=0.1276, simple_loss=0.1403, pruned_loss=0.04546, audio_tagging_loss=0.01201, over 15160.00 frames. ], tot_loss[loss=0.103, simple_loss=0.1172, pruned_loss=0.0329, audio_tagging_loss=0.0115, over 3047494.70 frames. 
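
The WARNING above is a length filter doing its job: a 1-second dummy AudioSet cut has 100 feature frames, which shrink to 23 after roughly 4x convolutional subsampling, fewer than its 24 BPE tokens, so no valid transducer alignment exists and the cut is excluded. A sketch of the check, assuming a two-stage stride-2 subsampling of the form ((t - 7) // 2 + 1) // 2; that exact formula is an assumption chosen to be consistent with the logged 100 -> 23.

    def frames_after_subsampling(t: int) -> int:
        # One plausible ~4x subsampling arithmetic matching the log
        # (100 frames -> 23); the model's exact formula may differ.
        return ((t - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Transducer training needs at least as many frames as tokens.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False: the cut is excluded
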
], batch size: 57, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:35:40,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=335293.3333333333, ans=0.09899494936611666 2023-11-18 17:35:44,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=335293.3333333333, ans=0.125 2023-11-18 17:35:56,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=335360.0, ans=0.125 2023-11-18 17:36:20,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=335493.3333333333, ans=0.09899494936611666 2023-11-18 17:36:27,475 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.184e+01 9.433e+01 1.069e+02 1.154e+02 1.802e+02, threshold=2.138e+02, percent-clipped=0.0 2023-11-18 17:36:31,017 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=335560.0, ans=0.5 2023-11-18 17:36:36,106 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2250, loss[loss=0.1035, simple_loss=0.1151, pruned_loss=0.03528, audio_tagging_loss=0.01065, over 15089.00 frames. ], tot_loss[loss=0.1034, simple_loss=0.1175, pruned_loss=0.03316, audio_tagging_loss=0.01144, over 3049837.37 frames. ], batch size: 57, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:36:46,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=335693.3333333333, ans=0.125 2023-11-18 17:37:12,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-11-18 17:37:19,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=335826.6666666667, ans=0.0 2023-11-18 17:37:28,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=335893.3333333333, ans=0.125 2023-11-18 17:37:33,140 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2300, loss[loss=0.1072, simple_loss=0.1143, pruned_loss=0.03842, audio_tagging_loss=0.01166, over 16908.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1168, pruned_loss=0.03275, audio_tagging_loss=0.0115, over 3045963.28 frames. ], batch size: 62, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:37:38,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=335960.0, ans=0.0 2023-11-18 17:37:45,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=15.0 2023-11-18 17:38:20,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 9.497e+01 1.027e+02 1.187e+02 1.652e+02, threshold=2.054e+02, percent-clipped=0.0 2023-11-18 17:38:22,769 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 17:38:29,672 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2350, loss[loss=0.111, simple_loss=0.1229, pruned_loss=0.03621, audio_tagging_loss=0.01333, over 14885.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1171, pruned_loss=0.03279, audio_tagging_loss=0.01157, over 3042075.52 frames. ], batch size: 56, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:38:42,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=336360.0, ans=0.125 2023-11-18 17:38:51,151 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:39:00,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5 2023-11-18 17:39:01,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=336426.6666666667, ans=0.0 2023-11-18 17:39:25,502 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2400, loss[loss=0.1125, simple_loss=0.1309, pruned_loss=0.03611, audio_tagging_loss=0.01094, over 16365.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1185, pruned_loss=0.03312, audio_tagging_loss=0.01168, over 3047110.47 frames. ], batch size: 61, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:39:27,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=336626.6666666667, ans=0.125 2023-11-18 17:39:47,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.45 vs. limit=10.0 2023-11-18 17:39:48,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=336760.0, ans=0.125 2023-11-18 17:39:54,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=336760.0, ans=0.1 2023-11-18 17:39:54,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=336760.0, ans=0.125 2023-11-18 17:40:13,220 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 9.087e+01 9.616e+01 1.102e+02 1.303e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-18 17:40:21,843 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2450, loss[loss=0.09791, simple_loss=0.1077, pruned_loss=0.02835, audio_tagging_loss=0.01572, over 15236.00 frames. ], tot_loss[loss=0.1039, simple_loss=0.1183, pruned_loss=0.03302, audio_tagging_loss=0.01174, over 3044126.58 frames. ], batch size: 60, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:40:37,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.12 vs. limit=15.0 2023-11-18 17:40:51,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=337093.3333333333, ans=0.125 2023-11-18 17:41:03,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=337160.0, ans=0.0 2023-11-18 17:41:04,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.99 vs. 
limit=22.5 2023-11-18 17:41:11,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=337226.6666666667, ans=0.0 2023-11-18 17:41:17,247 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2500, loss[loss=0.09898, simple_loss=0.1165, pruned_loss=0.0287, audio_tagging_loss=0.01201, over 14339.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1177, pruned_loss=0.03303, audio_tagging_loss=0.0118, over 3040606.90 frames. ], batch size: 53, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:41:30,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=12.0 2023-11-18 17:41:35,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=337360.0, ans=0.0 2023-11-18 17:41:45,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=337426.6666666667, ans=0.0 2023-11-18 17:41:50,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=337493.3333333333, ans=0.125 2023-11-18 17:42:05,673 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 9.278e+01 1.050e+02 1.171e+02 1.497e+02, threshold=2.099e+02, percent-clipped=0.0 2023-11-18 17:42:13,639 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2550, loss[loss=0.1229, simple_loss=0.1439, pruned_loss=0.04135, audio_tagging_loss=0.009661, over 15935.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1172, pruned_loss=0.03286, audio_tagging_loss=0.01167, over 3044807.81 frames. ], batch size: 59, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:42:34,009 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:42:51,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=337826.6666666667, ans=0.0 2023-11-18 17:43:04,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=337893.3333333333, ans=0.125 2023-11-18 17:43:10,216 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2600, loss[loss=0.08969, simple_loss=0.09974, pruned_loss=0.02806, audio_tagging_loss=0.01176, over 14403.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1161, pruned_loss=0.03258, audio_tagging_loss=0.01157, over 3042282.66 frames. ], batch size: 55, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:43:11,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=337960.0, ans=0.1 2023-11-18 17:43:41,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=338093.3333333333, ans=0.125 2023-11-18 17:43:57,878 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.896e+01 9.646e+01 1.065e+02 1.578e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-18 17:44:05,249 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2650, loss[loss=0.1041, simple_loss=0.1276, pruned_loss=0.03041, audio_tagging_loss=0.009933, over 15502.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1168, pruned_loss=0.03262, audio_tagging_loss=0.01141, over 3044768.71 frames. 
], batch size: 57, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:44:07,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=338293.3333333333, ans=0.125 2023-11-18 17:44:10,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338293.3333333333, ans=0.1 2023-11-18 17:44:14,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338293.3333333333, ans=0.1 2023-11-18 17:44:18,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=338360.0, ans=0.04949747468305833 2023-11-18 17:44:25,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=338360.0, ans=0.0 2023-11-18 17:44:25,197 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=15.0 2023-11-18 17:44:39,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.07 vs. limit=15.0 2023-11-18 17:45:01,126 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2700, loss[loss=0.08717, simple_loss=0.09837, pruned_loss=0.024, audio_tagging_loss=0.01399, over 15019.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1171, pruned_loss=0.03269, audio_tagging_loss=0.0114, over 3047512.24 frames. ], batch size: 56, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:45:01,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=338626.6666666667, ans=0.125 2023-11-18 17:45:13,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=338693.3333333333, ans=0.125 2023-11-18 17:45:15,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=338693.3333333333, ans=0.125 2023-11-18 17:45:21,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=338693.3333333333, ans=0.2 2023-11-18 17:45:37,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=338826.6666666667, ans=0.2 2023-11-18 17:45:49,090 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.160e+01 8.914e+01 9.942e+01 1.124e+02 1.692e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-18 17:45:56,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=338960.0, ans=0.07 2023-11-18 17:45:57,621 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2750, loss[loss=0.07569, simple_loss=0.08754, pruned_loss=0.02224, audio_tagging_loss=0.009679, over 15369.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1177, pruned_loss=0.03284, audio_tagging_loss=0.01142, over 3048599.29 frames. ], batch size: 59, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:45:58,200 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. 
limit=15.0 2023-11-18 17:46:04,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=338960.0, ans=0.125 2023-11-18 17:46:09,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0 2023-11-18 17:46:22,074 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2023-11-18 17:46:27,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=339093.3333333333, ans=0.125 2023-11-18 17:46:45,064 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:46:50,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=339226.6666666667, ans=0.0 2023-11-18 17:46:52,443 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2800, loss[loss=0.08396, simple_loss=0.09987, pruned_loss=0.02364, audio_tagging_loss=0.01038, over 15178.00 frames. ], tot_loss[loss=0.1036, simple_loss=0.1183, pruned_loss=0.03313, audio_tagging_loss=0.01133, over 3048896.36 frames. ], batch size: 60, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:47:03,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=339360.0, ans=0.125 2023-11-18 17:47:07,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=12.0 2023-11-18 17:47:08,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=339360.0, ans=0.125 2023-11-18 17:47:24,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.24 vs. limit=15.0 2023-11-18 17:47:32,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=339493.3333333333, ans=0.0 2023-11-18 17:47:39,926 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 9.210e+01 1.044e+02 1.186e+02 2.162e+02, threshold=2.088e+02, percent-clipped=1.0 2023-11-18 17:47:47,891 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2850, loss[loss=0.0957, simple_loss=0.1081, pruned_loss=0.02957, audio_tagging_loss=0.01209, over 14191.00 frames. ], tot_loss[loss=0.1036, simple_loss=0.1181, pruned_loss=0.03324, audio_tagging_loss=0.01126, over 3040656.79 frames. ], batch size: 55, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:47:59,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.58 vs. 
limit=22.5 2023-11-18 17:48:15,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=339760.0, ans=0.125 2023-11-18 17:48:28,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=339826.6666666667, ans=0.1 2023-11-18 17:48:39,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=339893.3333333333, ans=0.05 2023-11-18 17:48:44,329 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2900, loss[loss=0.1111, simple_loss=0.1334, pruned_loss=0.03038, audio_tagging_loss=0.01399, over 14921.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1179, pruned_loss=0.03298, audio_tagging_loss=0.01129, over 3046165.66 frames. ], batch size: 56, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:48:47,379 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:49:02,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2023-11-18 17:49:03,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=340026.6666666667, ans=0.125 2023-11-18 17:49:13,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=340093.3333333333, ans=0.0 2023-11-18 17:49:18,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0 2023-11-18 17:49:33,078 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 9.214e+01 1.048e+02 1.170e+02 1.772e+02, threshold=2.096e+02, percent-clipped=0.0 2023-11-18 17:49:40,490 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 2950, loss[loss=0.1076, simple_loss=0.1239, pruned_loss=0.03465, audio_tagging_loss=0.011, over 15161.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1181, pruned_loss=0.03312, audio_tagging_loss=0.01135, over 3046119.54 frames. ], batch size: 57, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:49:45,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2023-11-18 17:49:48,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.82 vs. limit=22.5 2023-11-18 17:49:56,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=340360.0, ans=0.0 2023-11-18 17:50:08,086 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.96 vs. limit=12.0 2023-11-18 17:50:11,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=340426.6666666667, ans=0.0 2023-11-18 17:50:36,777 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3000, loss[loss=0.1148, simple_loss=0.1242, pruned_loss=0.04119, audio_tagging_loss=0.01153, over 14349.00 frames. ], tot_loss[loss=0.103, simple_loss=0.1175, pruned_loss=0.03278, audio_tagging_loss=0.01149, over 3035444.63 frames. 
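
The scaling.py:213 lines record ScheduledFloat values: hyperparameters such as dropout_p, the balancer prob fields, and the various skip_rate fields are functions of the global batch_count rather than constants, and the log samples their current value (the ans field) at the current batch_count. A minimal sketch of such a schedule, piecewise-linear in batch_count; the breakpoints below are invented for illustration and are not the schedules used in this run.

    import bisect

    # Sketch: a float whose value is piecewise-linear in global batch count.
    class ScheduledFloat:
        def __init__(self, *points):  # points: (batch_count, value) pairs
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def __call__(self, batch_count: float) -> float:
            i = bisect.bisect_right(self.xs, batch_count)
            if i == 0:
                return self.ys[0]
            if i == len(self.xs):
                return self.ys[-1]
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Hypothetical schedule: dropout decaying from 0.3 to 0.1 over 20k batches.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(340000.0))  # 0.1, long past the final breakpoint
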
], batch size: 55, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:50:36,778 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 17:51:09,278 INFO [train_asr.py:1147] (1/4) Epoch 5, validation: loss=0.07345, simple_loss=0.06093, pruned_loss=0.009446, audio_tagging_loss=0.03354, over 4681554.00 frames. 2023-11-18 17:51:09,279 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 17:51:24,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=340693.3333333333, ans=0.125 2023-11-18 17:51:32,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2023-11-18 17:51:46,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=340826.6666666667, ans=0.125 2023-11-18 17:51:56,992 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 9.087e+01 9.878e+01 1.115e+02 1.743e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-18 17:51:59,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=340893.3333333333, ans=0.125 2023-11-18 17:52:04,481 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3050, loss[loss=0.09489, simple_loss=0.1058, pruned_loss=0.02842, audio_tagging_loss=0.01356, over 15637.00 frames. ], tot_loss[loss=0.1036, simple_loss=0.1183, pruned_loss=0.03295, audio_tagging_loss=0.01152, over 3046826.94 frames. ], batch size: 61, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:52:36,792 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:52:40,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=341160.0, ans=0.125 2023-11-18 17:52:59,728 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3100, loss[loss=0.1214, simple_loss=0.1428, pruned_loss=0.03901, audio_tagging_loss=0.01099, over 15272.00 frames. ], tot_loss[loss=0.1047, simple_loss=0.1195, pruned_loss=0.03335, audio_tagging_loss=0.01156, over 3050062.07 frames. 
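
The validation record embedded above obeys the same weighting as the training batches: 0.5 * 0.06093 + 0.009446 + 0.03354 = 0.073451, logged as 0.07345. Note that audio_tagging_loss is now the largest single term, so by epoch 5 the dev loss is dominated by the audio-tagging head while the transducer terms have shrunk. A one-line check of the arithmetic:

    # Validation at batch 3000: same 0.5 / 1.0 weighting as training batches.
    assert abs(0.5 * 0.06093 + 0.009446 + 0.03354 - 0.07345) < 1e-5
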
], batch size: 55, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:53:36,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=341493.3333333333, ans=0.125 2023-11-18 17:53:41,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=341493.3333333333, ans=0.0 2023-11-18 17:53:42,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=341493.3333333333, ans=0.95 2023-11-18 17:53:44,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=341560.0, ans=0.0 2023-11-18 17:53:47,379 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 9.303e+01 9.886e+01 1.114e+02 1.331e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-18 17:53:48,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=341560.0, ans=0.125 2023-11-18 17:53:50,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=341560.0, ans=0.0 2023-11-18 17:53:55,383 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3150, loss[loss=0.09398, simple_loss=0.1001, pruned_loss=0.02948, audio_tagging_loss=0.01443, over 15121.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.1185, pruned_loss=0.03308, audio_tagging_loss=0.01177, over 3044892.25 frames. ], batch size: 60, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:54:39,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=341893.3333333333, ans=0.0 2023-11-18 17:54:48,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=341893.3333333333, ans=0.125 2023-11-18 17:54:51,848 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3200, loss[loss=0.07128, simple_loss=0.0783, pruned_loss=0.01901, audio_tagging_loss=0.01312, over 14668.00 frames. ], tot_loss[loss=0.1045, simple_loss=0.1192, pruned_loss=0.03328, audio_tagging_loss=0.01165, over 3051211.95 frames. ], batch size: 56, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:54:59,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=341960.0, ans=0.0 2023-11-18 17:55:09,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=342026.6666666667, ans=0.07 2023-11-18 17:55:10,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. limit=6.0 2023-11-18 17:55:13,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=342093.3333333333, ans=0.1 2023-11-18 17:55:31,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=342160.0, ans=0.0 2023-11-18 17:55:39,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 9.174e+01 9.896e+01 1.084e+02 1.894e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-18 17:55:47,524 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3250, loss[loss=0.09131, simple_loss=0.1037, pruned_loss=0.02897, audio_tagging_loss=0.01049, over 15807.00 frames. 
], tot_loss[loss=0.1043, simple_loss=0.1191, pruned_loss=0.03307, audio_tagging_loss=0.01165, over 3049579.67 frames. ], batch size: 62, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:55:56,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=342293.3333333333, ans=0.0 2023-11-18 17:55:57,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=342360.0, ans=0.125 2023-11-18 17:56:05,943 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2023-11-18 17:56:07,805 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=342360.0, ans=0.125 2023-11-18 17:56:11,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=342426.6666666667, ans=0.125 2023-11-18 17:56:21,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=342493.3333333333, ans=10.0 2023-11-18 17:56:21,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.27 vs. limit=6.0 2023-11-18 17:56:31,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0 2023-11-18 17:56:42,533 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3300, loss[loss=0.07441, simple_loss=0.07816, pruned_loss=0.0199, audio_tagging_loss=0.01543, over 16033.00 frames. ], tot_loss[loss=0.1039, simple_loss=0.1186, pruned_loss=0.03281, audio_tagging_loss=0.01177, over 3055550.71 frames. ], batch size: 60, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:56:42,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=342626.6666666667, ans=0.125 2023-11-18 17:56:50,485 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:56:51,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=342626.6666666667, ans=0.125 2023-11-18 17:56:54,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342693.3333333333, ans=0.1 2023-11-18 17:57:00,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=342693.3333333333, ans=0.0 2023-11-18 17:57:01,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.93 vs. 
limit=22.5 2023-11-18 17:57:08,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=342760.0, ans=0.1 2023-11-18 17:57:30,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=342893.3333333333, ans=0.125 2023-11-18 17:57:31,971 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.617e+01 9.162e+01 1.022e+02 1.144e+02 1.543e+02, threshold=2.045e+02, percent-clipped=0.0 2023-11-18 17:57:39,985 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3350, loss[loss=0.1285, simple_loss=0.1412, pruned_loss=0.04529, audio_tagging_loss=0.01259, over 15514.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.1194, pruned_loss=0.03308, audio_tagging_loss=0.01161, over 3056157.08 frames. ], batch size: 57, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:57:43,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2023-11-18 17:57:53,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=343026.6666666667, ans=0.0 2023-11-18 17:58:04,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.07 vs. limit=15.0 2023-11-18 17:58:06,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=343093.3333333333, ans=0.0 2023-11-18 17:58:29,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=343226.6666666667, ans=0.125 2023-11-18 17:58:34,533 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=343293.3333333333, ans=0.2 2023-11-18 17:58:35,870 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3400, loss[loss=0.1093, simple_loss=0.1265, pruned_loss=0.03475, audio_tagging_loss=0.01132, over 15824.00 frames. ], tot_loss[loss=0.1046, simple_loss=0.1199, pruned_loss=0.03316, audio_tagging_loss=0.0115, over 3055295.03 frames. ], batch size: 60, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:58:57,434 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=343426.6666666667, ans=0.125 2023-11-18 17:58:58,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=343426.6666666667, ans=0.1 2023-11-18 17:59:06,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. 
limit=15.0 2023-11-18 17:59:16,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=343493.3333333333, ans=0.04949747468305833 2023-11-18 17:59:17,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=343493.3333333333, ans=0.125 2023-11-18 17:59:23,780 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 9.786e+01 1.073e+02 1.222e+02 1.705e+02, threshold=2.147e+02, percent-clipped=0.0 2023-11-18 17:59:27,228 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:59:31,139 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3450, loss[loss=0.1022, simple_loss=0.1136, pruned_loss=0.03246, audio_tagging_loss=0.01293, over 14517.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1187, pruned_loss=0.03293, audio_tagging_loss=0.0115, over 3050067.79 frames. ], batch size: 54, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:59:39,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343626.6666666667, ans=0.1 2023-11-18 17:59:54,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.19 vs. limit=15.0 2023-11-18 18:00:00,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=343760.0, ans=0.125 2023-11-18 18:00:02,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=343760.0, ans=0.0 2023-11-18 18:00:11,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=343826.6666666667, ans=0.125 2023-11-18 18:00:17,210 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.65 vs. limit=15.0 2023-11-18 18:00:20,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=343893.3333333333, ans=0.125 2023-11-18 18:00:24,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=343893.3333333333, ans=0.0 2023-11-18 18:00:27,465 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3500, loss[loss=0.1018, simple_loss=0.1098, pruned_loss=0.03007, audio_tagging_loss=0.01679, over 15113.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.1194, pruned_loss=0.0333, audio_tagging_loss=0.01138, over 3056894.65 frames. 
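
The scaling.py:1022 Whitening lines compare a per-module feature-covariance statistic against its limit (itself often scheduled); when the metric exceeds the limit, a whitening penalty pushes the module's activations back toward a white covariance, and the log samples these comparisons. A sketch of one metric of this flavor, an eigenvalue-concentration ratio that equals 1.0 for perfectly white features and grows as variance concentrates in a few directions; whether this is the exact statistic computed here is an assumption.

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations for one module.
        # Returns d * sum(eig^2) / (sum(eig))^2 over covariance eigenvalues:
        # 1.0 when the covariance is a multiple of the identity (white),
        # larger when a few directions carry most of the variance.
        x = x - x.mean(dim=0)
        cov = x.t() @ x / x.shape[0]
        d = cov.shape[0]
        tr = torch.diagonal(cov).sum()
        tr_sq = torch.diagonal(cov @ cov).sum()
        return d * tr_sq / (tr * tr)

    x = torch.randn(1000, 256) @ torch.diag(torch.linspace(0.1, 3.0, 256))
    print(whitening_metric(x))  # noticeably above 1.0: channels are unequal
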
], batch size: 56, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:00:27,730 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:00:33,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343960.0, ans=0.1 2023-11-18 18:00:42,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=344026.6666666667, ans=0.0 2023-11-18 18:00:43,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=344026.6666666667, ans=0.125 2023-11-18 18:00:53,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=344093.3333333333, ans=0.2 2023-11-18 18:00:55,474 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:01:05,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=344160.0, ans=0.125 2023-11-18 18:01:16,266 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 9.216e+01 1.044e+02 1.195e+02 1.654e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 18:01:19,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=344226.6666666667, ans=0.5 2023-11-18 18:01:23,670 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3550, loss[loss=0.09617, simple_loss=0.1092, pruned_loss=0.02912, audio_tagging_loss=0.01247, over 14196.00 frames. ], tot_loss[loss=0.1034, simple_loss=0.118, pruned_loss=0.03294, audio_tagging_loss=0.0114, over 3062510.96 frames. ], batch size: 54, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:01:27,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=344293.3333333333, ans=0.95 2023-11-18 18:01:46,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=344426.6666666667, ans=0.0 2023-11-18 18:01:51,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=344426.6666666667, ans=0.125 2023-11-18 18:02:06,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=344493.3333333333, ans=0.2 2023-11-18 18:02:19,415 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3600, loss[loss=0.08951, simple_loss=0.1146, pruned_loss=0.02436, audio_tagging_loss=0.007835, over 16277.00 frames. ], tot_loss[loss=0.1036, simple_loss=0.1186, pruned_loss=0.03301, audio_tagging_loss=0.01128, over 3056156.97 frames. 
], batch size: 60, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:02:25,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=344626.6666666667, ans=0.1 2023-11-18 18:02:39,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.02 vs. limit=15.0 2023-11-18 18:02:41,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=344760.0, ans=0.0 2023-11-18 18:02:47,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=344760.0, ans=0.1 2023-11-18 18:02:54,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=344826.6666666667, ans=0.0 2023-11-18 18:03:08,090 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.942e+01 9.105e+01 1.020e+02 1.125e+02 1.503e+02, threshold=2.039e+02, percent-clipped=0.0 2023-11-18 18:03:12,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.98 vs. limit=22.5 2023-11-18 18:03:16,178 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3650, loss[loss=0.1053, simple_loss=0.121, pruned_loss=0.03535, audio_tagging_loss=0.009486, over 15560.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1187, pruned_loss=0.03322, audio_tagging_loss=0.01127, over 3050259.80 frames. ], batch size: 59, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:03:28,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=345026.6666666667, ans=0.125 2023-11-18 18:03:31,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=345026.6666666667, ans=0.0 2023-11-18 18:03:31,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=345026.6666666667, ans=0.5 2023-11-18 18:03:35,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=345026.6666666667, ans=0.0 2023-11-18 18:03:38,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=345093.3333333333, ans=0.1 2023-11-18 18:03:57,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=345160.0, ans=0.125 2023-11-18 18:04:01,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2023-11-18 18:04:11,801 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3700, loss[loss=0.1274, simple_loss=0.1536, pruned_loss=0.03892, audio_tagging_loss=0.01168, over 16371.00 frames. ], tot_loss[loss=0.1039, simple_loss=0.1187, pruned_loss=0.03322, audio_tagging_loss=0.01133, over 3052993.88 frames. 
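
grad_scale in the batch lines is the fp16 loss scale: it was 64.0 at the top of this excerpt and 32.0 from batch 1750 onward (and it halves again to 16.0 by batch 4150 further down), the signature of dynamic loss scaling halving after an overflow and otherwise growing periodically. Real training would use torch.cuda.amp.GradScaler; the sketch below only illustrates the usual update rule, and the growth interval is an illustrative assumption.

    # Sketch: dynamic fp16 loss scaling (halve on overflow, grow after a
    # run of clean steps). Not the actual GradScaler implementation.
    class LossScale:
        def __init__(self, scale: float = 64.0, growth_interval: int = 2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self.good_steps = 0

        def update(self, found_inf: bool) -> None:
            if found_inf:          # overflow: halve and skip this step
                self.scale /= 2.0  # e.g. 64.0 -> 32.0, as in the log
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps == self.growth_interval:
                    self.scale *= 2.0
                    self.good_steps = 0
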
], batch size: 56, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:04:21,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=345293.3333333333, ans=0.2 2023-11-18 18:05:00,335 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 9.436e+01 1.012e+02 1.107e+02 1.712e+02, threshold=2.024e+02, percent-clipped=0.0 2023-11-18 18:05:07,826 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3750, loss[loss=0.1114, simple_loss=0.1319, pruned_loss=0.03522, audio_tagging_loss=0.01027, over 15199.00 frames. ], tot_loss[loss=0.1049, simple_loss=0.1198, pruned_loss=0.03361, audio_tagging_loss=0.0114, over 3059504.07 frames. ], batch size: 55, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:05:34,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=345760.0, ans=0.0 2023-11-18 18:05:35,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=345760.0, ans=0.0 2023-11-18 18:05:45,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=345826.6666666667, ans=0.125 2023-11-18 18:05:46,201 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:05:57,626 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=345893.3333333333, ans=0.1 2023-11-18 18:06:04,348 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3800, loss[loss=0.08826, simple_loss=0.1002, pruned_loss=0.02759, audio_tagging_loss=0.01058, over 15073.00 frames. ], tot_loss[loss=0.1047, simple_loss=0.1194, pruned_loss=0.03349, audio_tagging_loss=0.01152, over 3054002.14 frames. ], batch size: 57, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:06:04,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=345960.0, ans=10.0 2023-11-18 18:06:05,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=345960.0, ans=0.125 2023-11-18 18:06:05,720 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:06:41,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=346160.0, ans=0.125 2023-11-18 18:06:46,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0 2023-11-18 18:06:50,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.10 vs. 
limit=15.0 2023-11-18 18:06:52,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 9.330e+01 1.017e+02 1.159e+02 1.442e+02, threshold=2.034e+02, percent-clipped=0.0 2023-11-18 18:06:59,704 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3850, loss[loss=0.09081, simple_loss=0.1006, pruned_loss=0.02753, audio_tagging_loss=0.013, over 14818.00 frames. ], tot_loss[loss=0.1048, simple_loss=0.1195, pruned_loss=0.03345, audio_tagging_loss=0.01154, over 3050112.54 frames. ], batch size: 57, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:07:07,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0 2023-11-18 18:07:13,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=346360.0, ans=15.0 2023-11-18 18:07:16,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=346360.0, ans=0.1 2023-11-18 18:07:19,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.07 vs. limit=15.0 2023-11-18 18:07:26,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=346426.6666666667, ans=0.2 2023-11-18 18:07:31,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=346426.6666666667, ans=0.125 2023-11-18 18:07:46,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=346560.0, ans=0.125 2023-11-18 18:07:54,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=346626.6666666667, ans=0.05 2023-11-18 18:07:55,542 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3900, loss[loss=0.1089, simple_loss=0.1347, pruned_loss=0.03332, audio_tagging_loss=0.008263, over 15270.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.1199, pruned_loss=0.0338, audio_tagging_loss=0.01152, over 3045580.32 frames. ], batch size: 58, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:08:00,070 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:08:10,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.79 vs. limit=15.0 2023-11-18 18:08:12,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=346693.3333333333, ans=0.1 2023-11-18 18:08:24,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=346760.0, ans=0.125 2023-11-18 18:08:45,462 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 9.163e+01 1.014e+02 1.129e+02 1.556e+02, threshold=2.028e+02, percent-clipped=0.0 2023-11-18 18:08:53,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=346960.0, ans=0.125 2023-11-18 18:08:53,953 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 3950, loss[loss=0.09112, simple_loss=0.1003, pruned_loss=0.02764, audio_tagging_loss=0.01334, over 15495.00 frames. 
], tot_loss[loss=0.1039, simple_loss=0.1181, pruned_loss=0.03311, audio_tagging_loss=0.01179, over 3048124.38 frames. ], batch size: 60, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:08:55,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=346960.0, ans=0.09899494936611666 2023-11-18 18:09:03,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=347026.6666666667, ans=0.125 2023-11-18 18:09:04,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.09 vs. limit=15.0 2023-11-18 18:09:14,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=347093.3333333333, ans=0.0 2023-11-18 18:09:25,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=15.0 2023-11-18 18:09:26,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=347160.0, ans=0.2 2023-11-18 18:09:49,355 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4000, loss[loss=0.1111, simple_loss=0.1234, pruned_loss=0.03715, audio_tagging_loss=0.01225, over 14720.00 frames. ], tot_loss[loss=0.1045, simple_loss=0.1189, pruned_loss=0.03342, audio_tagging_loss=0.01169, over 3044568.70 frames. ], batch size: 55, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:09:57,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0 2023-11-18 18:09:59,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=347360.0, ans=10.0 2023-11-18 18:10:25,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=347493.3333333333, ans=0.125 2023-11-18 18:10:26,232 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2023-11-18 18:10:30,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=347493.3333333333, ans=0.1 2023-11-18 18:10:37,501 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 9.209e+01 1.022e+02 1.112e+02 1.476e+02, threshold=2.045e+02, percent-clipped=0.0 2023-11-18 18:10:40,858 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2023-11-18 18:10:46,046 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4050, loss[loss=0.128, simple_loss=0.1365, pruned_loss=0.04647, audio_tagging_loss=0.01327, over 14899.00 frames. ], tot_loss[loss=0.1049, simple_loss=0.1197, pruned_loss=0.03346, audio_tagging_loss=0.01162, over 3041226.85 frames. ], batch size: 55, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:10:47,156 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:10:47,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=347626.6666666667, ans=0.025 2023-11-18 18:11:00,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=347693.3333333333, ans=0.0 2023-11-18 18:11:00,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=347693.3333333333, ans=0.0 2023-11-18 18:11:07,085 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.19 vs. limit=15.0 2023-11-18 18:11:23,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=347826.6666666667, ans=0.0 2023-11-18 18:11:31,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0 2023-11-18 18:11:40,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=347893.3333333333, ans=0.0 2023-11-18 18:11:42,703 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4100, loss[loss=0.1018, simple_loss=0.1055, pruned_loss=0.03517, audio_tagging_loss=0.01386, over 14525.00 frames. ], tot_loss[loss=0.1055, simple_loss=0.1204, pruned_loss=0.03366, audio_tagging_loss=0.01161, over 3042023.23 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:11:49,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=347960.0, ans=0.1 2023-11-18 18:11:51,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=347960.0, ans=0.2 2023-11-18 18:11:51,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=347960.0, ans=0.2 2023-11-18 18:11:55,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=348026.6666666667, ans=0.0 2023-11-18 18:12:32,087 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 9.111e+01 1.020e+02 1.147e+02 2.406e+02, threshold=2.040e+02, percent-clipped=1.0 2023-11-18 18:12:38,566 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4150, loss[loss=0.1024, simple_loss=0.1169, pruned_loss=0.03209, audio_tagging_loss=0.01188, over 15287.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.1191, pruned_loss=0.03334, audio_tagging_loss=0.01152, over 3039881.28 frames. ], batch size: 58, lr: 1.38e-02, grad_scale: 16.0 2023-11-18 18:12:52,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. 
limit=15.0 2023-11-18 18:12:59,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=348426.6666666667, ans=0.125 2023-11-18 18:13:03,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=348426.6666666667, ans=0.0 2023-11-18 18:13:05,034 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.45 vs. limit=22.5 2023-11-18 18:13:06,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=348426.6666666667, ans=0.0 2023-11-18 18:13:11,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-11-18 18:13:12,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=348493.3333333333, ans=0.125 2023-11-18 18:13:13,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=348493.3333333333, ans=0.0 2023-11-18 18:13:17,905 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:13:20,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=348493.3333333333, ans=0.125 2023-11-18 18:13:20,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=348493.3333333333, ans=0.0 2023-11-18 18:13:24,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. limit=10.0 2023-11-18 18:13:34,492 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4200, loss[loss=0.1006, simple_loss=0.1155, pruned_loss=0.0338, audio_tagging_loss=0.00906, over 15192.00 frames. ], tot_loss[loss=0.1043, simple_loss=0.1192, pruned_loss=0.03329, audio_tagging_loss=0.01137, over 3051749.81 frames. ], batch size: 58, lr: 1.38e-02, grad_scale: 16.0 2023-11-18 18:13:36,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=348626.6666666667, ans=0.125 2023-11-18 18:13:41,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=348626.6666666667, ans=0.95 2023-11-18 18:13:46,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=348693.3333333333, ans=0.2 2023-11-18 18:13:49,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.32 vs. 
limit=22.5 2023-11-18 18:13:51,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=348693.3333333333, ans=0.125 2023-11-18 18:13:51,445 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.62 vs. limit=22.5 2023-11-18 18:13:58,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=348760.0, ans=0.0 2023-11-18 18:14:08,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=348826.6666666667, ans=0.1 2023-11-18 18:14:22,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=12.0 2023-11-18 18:14:23,314 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 9.025e+01 9.781e+01 1.065e+02 1.508e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-18 18:14:30,712 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4250, loss[loss=0.08366, simple_loss=0.0881, pruned_loss=0.02787, audio_tagging_loss=0.01174, over 14199.00 frames. ], tot_loss[loss=0.105, simple_loss=0.1199, pruned_loss=0.03373, audio_tagging_loss=0.01134, over 3048671.39 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 16.0 2023-11-18 18:14:49,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=349026.6666666667, ans=0.07 2023-11-18 18:14:59,642 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.562e-02 2023-11-18 18:15:10,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=349160.0, ans=0.125 2023-11-18 18:15:10,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=349160.0, ans=0.2 2023-11-18 18:15:20,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.96 vs. limit=10.0 2023-11-18 18:15:22,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2023-11-18 18:15:26,398 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4300, loss[loss=0.1184, simple_loss=0.1409, pruned_loss=0.03936, audio_tagging_loss=0.008573, over 14803.00 frames. ], tot_loss[loss=0.1059, simple_loss=0.121, pruned_loss=0.03426, audio_tagging_loss=0.01116, over 3048458.19 frames. 
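The optim.py:476 entries report five gradient-norm statistics plus a clipping threshold, and in every entry above the threshold is exactly Clipping_scale times the middle value (e.g. 2.0 x 9.781e+01 = 1.956e+02). That is consistent with tracking recent gradient norms and clipping at twice their median; a sketch under that assumption:

import torch

def grad_norm_report(recent_norms, clipping_scale=2.0):
    # recent_norms: 1-D tensor of gradient norms from recent batches
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                  # scale x median, as logged
    pct_clipped = (recent_norms > threshold).float().mean() * 100.0
    return q, threshold, pct_clipped

norms = torch.tensor([73.9, 90.3, 97.8, 106.5, 150.8])
q, thr, pct = grad_norm_report(norms)                  # thr ~= 195.6, pct = 0.0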
], batch size: 55, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:15:27,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=349293.3333333333, ans=0.1 2023-11-18 18:15:40,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=349360.0, ans=0.0 2023-11-18 18:15:43,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=349360.0, ans=0.125 2023-11-18 18:16:15,085 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.907e+01 1.023e+02 1.143e+02 1.661e+02, threshold=2.046e+02, percent-clipped=0.0 2023-11-18 18:16:18,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.32 vs. limit=10.0 2023-11-18 18:16:21,903 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4350, loss[loss=0.09531, simple_loss=0.1099, pruned_loss=0.02768, audio_tagging_loss=0.01267, over 15526.00 frames. ], tot_loss[loss=0.1054, simple_loss=0.121, pruned_loss=0.03385, audio_tagging_loss=0.01108, over 3047800.02 frames. ], batch size: 59, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:16:52,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=349760.0, ans=0.05 2023-11-18 18:16:57,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=349826.6666666667, ans=0.09899494936611666 2023-11-18 18:17:13,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=349893.3333333333, ans=22.5 2023-11-18 18:17:17,985 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4400, loss[loss=0.09793, simple_loss=0.1154, pruned_loss=0.03199, audio_tagging_loss=0.008226, over 15596.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.1223, pruned_loss=0.03415, audio_tagging_loss=0.01106, over 3046862.04 frames. ], batch size: 57, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:17:18,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.70 vs. 
limit=12.0 2023-11-18 18:17:24,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=349960.0, ans=0.125 2023-11-18 18:17:27,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=349960.0, ans=0.0 2023-11-18 18:17:34,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=350026.6666666667, ans=0.0 2023-11-18 18:17:34,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=350026.6666666667, ans=0.5 2023-11-18 18:17:48,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=350093.3333333333, ans=0.2 2023-11-18 18:17:55,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=350160.0, ans=0.2 2023-11-18 18:18:01,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=350226.6666666667, ans=0.1 2023-11-18 18:18:04,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2023-11-18 18:18:07,460 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 9.072e+01 1.020e+02 1.136e+02 1.526e+02, threshold=2.040e+02, percent-clipped=0.0 2023-11-18 18:18:12,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.01 vs. limit=15.0 2023-11-18 18:18:13,846 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4450, loss[loss=0.1261, simple_loss=0.158, pruned_loss=0.03798, audio_tagging_loss=0.009073, over 16032.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.1219, pruned_loss=0.03383, audio_tagging_loss=0.01098, over 3048976.62 frames. ], batch size: 57, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:18:21,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=350293.3333333333, ans=0.125 2023-11-18 18:18:28,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=350360.0, ans=0.125 2023-11-18 18:18:36,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=350426.6666666667, ans=0.2 2023-11-18 18:18:38,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=350426.6666666667, ans=0.125 2023-11-18 18:18:41,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=350426.6666666667, ans=0.1 2023-11-18 18:18:50,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=350493.3333333333, ans=0.0 2023-11-18 18:18:54,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=12.0 2023-11-18 18:19:08,906 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4500, loss[loss=0.07616, simple_loss=0.08599, pruned_loss=0.02282, audio_tagging_loss=0.01034, over 16842.00 frames. 
], tot_loss[loss=0.105, simple_loss=0.1206, pruned_loss=0.03362, audio_tagging_loss=0.01105, over 3050129.97 frames. ], batch size: 68, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:19:10,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0 2023-11-18 18:19:20,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=350693.3333333333, ans=0.07 2023-11-18 18:19:26,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=350693.3333333333, ans=0.125 2023-11-18 18:19:30,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=350693.3333333333, ans=0.125 2023-11-18 18:19:39,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=350760.0, ans=15.0 2023-11-18 18:19:41,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=350760.0, ans=0.0 2023-11-18 18:19:47,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.36 vs. limit=15.0 2023-11-18 18:19:50,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=350826.6666666667, ans=0.0 2023-11-18 18:19:58,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 9.243e+01 1.009e+02 1.115e+02 1.767e+02, threshold=2.018e+02, percent-clipped=0.0 2023-11-18 18:20:04,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=350960.0, ans=0.0 2023-11-18 18:20:05,097 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4550, loss[loss=0.1143, simple_loss=0.1268, pruned_loss=0.03833, audio_tagging_loss=0.01262, over 14771.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.1193, pruned_loss=0.03334, audio_tagging_loss=0.01109, over 3052885.67 frames. ], batch size: 58, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:20:22,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=351026.6666666667, ans=0.125 2023-11-18 18:20:37,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=351093.3333333333, ans=0.125 2023-11-18 18:20:46,543 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:21:02,026 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4600, loss[loss=0.1223, simple_loss=0.149, pruned_loss=0.03483, audio_tagging_loss=0.01295, over 15730.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1191, pruned_loss=0.03324, audio_tagging_loss=0.01118, over 3050707.18 frames. 
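The recurring "Exclude cut ..." warnings are the dataloader dropping one-second AudioSet clips whose placeholder transcript is longer than the encoder output: 100 feature frames shrink to 23 after the ~4x front-end subsampling, which cannot align with 24 BPE tokens. A schematic version of that check; the front-end arithmetic below is an assumption chosen to reproduce 100 -> 23:

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # assumed conv front-end: T -> ((T - 7) // 2) // 2, giving 100 -> 23
    frames_after = ((num_frames - 7) // 2) // 2
    return frames_after >= num_tokens

keep_cut(100, 24)   # False -> the cut is excluded and the warning above is logged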
], batch size: 56, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:21:02,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=351293.3333333333, ans=0.125 2023-11-18 18:21:02,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=351293.3333333333, ans=0.125 2023-11-18 18:21:08,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=15.0 2023-11-18 18:21:11,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=351360.0, ans=0.1 2023-11-18 18:21:21,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=351360.0, ans=0.0 2023-11-18 18:21:21,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=351360.0, ans=0.0 2023-11-18 18:21:50,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.856e+01 9.307e+01 1.009e+02 1.129e+02 1.665e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-18 18:21:53,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=351560.0, ans=0.125 2023-11-18 18:21:57,345 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4650, loss[loss=0.1057, simple_loss=0.1336, pruned_loss=0.02829, audio_tagging_loss=0.01063, over 16053.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1192, pruned_loss=0.03318, audio_tagging_loss=0.01124, over 3048071.05 frames. ], batch size: 57, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:22:22,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.72 vs. limit=10.0 2023-11-18 18:22:27,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.70 vs. limit=22.5 2023-11-18 18:22:29,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=351760.0, ans=0.125 2023-11-18 18:22:39,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2023-11-18 18:22:52,863 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4700, loss[loss=0.1161, simple_loss=0.1317, pruned_loss=0.03759, audio_tagging_loss=0.01261, over 15973.00 frames. ], tot_loss[loss=0.1043, simple_loss=0.1193, pruned_loss=0.03321, audio_tagging_loss=0.01139, over 3050090.23 frames. ], batch size: 60, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:22:58,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=351960.0, ans=0.125 2023-11-18 18:23:03,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.43 vs. 
limit=15.0 2023-11-18 18:23:04,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=352026.6666666667, ans=0.09899494936611666 2023-11-18 18:23:23,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=352093.3333333333, ans=0.07 2023-11-18 18:23:35,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=352160.0, ans=0.0 2023-11-18 18:23:40,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=12.0 2023-11-18 18:23:42,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 9.156e+01 9.801e+01 1.107e+02 1.485e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-18 18:23:49,047 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4750, loss[loss=0.0879, simple_loss=0.08806, pruned_loss=0.02756, audio_tagging_loss=0.01632, over 14053.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1176, pruned_loss=0.03252, audio_tagging_loss=0.0115, over 3040663.22 frames. ], batch size: 55, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:24:10,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=352426.6666666667, ans=0.125 2023-11-18 18:24:15,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=352426.6666666667, ans=0.125 2023-11-18 18:24:18,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=352426.6666666667, ans=0.05 2023-11-18 18:24:32,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=352493.3333333333, ans=0.125 2023-11-18 18:24:45,207 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4800, loss[loss=0.08557, simple_loss=0.09932, pruned_loss=0.02402, audio_tagging_loss=0.0119, over 15420.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1168, pruned_loss=0.03223, audio_tagging_loss=0.01167, over 3045985.54 frames. ], batch size: 55, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:24:48,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=352626.6666666667, ans=0.0 2023-11-18 18:24:51,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=352626.6666666667, ans=0.0 2023-11-18 18:25:18,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=352826.6666666667, ans=0.2 2023-11-18 18:25:22,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=352826.6666666667, ans=0.125 2023-11-18 18:25:29,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=352893.3333333333, ans=0.5 2023-11-18 18:25:31,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352893.3333333333, ans=0.1 2023-11-18 18:25:32,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.40 vs. 
limit=15.0 2023-11-18 18:25:32,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0 2023-11-18 18:25:34,881 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.158e+01 9.461e+01 1.064e+02 1.235e+02 1.881e+02, threshold=2.128e+02, percent-clipped=0.0 2023-11-18 18:25:36,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=352893.3333333333, ans=0.0 2023-11-18 18:25:38,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=352893.3333333333, ans=0.125 2023-11-18 18:25:41,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.86 vs. limit=15.0 2023-11-18 18:25:41,305 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4850, loss[loss=0.09882, simple_loss=0.1136, pruned_loss=0.02977, audio_tagging_loss=0.01224, over 15617.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1166, pruned_loss=0.03222, audio_tagging_loss=0.01189, over 3042140.77 frames. ], batch size: 57, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:25:57,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353026.6666666667, ans=0.1 2023-11-18 18:26:06,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=353093.3333333333, ans=0.2 2023-11-18 18:26:08,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=353093.3333333333, ans=0.125 2023-11-18 18:26:08,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=353093.3333333333, ans=0.1 2023-11-18 18:26:37,627 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4900, loss[loss=0.08762, simple_loss=0.09259, pruned_loss=0.02709, audio_tagging_loss=0.01423, over 14331.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1167, pruned_loss=0.03217, audio_tagging_loss=0.01177, over 3046316.47 frames. ], batch size: 55, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:26:40,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=353293.3333333333, ans=0.0 2023-11-18 18:26:52,495 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.02 vs. limit=15.0 2023-11-18 18:27:09,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.18 vs. limit=6.0 2023-11-18 18:27:28,032 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 9.441e+01 1.051e+02 1.165e+02 1.612e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 18:27:33,415 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 4950, loss[loss=0.09687, simple_loss=0.1088, pruned_loss=0.03164, audio_tagging_loss=0.01085, over 15118.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1163, pruned_loss=0.03209, audio_tagging_loss=0.01157, over 3041067.93 frames. 
], batch size: 59, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:27:47,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=353693.3333333333, ans=0.125 2023-11-18 18:27:49,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=353693.3333333333, ans=0.0 2023-11-18 18:27:52,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=353693.3333333333, ans=0.125 2023-11-18 18:28:12,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=353826.6666666667, ans=0.125 2023-11-18 18:28:30,014 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5000, loss[loss=0.09367, simple_loss=0.1059, pruned_loss=0.02597, audio_tagging_loss=0.01474, over 15534.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1164, pruned_loss=0.03211, audio_tagging_loss=0.01149, over 3038516.46 frames. ], batch size: 58, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:28:53,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2023-11-18 18:28:58,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.10 vs. limit=12.0 2023-11-18 18:29:02,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=354160.0, ans=0.125 2023-11-18 18:29:02,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=354160.0, ans=15.0 2023-11-18 18:29:13,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=354226.6666666667, ans=0.125 2023-11-18 18:29:19,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=354226.6666666667, ans=0.125 2023-11-18 18:29:19,887 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 9.416e+01 1.033e+02 1.125e+02 1.808e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 18:29:26,420 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5050, loss[loss=0.08601, simple_loss=0.09501, pruned_loss=0.0234, audio_tagging_loss=0.0151, over 15501.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1158, pruned_loss=0.03192, audio_tagging_loss=0.01147, over 3045180.24 frames. 
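The loss components logged here are not summed with unit weights: every tot_loss[...] entry in this stretch satisfies loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (e.g. 0.5 * 0.1158 + 0.03192 + 0.01147 = 0.1013, matching the tot_loss just above). The 0.5 and 1.0 scales are inferred from the logged numbers, not quoted from the code:

def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
               simple_scale=0.5, tagging_scale=1.0):
    # scales inferred from the logged tot_loss values; icefall also ramps the
    # pruned term during warm-up, which is long past at batch_count ~355k
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

total_loss(0.1158, 0.03192, 0.01147)   # ~0.1013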
], batch size: 59, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:29:32,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=354293.3333333333, ans=0.1 2023-11-18 18:29:35,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=354293.3333333333, ans=0.125 2023-11-18 18:29:41,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=354360.0, ans=0.0 2023-11-18 18:29:54,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=354426.6666666667, ans=0.125 2023-11-18 18:29:58,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=354493.3333333333, ans=0.09899494936611666 2023-11-18 18:29:58,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.99 vs. limit=10.0 2023-11-18 18:30:15,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=354560.0, ans=0.1 2023-11-18 18:30:21,624 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5100, loss[loss=0.1268, simple_loss=0.152, pruned_loss=0.04227, audio_tagging_loss=0.008564, over 15251.00 frames. ], tot_loss[loss=0.1008, simple_loss=0.1151, pruned_loss=0.03184, audio_tagging_loss=0.01142, over 3038698.67 frames. ], batch size: 55, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:30:31,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.45 vs. limit=15.0 2023-11-18 18:30:51,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=354760.0, ans=0.0 2023-11-18 18:31:00,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=354826.6666666667, ans=0.125 2023-11-18 18:31:11,570 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 9.700e+01 1.061e+02 1.154e+02 1.523e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-18 18:31:17,930 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5150, loss[loss=0.09672, simple_loss=0.1235, pruned_loss=0.02554, audio_tagging_loss=0.009454, over 14962.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1145, pruned_loss=0.03155, audio_tagging_loss=0.0113, over 3036984.04 frames. ], batch size: 56, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:31:24,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2023-11-18 18:31:32,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=355026.6666666667, ans=0.0 2023-11-18 18:32:05,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=355226.6666666667, ans=10.0 2023-11-18 18:32:13,657 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5200, loss[loss=0.09776, simple_loss=0.114, pruned_loss=0.03051, audio_tagging_loss=0.01025, over 14833.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1158, pruned_loss=0.03192, audio_tagging_loss=0.01127, over 3034620.50 frames. 
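The scaling.py:213 entries print module hyperparameters (dropout_p, prob, skip_rate, scale_min, ...) as a function of batch_count, i.e. values that follow a schedule rather than staying fixed. A piecewise-linear sketch of that idea; the breakpoints below are made up, and the real schedules differ per parameter:

import bisect

class ScheduledFloat:
    def __init__(self, *points):                 # sorted (batch_count, value) pairs
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

conv_skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
conv_skip_rate.value(355_293.0)   # 0.0, as in the "ans=0.0" entries above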
], batch size: 57, lr: 1.36e-02, grad_scale: 32.0 2023-11-18 18:32:22,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=355293.3333333333, ans=0.025 2023-11-18 18:32:23,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=355360.0, ans=0.2 2023-11-18 18:32:33,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.22 vs. limit=22.5 2023-11-18 18:32:36,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=355426.6666666667, ans=0.125 2023-11-18 18:32:45,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=355426.6666666667, ans=0.09899494936611666 2023-11-18 18:32:47,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.65 vs. limit=15.0 2023-11-18 18:32:58,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=355560.0, ans=0.2 2023-11-18 18:33:03,979 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 9.196e+01 1.027e+02 1.112e+02 1.442e+02, threshold=2.053e+02, percent-clipped=0.0 2023-11-18 18:33:09,311 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5250, loss[loss=0.1176, simple_loss=0.1411, pruned_loss=0.03963, audio_tagging_loss=0.007411, over 15252.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1169, pruned_loss=0.03232, audio_tagging_loss=0.01121, over 3047539.02 frames. ], batch size: 54, lr: 1.36e-02, grad_scale: 32.0 2023-11-18 18:33:17,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=355626.6666666667, ans=0.125 2023-11-18 18:33:30,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=355760.0, ans=0.125 2023-11-18 18:33:44,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=355826.6666666667, ans=0.1 2023-11-18 18:33:46,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=355826.6666666667, ans=0.2 2023-11-18 18:34:04,634 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5300, loss[loss=0.1075, simple_loss=0.13, pruned_loss=0.03317, audio_tagging_loss=0.00937, over 15472.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1172, pruned_loss=0.03237, audio_tagging_loss=0.01129, over 3042004.71 frames. ], batch size: 59, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:34:18,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.00 vs. 
limit=22.5 2023-11-18 18:34:36,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=356093.3333333333, ans=0.2 2023-11-18 18:34:36,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=356093.3333333333, ans=0.125 2023-11-18 18:34:37,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=356160.0, ans=0.0 2023-11-18 18:34:55,614 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 8.052e+01 9.440e+01 1.053e+02 1.166e+02 1.714e+02, threshold=2.106e+02, percent-clipped=0.0 2023-11-18 18:34:56,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=15.0 2023-11-18 18:34:57,366 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=356226.6666666667, ans=0.125 2023-11-18 18:34:57,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.89 vs. limit=15.0 2023-11-18 18:35:00,877 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5350, loss[loss=0.1096, simple_loss=0.1364, pruned_loss=0.0319, audio_tagging_loss=0.009458, over 14824.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1171, pruned_loss=0.03227, audio_tagging_loss=0.01127, over 3042880.52 frames. ], batch size: 54, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:35:18,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=356360.0, ans=0.125 2023-11-18 18:35:55,907 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5400, loss[loss=0.1092, simple_loss=0.1115, pruned_loss=0.04029, audio_tagging_loss=0.01316, over 15953.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1167, pruned_loss=0.03215, audio_tagging_loss=0.01135, over 3047862.86 frames. ], batch size: 60, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:36:23,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=356760.0, ans=0.0 2023-11-18 18:36:28,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=12.0 2023-11-18 18:36:46,630 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 8.854e+01 9.931e+01 1.101e+02 1.556e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-18 18:36:51,439 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5450, loss[loss=0.09208, simple_loss=0.1047, pruned_loss=0.02879, audio_tagging_loss=0.01093, over 15825.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1178, pruned_loss=0.03248, audio_tagging_loss=0.01129, over 3043744.63 frames. 
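The scaling.py:1022 lines fire when a layer's whitening metric exceeds its limit (here 23.00 vs. 22.5 for a self-attention module), i.e. when the activation covariance has drifted far from isotropic. The metric below (mean squared eigenvalue over squared mean eigenvalue, 1.0 for perfectly white features) is one plausible anisotropy measure, not necessarily the exact definition in scaling.py:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations from one module
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(2000, 512)
metric, limit = whitening_metric(x), 22.5
if metric > limit:
    print(f"Whitening: metric={metric:.2f} vs. limit={limit}")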
], batch size: 61, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:37:04,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=357026.6666666667, ans=0.125 2023-11-18 18:37:13,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=357093.3333333333, ans=0.125 2023-11-18 18:37:25,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=357160.0, ans=0.125 2023-11-18 18:37:42,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=357226.6666666667, ans=0.125 2023-11-18 18:37:46,494 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5500, loss[loss=0.1096, simple_loss=0.1297, pruned_loss=0.03499, audio_tagging_loss=0.009727, over 16571.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1179, pruned_loss=0.03248, audio_tagging_loss=0.01133, over 3042641.42 frames. ], batch size: 61, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:38:05,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=357360.0, ans=0.125 2023-11-18 18:38:08,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=357426.6666666667, ans=0.2 2023-11-18 18:38:09,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.58 vs. limit=22.5 2023-11-18 18:38:22,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0 2023-11-18 18:38:24,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0 2023-11-18 18:38:33,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=357560.0, ans=0.0 2023-11-18 18:38:36,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=357560.0, ans=0.1 2023-11-18 18:38:38,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.010e+01 9.009e+01 9.946e+01 1.101e+02 1.690e+02, threshold=1.989e+02, percent-clipped=0.0 2023-11-18 18:38:41,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.70 vs. limit=22.5 2023-11-18 18:38:42,428 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5550, loss[loss=0.0954, simple_loss=0.1117, pruned_loss=0.02668, audio_tagging_loss=0.01286, over 13776.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1174, pruned_loss=0.0323, audio_tagging_loss=0.01147, over 3039128.40 frames. ], batch size: 53, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:39:32,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=357893.3333333333, ans=0.0 2023-11-18 18:39:34,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=357893.3333333333, ans=0.0 2023-11-18 18:39:37,592 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5600, loss[loss=0.1247, simple_loss=0.1462, pruned_loss=0.04257, audio_tagging_loss=0.00899, over 15682.00 frames. 
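grad_scale in the loss lines flips between 32.0 and 16.0 across this section, the signature of dynamic fp16 loss scaling: halve the scale when an overflow is detected, grow it back after a run of clean steps. A generic PyTorch AMP sketch of that mechanism, not the exact icefall training loop:

import torch
import torch.nn.functional as F

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1.36e-2)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(feats, targets):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = F.cross_entropy(model(feats), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped if inf/nan gradients were found
    scaler.update()          # halves the scale on overflow, regrows it later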
], tot_loss[loss=0.103, simple_loss=0.1184, pruned_loss=0.03235, audio_tagging_loss=0.01148, over 3044729.27 frames. ], batch size: 57, lr: 1.36e-02, grad_scale: 32.0 2023-11-18 18:39:43,598 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:39:57,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=358026.6666666667, ans=0.125 2023-11-18 18:40:01,565 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:40:09,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=358093.3333333333, ans=0.125 2023-11-18 18:40:15,563 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:40:29,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 9.084e+01 1.022e+02 1.205e+02 1.640e+02, threshold=2.044e+02, percent-clipped=0.0 2023-11-18 18:40:32,849 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5650, loss[loss=0.06895, simple_loss=0.0678, pruned_loss=0.02106, audio_tagging_loss=0.014, over 14882.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1178, pruned_loss=0.03214, audio_tagging_loss=0.01163, over 3050197.47 frames. ], batch size: 59, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:40:43,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=358360.0, ans=0.0 2023-11-18 18:40:54,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=358426.6666666667, ans=0.02 2023-11-18 18:40:56,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=358426.6666666667, ans=0.125 2023-11-18 18:40:57,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.82 vs. limit=22.5 2023-11-18 18:41:23,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=358560.0, ans=0.0 2023-11-18 18:41:29,433 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5700, loss[loss=0.1085, simple_loss=0.1305, pruned_loss=0.03279, audio_tagging_loss=0.01041, over 16290.00 frames. ], tot_loss[loss=0.103, simple_loss=0.1183, pruned_loss=0.03236, audio_tagging_loss=0.01149, over 3053557.24 frames. ], batch size: 57, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:42:05,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=358826.6666666667, ans=0.0 2023-11-18 18:42:21,625 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.993e+01 9.865e+01 1.099e+02 1.758e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 18:42:24,822 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5750, loss[loss=0.1019, simple_loss=0.1233, pruned_loss=0.02886, audio_tagging_loss=0.0114, over 15752.00 frames. 
], tot_loss[loss=0.1017, simple_loss=0.1166, pruned_loss=0.03194, audio_tagging_loss=0.01151, over 3055261.55 frames. ], batch size: 61, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:42:37,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=359026.6666666667, ans=0.125 2023-11-18 18:42:38,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=359026.6666666667, ans=0.2 2023-11-18 18:42:57,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=359160.0, ans=10.0 2023-11-18 18:43:08,457 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:43:13,545 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2023-11-18 18:43:20,445 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5800, loss[loss=0.1024, simple_loss=0.1226, pruned_loss=0.03151, audio_tagging_loss=0.009567, over 15538.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1169, pruned_loss=0.03207, audio_tagging_loss=0.0112, over 3055291.32 frames. ], batch size: 57, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:43:20,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359293.3333333333, ans=0.1 2023-11-18 18:43:21,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=359293.3333333333, ans=0.04949747468305833 2023-11-18 18:43:25,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=359293.3333333333, ans=0.95 2023-11-18 18:43:29,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=359293.3333333333, ans=0.0 2023-11-18 18:43:29,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=359293.3333333333, ans=0.2 2023-11-18 18:43:33,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=359360.0, ans=0.125 2023-11-18 18:43:43,907 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2023-11-18 18:43:45,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=359426.6666666667, ans=0.0 2023-11-18 18:44:12,847 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.910e+01 9.863e+01 1.080e+02 1.378e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 18:44:16,603 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5850, loss[loss=0.0848, simple_loss=0.08679, pruned_loss=0.0252, audio_tagging_loss=0.0162, over 15670.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1166, pruned_loss=0.03201, audio_tagging_loss=0.01122, over 3050135.80 frames. 
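The lr column decays very slowly (1.38e-02 at batch 3750 down to 1.35e-02 by batch 5900), pointing to an inverse-power schedule in both batch index and epoch. A sketch in the spirit of icefall's Eden scheduler; the functional form and constants here are assumptions, not verified against these exact values:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

eden_lr(0.045, batch=30_000, epoch=5.0)   # order 1e-2, shrinking very slowly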
], batch size: 58, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:44:20,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=359626.6666666667, ans=0.0 2023-11-18 18:44:36,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=359693.3333333333, ans=0.0 2023-11-18 18:44:54,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=359826.6666666667, ans=0.0 2023-11-18 18:44:57,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=359826.6666666667, ans=0.125 2023-11-18 18:45:06,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=359893.3333333333, ans=0.0 2023-11-18 18:45:12,057 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5900, loss[loss=0.08554, simple_loss=0.09785, pruned_loss=0.0253, audio_tagging_loss=0.01131, over 14350.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1164, pruned_loss=0.0321, audio_tagging_loss=0.01119, over 3049544.96 frames. ], batch size: 55, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:45:16,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=359960.0, ans=0.125 2023-11-18 18:45:29,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=360026.6666666667, ans=10.0 2023-11-18 18:45:34,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=360093.3333333333, ans=0.125 2023-11-18 18:46:00,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=360226.6666666667, ans=0.2 2023-11-18 18:46:03,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=360226.6666666667, ans=0.0 2023-11-18 18:46:04,526 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 9.214e+01 1.011e+02 1.146e+02 1.411e+02, threshold=2.022e+02, percent-clipped=0.0 2023-11-18 18:46:07,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.20 vs. limit=22.5 2023-11-18 18:46:07,759 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 5950, loss[loss=0.07702, simple_loss=0.08222, pruned_loss=0.02001, audio_tagging_loss=0.0159, over 15983.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1162, pruned_loss=0.0321, audio_tagging_loss=0.01123, over 3047958.05 frames. 
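At batch 6000 just below, training pauses for a validation pass (train_asr.py:1138/1147) and then reports the peak CUDA memory. A schematic version of that bookkeeping; the model is assumed, purely for illustration, to return a frame-averaged loss and a frame count per batch:

import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader):
    model.eval()
    weighted_sum, frames = 0.0, 0.0
    for batch in valid_loader:
        loss, num_frames = model(batch)   # assumed (per-frame loss, frame count)
        weighted_sum += float(loss) * num_frames
        frames += num_frames
    model.train()
    return weighted_sum / frames          # the "over 4681554.00 frames" average

peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
print(f"Maximum memory allocated so far is {peak_mb}MB")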
], batch size: 61, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:46:18,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=360360.0, ans=0.125 2023-11-18 18:46:23,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=360360.0, ans=0.1 2023-11-18 18:46:41,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=360493.3333333333, ans=0.0 2023-11-18 18:46:48,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=360493.3333333333, ans=0.0 2023-11-18 18:46:52,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.50 vs. limit=10.0 2023-11-18 18:47:03,806 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6000, loss[loss=0.1014, simple_loss=0.1199, pruned_loss=0.02542, audio_tagging_loss=0.01598, over 15440.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1158, pruned_loss=0.03186, audio_tagging_loss=0.01136, over 3051610.53 frames. ], batch size: 58, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:47:03,807 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 18:47:36,966 INFO [train_asr.py:1147] (1/4) Epoch 5, validation: loss=0.0732, simple_loss=0.06039, pruned_loss=0.009139, audio_tagging_loss=0.03386, over 4681554.00 frames. 2023-11-18 18:47:36,968 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 18:47:38,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5 2023-11-18 18:47:48,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=360693.3333333333, ans=0.0 2023-11-18 18:48:13,923 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:48:15,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.65 vs. limit=22.5 2023-11-18 18:48:19,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=360826.6666666667, ans=0.125 2023-11-18 18:48:28,700 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 9.155e+01 9.916e+01 1.075e+02 1.410e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-18 18:48:29,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=360893.3333333333, ans=0.125 2023-11-18 18:48:31,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.22 vs. 
limit=22.5 2023-11-18 18:48:31,969 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6050, loss[loss=0.09759, simple_loss=0.1224, pruned_loss=0.02801, audio_tagging_loss=0.008383, over 14374.00 frames. ], tot_loss[loss=0.1008, simple_loss=0.1155, pruned_loss=0.03174, audio_tagging_loss=0.01133, over 3053536.70 frames. ], batch size: 52, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:48:52,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=361026.6666666667, ans=0.025 2023-11-18 18:48:56,639 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:49:16,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=361226.6666666667, ans=0.0 2023-11-18 18:49:23,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=361226.6666666667, ans=0.125 2023-11-18 18:49:28,172 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6100, loss[loss=0.1064, simple_loss=0.129, pruned_loss=0.03407, audio_tagging_loss=0.007793, over 14959.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1158, pruned_loss=0.03177, audio_tagging_loss=0.01124, over 3053299.79 frames. ], batch size: 58, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:49:36,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=361293.3333333333, ans=0.2 2023-11-18 18:49:56,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=361426.6666666667, ans=0.0 2023-11-18 18:49:57,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=361426.6666666667, ans=0.0 2023-11-18 18:50:05,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=361493.3333333333, ans=0.1 2023-11-18 18:50:05,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=361493.3333333333, ans=0.1 2023-11-18 18:50:11,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.91 vs. limit=22.5 2023-11-18 18:50:21,524 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 9.202e+01 1.052e+02 1.142e+02 1.737e+02, threshold=2.103e+02, percent-clipped=0.0 2023-11-18 18:50:23,665 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6150, loss[loss=0.07397, simple_loss=0.06887, pruned_loss=0.02445, audio_tagging_loss=0.01508, over 13566.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1161, pruned_loss=0.03201, audio_tagging_loss=0.01129, over 3055158.59 frames. 
], batch size: 54, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:50:23,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=361626.6666666667, ans=0.1 2023-11-18 18:51:00,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=361826.6666666667, ans=0.04949747468305833 2023-11-18 18:51:08,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=361893.3333333333, ans=0.0 2023-11-18 18:51:20,188 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6200, loss[loss=0.06905, simple_loss=0.07214, pruned_loss=0.01785, audio_tagging_loss=0.01513, over 16038.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1164, pruned_loss=0.03219, audio_tagging_loss=0.01146, over 3056629.83 frames. ], batch size: 63, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:51:21,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2023-11-18 18:51:23,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=361960.0, ans=0.1 2023-11-18 18:51:26,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=361960.0, ans=0.2 2023-11-18 18:51:27,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=361960.0, ans=0.125 2023-11-18 18:51:37,762 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0 2023-11-18 18:51:41,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=362093.3333333333, ans=0.1 2023-11-18 18:51:49,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2023-11-18 18:51:54,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.91 vs. limit=10.0 2023-11-18 18:52:14,242 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.346e+01 1.036e+02 1.107e+02 1.533e+02, threshold=2.072e+02, percent-clipped=0.0 2023-11-18 18:52:16,401 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6250, loss[loss=0.1174, simple_loss=0.1342, pruned_loss=0.04133, audio_tagging_loss=0.009032, over 14482.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1162, pruned_loss=0.03206, audio_tagging_loss=0.01166, over 3053390.49 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:52:25,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=362293.3333333333, ans=0.125 2023-11-18 18:52:36,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=362360.0, ans=0.0 2023-11-18 18:52:37,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=362426.6666666667, ans=0.125 2023-11-18 18:52:56,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.26 vs. 
limit=22.5 2023-11-18 18:53:07,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=362560.0, ans=0.1 2023-11-18 18:53:11,490 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6300, loss[loss=0.09463, simple_loss=0.1092, pruned_loss=0.02971, audio_tagging_loss=0.01033, over 14891.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1173, pruned_loss=0.03247, audio_tagging_loss=0.01159, over 3054855.27 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:53:12,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=362626.6666666667, ans=0.125 2023-11-18 18:53:17,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=362626.6666666667, ans=0.0 2023-11-18 18:53:18,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=362626.6666666667, ans=0.1 2023-11-18 18:53:21,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=362626.6666666667, ans=0.125 2023-11-18 18:53:31,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2023-11-18 18:53:36,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0 2023-11-18 18:53:40,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=362760.0, ans=0.0 2023-11-18 18:54:04,969 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 9.079e+01 9.861e+01 1.090e+02 1.541e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-18 18:54:07,085 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6350, loss[loss=0.07941, simple_loss=0.0774, pruned_loss=0.02526, audio_tagging_loss=0.01545, over 14008.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.117, pruned_loss=0.0324, audio_tagging_loss=0.01162, over 3053383.12 frames. ], batch size: 55, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:54:13,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=362960.0, ans=0.125 2023-11-18 18:54:21,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=363026.6666666667, ans=0.125 2023-11-18 18:54:31,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=363093.3333333333, ans=0.2 2023-11-18 18:54:31,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=15.0 2023-11-18 18:54:40,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.43 vs. 
limit=15.0 2023-11-18 18:54:48,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=363160.0, ans=0.2 2023-11-18 18:55:01,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=363226.6666666667, ans=0.125 2023-11-18 18:55:03,939 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6400, loss[loss=0.105, simple_loss=0.129, pruned_loss=0.03104, audio_tagging_loss=0.009475, over 15115.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1183, pruned_loss=0.03291, audio_tagging_loss=0.01164, over 3048405.03 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:55:30,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=363426.6666666667, ans=0.1 2023-11-18 18:55:33,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=363426.6666666667, ans=0.0 2023-11-18 18:55:42,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363493.3333333333, ans=0.1 2023-11-18 18:55:47,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.61 vs. limit=15.0 2023-11-18 18:55:56,641 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.151e+01 9.321e+01 1.035e+02 1.143e+02 1.548e+02, threshold=2.069e+02, percent-clipped=0.0 2023-11-18 18:55:58,772 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6450, loss[loss=0.08023, simple_loss=0.09932, pruned_loss=0.01982, audio_tagging_loss=0.01074, over 16801.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1173, pruned_loss=0.03256, audio_tagging_loss=0.01163, over 3044367.63 frames. ], batch size: 62, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:56:03,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.46 vs. limit=6.0 2023-11-18 18:56:10,122 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:56:18,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=12.0 2023-11-18 18:56:54,222 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6500, loss[loss=0.1217, simple_loss=0.1409, pruned_loss=0.04079, audio_tagging_loss=0.0105, over 14654.00 frames. ], tot_loss[loss=0.1034, simple_loss=0.1185, pruned_loss=0.03271, audio_tagging_loss=0.01146, over 3050880.88 frames. 
], batch size: 54, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:57:02,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=363960.0, ans=0.125 2023-11-18 18:57:16,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364093.3333333333, ans=0.1 2023-11-18 18:57:21,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=364093.3333333333, ans=0.0 2023-11-18 18:57:37,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=364160.0, ans=0.0 2023-11-18 18:57:40,137 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.715e-03 2023-11-18 18:57:48,371 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 9.392e+01 1.004e+02 1.100e+02 1.543e+02, threshold=2.007e+02, percent-clipped=0.0 2023-11-18 18:57:50,510 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6550, loss[loss=0.09741, simple_loss=0.1168, pruned_loss=0.03047, audio_tagging_loss=0.008543, over 14410.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1177, pruned_loss=0.0326, audio_tagging_loss=0.01137, over 3049364.04 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:57:53,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=364293.3333333333, ans=0.125 2023-11-18 18:57:55,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=364293.3333333333, ans=0.1 2023-11-18 18:57:55,837 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0 2023-11-18 18:58:23,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=364493.3333333333, ans=0.0 2023-11-18 18:58:33,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=364493.3333333333, ans=0.2 2023-11-18 18:58:41,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=364560.0, ans=0.125 2023-11-18 18:58:46,368 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6600, loss[loss=0.07427, simple_loss=0.08148, pruned_loss=0.01771, audio_tagging_loss=0.01581, over 14674.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1172, pruned_loss=0.03246, audio_tagging_loss=0.01127, over 3047134.26 frames. 
], batch size: 54, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:58:51,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=364626.6666666667, ans=0.125 2023-11-18 18:59:01,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=364693.3333333333, ans=0.2 2023-11-18 18:59:03,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=364693.3333333333, ans=0.125 2023-11-18 18:59:05,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=364693.3333333333, ans=0.04949747468305833 2023-11-18 18:59:14,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=364760.0, ans=0.125 2023-11-18 18:59:39,646 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.947e+01 1.006e+02 1.140e+02 1.601e+02, threshold=2.013e+02, percent-clipped=0.0 2023-11-18 18:59:39,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=364893.3333333333, ans=0.1 2023-11-18 18:59:41,801 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6650, loss[loss=0.09487, simple_loss=0.1057, pruned_loss=0.03058, audio_tagging_loss=0.01143, over 16052.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1178, pruned_loss=0.0325, audio_tagging_loss=0.01127, over 3044062.47 frames. ], batch size: 60, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:59:56,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=365026.6666666667, ans=0.125 2023-11-18 18:59:59,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=365026.6666666667, ans=0.5 2023-11-18 19:00:23,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2023-11-18 19:00:34,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=365226.6666666667, ans=0.2 2023-11-18 19:00:37,640 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6700, loss[loss=0.1243, simple_loss=0.1464, pruned_loss=0.04283, audio_tagging_loss=0.008241, over 15087.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.119, pruned_loss=0.03302, audio_tagging_loss=0.01123, over 3048590.18 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:00:46,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=365293.3333333333, ans=0.09899494936611666 2023-11-18 19:01:14,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.12 vs. 
limit=15.0 2023-11-18 19:01:24,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=365560.0, ans=0.2 2023-11-18 19:01:32,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 9.414e+01 1.042e+02 1.183e+02 1.878e+02, threshold=2.084e+02, percent-clipped=0.0 2023-11-18 19:01:32,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=365560.0, ans=0.0 2023-11-18 19:01:34,262 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6750, loss[loss=0.1178, simple_loss=0.1256, pruned_loss=0.04125, audio_tagging_loss=0.01372, over 15655.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1187, pruned_loss=0.03297, audio_tagging_loss=0.01121, over 3045154.93 frames. ], batch size: 58, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:01:41,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=365626.6666666667, ans=0.125 2023-11-18 19:02:12,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=365826.6666666667, ans=0.125 2023-11-18 19:02:14,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=365826.6666666667, ans=0.125 2023-11-18 19:02:16,860 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.544e-03 2023-11-18 19:02:25,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=365893.3333333333, ans=0.07 2023-11-18 19:02:28,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=365960.0, ans=0.125 2023-11-18 19:02:29,905 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6800, loss[loss=0.1045, simple_loss=0.1221, pruned_loss=0.03019, audio_tagging_loss=0.01323, over 14983.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.12, pruned_loss=0.03326, audio_tagging_loss=0.01115, over 3044676.51 frames. ], batch size: 55, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:02:44,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2023-11-18 19:03:00,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.65 vs. limit=15.0 2023-11-18 19:03:04,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=366160.0, ans=0.2 2023-11-18 19:03:22,850 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.984e+01 9.907e+01 1.134e+02 1.555e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-18 19:03:24,932 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6850, loss[loss=0.1042, simple_loss=0.1231, pruned_loss=0.03143, audio_tagging_loss=0.01125, over 15318.00 frames. ], tot_loss[loss=0.1036, simple_loss=0.1194, pruned_loss=0.03271, audio_tagging_loss=0.01118, over 3041413.10 frames. 
], batch size: 55, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:03:29,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=366293.3333333333, ans=0.0 2023-11-18 19:04:21,393 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6900, loss[loss=0.09988, simple_loss=0.1216, pruned_loss=0.02904, audio_tagging_loss=0.01005, over 16276.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1193, pruned_loss=0.03247, audio_tagging_loss=0.01107, over 3043182.85 frames. ], batch size: 60, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:04:37,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=366693.3333333333, ans=0.125 2023-11-18 19:04:37,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=366693.3333333333, ans=0.0 2023-11-18 19:04:53,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=12.0 2023-11-18 19:04:55,408 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0 2023-11-18 19:05:02,947 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:05:14,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=366893.3333333333, ans=0.125 2023-11-18 19:05:15,600 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 9.152e+01 1.012e+02 1.130e+02 1.420e+02, threshold=2.024e+02, percent-clipped=0.0 2023-11-18 19:05:17,769 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 6950, loss[loss=0.0953, simple_loss=0.117, pruned_loss=0.02603, audio_tagging_loss=0.01078, over 16248.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1193, pruned_loss=0.03241, audio_tagging_loss=0.01103, over 3044238.99 frames. ], batch size: 58, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:05:28,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.11 vs. limit=10.0 2023-11-18 19:05:31,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=367026.6666666667, ans=0.05 2023-11-18 19:05:42,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=367093.3333333333, ans=0.95 2023-11-18 19:06:12,702 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7000, loss[loss=0.08757, simple_loss=0.1003, pruned_loss=0.02788, audio_tagging_loss=0.009544, over 16423.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1179, pruned_loss=0.0321, audio_tagging_loss=0.0111, over 3040653.20 frames. 
], batch size: 66, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:06:21,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.00 vs. limit=22.5 2023-11-18 19:06:35,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=22.5 2023-11-18 19:06:43,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=367426.6666666667, ans=0.0 2023-11-18 19:06:56,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=367560.0, ans=0.0 2023-11-18 19:06:57,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=367560.0, ans=0.05 2023-11-18 19:07:06,685 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 9.236e+01 1.008e+02 1.142e+02 1.683e+02, threshold=2.016e+02, percent-clipped=0.0 2023-11-18 19:07:08,811 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7050, loss[loss=0.1339, simple_loss=0.1536, pruned_loss=0.0485, audio_tagging_loss=0.008596, over 15689.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1176, pruned_loss=0.03216, audio_tagging_loss=0.01127, over 3040106.07 frames. ], batch size: 55, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:07:18,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=367693.3333333333, ans=0.125 2023-11-18 19:07:34,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=367760.0, ans=0.125 2023-11-18 19:07:57,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=367893.3333333333, ans=0.125 2023-11-18 19:08:04,461 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7100, loss[loss=0.105, simple_loss=0.1124, pruned_loss=0.03587, audio_tagging_loss=0.01295, over 15334.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1171, pruned_loss=0.03192, audio_tagging_loss=0.01141, over 3049356.94 frames. ], batch size: 58, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:08:17,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.40 vs. limit=15.0 2023-11-18 19:08:17,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.99 vs. limit=15.0 2023-11-18 19:08:17,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=368026.6666666667, ans=0.125 2023-11-18 19:08:57,713 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 9.190e+01 1.032e+02 1.164e+02 1.806e+02, threshold=2.063e+02, percent-clipped=0.0 2023-11-18 19:08:58,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=15.0 2023-11-18 19:08:59,843 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7150, loss[loss=0.1247, simple_loss=0.1354, pruned_loss=0.04441, audio_tagging_loss=0.01253, over 14751.00 frames. 
], tot_loss[loss=0.1023, simple_loss=0.1174, pruned_loss=0.03216, audio_tagging_loss=0.01146, over 3052220.16 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:09:03,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2023-11-18 19:09:09,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=368293.3333333333, ans=0.0 2023-11-18 19:09:33,113 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.21 vs. limit=15.0 2023-11-18 19:09:39,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.23 vs. limit=10.0 2023-11-18 19:09:48,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2023-11-18 19:09:55,903 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7200, loss[loss=0.109, simple_loss=0.1327, pruned_loss=0.03128, audio_tagging_loss=0.01139, over 15659.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1181, pruned_loss=0.03228, audio_tagging_loss=0.01143, over 3058423.26 frames. ], batch size: 59, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:10:16,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=368693.3333333333, ans=0.125 2023-11-18 19:10:20,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2023-11-18 19:10:25,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=368760.0, ans=0.125 2023-11-18 19:10:49,019 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.156e+01 1.033e+02 1.136e+02 1.885e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 19:10:51,157 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7250, loss[loss=0.099, simple_loss=0.1177, pruned_loss=0.02924, audio_tagging_loss=0.0109, over 15084.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1169, pruned_loss=0.03198, audio_tagging_loss=0.01147, over 3058597.18 frames. ], batch size: 57, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:10:56,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=368960.0, ans=0.04949747468305833 2023-11-18 19:10:58,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2023-11-18 19:10:58,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. 
limit=15.0 2023-11-18 19:11:06,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=369026.6666666667, ans=0.0 2023-11-18 19:11:07,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=369026.6666666667, ans=0.125 2023-11-18 19:11:14,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=369093.3333333333, ans=0.125 2023-11-18 19:11:16,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=369093.3333333333, ans=0.1 2023-11-18 19:11:22,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=369093.3333333333, ans=0.0 2023-11-18 19:11:34,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=369226.6666666667, ans=0.2 2023-11-18 19:11:36,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=369226.6666666667, ans=0.0 2023-11-18 19:11:46,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=369293.3333333333, ans=0.125 2023-11-18 19:11:47,521 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7300, loss[loss=0.06353, simple_loss=0.06789, pruned_loss=0.01755, audio_tagging_loss=0.01203, over 14685.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1169, pruned_loss=0.03205, audio_tagging_loss=0.01128, over 3050797.18 frames. ], batch size: 59, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:11:58,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.48 vs. limit=10.0 2023-11-18 19:12:04,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=369360.0, ans=0.125 2023-11-18 19:12:07,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.28 vs. limit=6.0 2023-11-18 19:12:23,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=369493.3333333333, ans=15.0 2023-11-18 19:12:29,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=369493.3333333333, ans=0.125 2023-11-18 19:12:41,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.861e+01 9.294e+01 1.042e+02 1.203e+02 1.669e+02, threshold=2.084e+02, percent-clipped=0.0 2023-11-18 19:12:44,273 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7350, loss[loss=0.1246, simple_loss=0.1491, pruned_loss=0.04262, audio_tagging_loss=0.007409, over 15682.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1171, pruned_loss=0.03236, audio_tagging_loss=0.01116, over 3042431.99 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:12:44,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.81 vs. 
limit=15.0 2023-11-18 19:12:57,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=369693.3333333333, ans=0.125 2023-11-18 19:13:00,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=369693.3333333333, ans=0.125 2023-11-18 19:13:39,073 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7400, loss[loss=0.1004, simple_loss=0.1189, pruned_loss=0.03031, audio_tagging_loss=0.0106, over 15050.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1185, pruned_loss=0.03274, audio_tagging_loss=0.01106, over 3043821.16 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:13:39,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369960.0, ans=0.1 2023-11-18 19:13:46,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.35 vs. limit=10.0 2023-11-18 19:13:53,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=370026.6666666667, ans=0.0 2023-11-18 19:14:04,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=370093.3333333333, ans=0.125 2023-11-18 19:14:10,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=370093.3333333333, ans=0.0 2023-11-18 19:14:13,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2023-11-18 19:14:14,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=370160.0, ans=0.0 2023-11-18 19:14:32,447 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.641e+01 9.039e+01 9.551e+01 1.074e+02 1.292e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-18 19:14:33,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=370293.3333333333, ans=0.1 2023-11-18 19:14:34,631 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7450, loss[loss=0.09067, simple_loss=0.1053, pruned_loss=0.02686, audio_tagging_loss=0.01118, over 16441.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1185, pruned_loss=0.03272, audio_tagging_loss=0.01097, over 3043621.30 frames. 
], batch size: 65, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:14:53,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=370360.0, ans=0.0 2023-11-18 19:14:55,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=370360.0, ans=0.0 2023-11-18 19:15:13,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=370493.3333333333, ans=0.125 2023-11-18 19:15:18,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=370560.0, ans=0.1 2023-11-18 19:15:23,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=370560.0, ans=0.125 2023-11-18 19:15:29,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=370626.6666666667, ans=0.125 2023-11-18 19:15:30,686 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7500, loss[loss=0.1019, simple_loss=0.1142, pruned_loss=0.03107, audio_tagging_loss=0.01373, over 15097.00 frames. ], tot_loss[loss=0.103, simple_loss=0.1183, pruned_loss=0.03279, audio_tagging_loss=0.01109, over 3047273.62 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:15:37,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2023-11-18 19:15:45,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=370693.3333333333, ans=0.125 2023-11-18 19:15:53,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=370760.0, ans=0.125 2023-11-18 19:15:54,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=370760.0, ans=0.125 2023-11-18 19:16:12,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0 2023-11-18 19:16:24,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 9.004e+01 9.847e+01 1.087e+02 1.456e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-18 19:16:26,722 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7550, loss[loss=0.094, simple_loss=0.115, pruned_loss=0.02716, audio_tagging_loss=0.009318, over 15221.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1185, pruned_loss=0.03282, audio_tagging_loss=0.01103, over 3050379.61 frames. ], batch size: 55, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:17:18,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=371226.6666666667, ans=0.125 2023-11-18 19:17:22,448 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7600, loss[loss=0.1086, simple_loss=0.1346, pruned_loss=0.03219, audio_tagging_loss=0.009135, over 15548.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1179, pruned_loss=0.03257, audio_tagging_loss=0.01111, over 3055692.15 frames. ], batch size: 57, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:17:58,023 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.92 vs. 
limit=22.5 2023-11-18 19:18:15,320 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.066e+01 9.750e+01 1.073e+02 2.127e+02, threshold=1.950e+02, percent-clipped=2.0 2023-11-18 19:18:18,594 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7650, loss[loss=0.0906, simple_loss=0.1008, pruned_loss=0.02961, audio_tagging_loss=0.0106, over 14539.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1173, pruned_loss=0.03215, audio_tagging_loss=0.01113, over 3052956.61 frames. ], batch size: 55, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:18:44,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=371760.0, ans=0.2 2023-11-18 19:18:55,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=371826.6666666667, ans=0.125 2023-11-18 19:19:10,603 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=15.0 2023-11-18 19:19:14,370 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7700, loss[loss=0.08244, simple_loss=0.08714, pruned_loss=0.02541, audio_tagging_loss=0.01346, over 14297.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1168, pruned_loss=0.0319, audio_tagging_loss=0.01118, over 3046800.26 frames. ], batch size: 58, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:19:15,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=371960.0, ans=0.125 2023-11-18 19:19:28,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2023-11-18 19:19:50,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=372160.0, ans=0.04949747468305833 2023-11-18 19:19:51,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=372160.0, ans=0.09899494936611666 2023-11-18 19:20:00,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372226.6666666667, ans=0.1 2023-11-18 19:20:05,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=372226.6666666667, ans=0.0 2023-11-18 19:20:08,181 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.781e+01 9.756e+01 1.085e+02 1.598e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-18 19:20:10,363 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7750, loss[loss=0.1132, simple_loss=0.1296, pruned_loss=0.03637, audio_tagging_loss=0.01205, over 15341.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1173, pruned_loss=0.03236, audio_tagging_loss=0.01133, over 3048560.64 frames. 
], batch size: 56, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:20:14,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=372293.3333333333, ans=0.125 2023-11-18 19:20:16,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=372293.3333333333, ans=0.0 2023-11-18 19:20:26,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=372360.0, ans=0.5 2023-11-18 19:20:33,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=372426.6666666667, ans=0.0 2023-11-18 19:20:51,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.93 vs. limit=12.0 2023-11-18 19:21:00,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=372560.0, ans=0.125 2023-11-18 19:21:05,710 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7800, loss[loss=0.1039, simple_loss=0.1193, pruned_loss=0.03365, audio_tagging_loss=0.01065, over 14888.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1172, pruned_loss=0.03235, audio_tagging_loss=0.01131, over 3045462.01 frames. ], batch size: 56, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:21:45,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=372826.6666666667, ans=0.125 2023-11-18 19:22:00,496 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.836e+01 9.770e+01 1.067e+02 1.448e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-18 19:22:02,619 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7850, loss[loss=0.07755, simple_loss=0.08005, pruned_loss=0.02211, audio_tagging_loss=0.01541, over 15684.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1171, pruned_loss=0.0324, audio_tagging_loss=0.01147, over 3046349.74 frames. ], batch size: 58, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:22:07,313 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.54 vs. limit=10.0 2023-11-18 19:22:10,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=372960.0, ans=0.125 2023-11-18 19:22:12,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=373026.6666666667, ans=0.1 2023-11-18 19:22:37,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=373160.0, ans=0.125 2023-11-18 19:22:37,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=373160.0, ans=0.2 2023-11-18 19:22:39,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=373160.0, ans=0.125 2023-11-18 19:22:58,343 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7900, loss[loss=0.1154, simple_loss=0.1304, pruned_loss=0.04011, audio_tagging_loss=0.01009, over 15368.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1173, pruned_loss=0.03233, audio_tagging_loss=0.01158, over 3050420.28 frames. 
], batch size: 59, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:23:10,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=373360.0, ans=0.95 2023-11-18 19:23:35,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.20 vs. limit=6.0 2023-11-18 19:23:46,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=373560.0, ans=0.0 2023-11-18 19:23:47,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=373560.0, ans=0.1 2023-11-18 19:23:53,283 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 9.204e+01 9.997e+01 1.093e+02 1.252e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-18 19:23:55,399 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 7950, loss[loss=0.07146, simple_loss=0.07597, pruned_loss=0.01994, audio_tagging_loss=0.01353, over 15846.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1167, pruned_loss=0.03214, audio_tagging_loss=0.01173, over 3052678.74 frames. ], batch size: 63, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:23:58,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=373626.6666666667, ans=0.125 2023-11-18 19:24:00,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=373626.6666666667, ans=0.125 2023-11-18 19:24:08,103 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:24:20,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=22.5 2023-11-18 19:24:22,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=373760.0, ans=0.1 2023-11-18 19:24:25,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=373760.0, ans=0.1 2023-11-18 19:24:30,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=373826.6666666667, ans=0.125 2023-11-18 19:24:30,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=373826.6666666667, ans=0.0 2023-11-18 19:24:42,330 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=22.5 2023-11-18 19:24:50,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=373960.0, ans=0.0 2023-11-18 19:24:51,735 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8000, loss[loss=0.09122, simple_loss=0.09981, pruned_loss=0.02721, audio_tagging_loss=0.0141, over 15663.00 frames. 
], tot_loss[loss=0.1024, simple_loss=0.1168, pruned_loss=0.0322, audio_tagging_loss=0.01174, over 3054299.78 frames. ], batch size: 62, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:25:06,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=374026.6666666667, ans=0.04949747468305833 2023-11-18 19:25:24,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=374160.0, ans=0.2 2023-11-18 19:25:25,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=374160.0, ans=0.2 2023-11-18 19:25:31,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=15.0 2023-11-18 19:25:33,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=15.0 2023-11-18 19:25:46,712 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.997e+01 9.797e+01 1.056e+02 1.371e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-18 19:25:47,820 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8050, loss[loss=0.09809, simple_loss=0.1042, pruned_loss=0.03234, audio_tagging_loss=0.01365, over 15088.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1155, pruned_loss=0.03162, audio_tagging_loss=0.01178, over 3057834.61 frames. ], batch size: 57, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:25:48,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.53 vs. limit=10.0 2023-11-18 19:26:03,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=374360.0, ans=0.2 2023-11-18 19:26:06,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=374360.0, ans=0.125 2023-11-18 19:26:08,074 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.69 vs. limit=15.0 2023-11-18 19:26:13,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=374426.6666666667, ans=0.125 2023-11-18 19:26:20,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=374493.3333333333, ans=0.1 2023-11-18 19:26:39,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.92 vs. limit=22.5 2023-11-18 19:26:42,903 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8100, loss[loss=0.1244, simple_loss=0.1425, pruned_loss=0.0423, audio_tagging_loss=0.01081, over 15466.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1161, pruned_loss=0.03162, audio_tagging_loss=0.01163, over 3051527.52 frames. 
], batch size: 58, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:26:44,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=374626.6666666667, ans=0.0 2023-11-18 19:26:55,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=374693.3333333333, ans=0.125 2023-11-18 19:27:02,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=374693.3333333333, ans=0.0 2023-11-18 19:27:08,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=374760.0, ans=0.125 2023-11-18 19:27:19,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=374826.6666666667, ans=6.0 2023-11-18 19:27:35,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.09 vs. limit=6.0 2023-11-18 19:27:38,041 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.231e+01 9.480e+01 1.051e+02 1.132e+02 1.844e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 19:27:39,094 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8150, loss[loss=0.1026, simple_loss=0.1121, pruned_loss=0.03637, audio_tagging_loss=0.01018, over 14142.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1154, pruned_loss=0.03148, audio_tagging_loss=0.01152, over 3049129.64 frames. ], batch size: 57, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:27:47,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=374960.0, ans=0.125 2023-11-18 19:27:49,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=375026.6666666667, ans=0.125 2023-11-18 19:28:00,519 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:28:13,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=375160.0, ans=0.2 2023-11-18 19:28:23,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=375226.6666666667, ans=0.0 2023-11-18 19:28:30,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=375226.6666666667, ans=0.0 2023-11-18 19:28:31,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=375226.6666666667, ans=0.2 2023-11-18 19:28:34,125 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:28:35,181 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8200, loss[loss=0.1036, simple_loss=0.1169, pruned_loss=0.03438, audio_tagging_loss=0.01081, over 14487.00 frames. 
], tot_loss[loss=0.1007, simple_loss=0.1158, pruned_loss=0.03147, audio_tagging_loss=0.0114, over 3054000.02 frames. ], batch size: 57, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:28:40,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=375293.3333333333, ans=0.015 2023-11-18 19:28:41,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=375293.3333333333, ans=0.0 2023-11-18 19:28:41,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=375293.3333333333, ans=0.0 2023-11-18 19:28:47,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=375360.0, ans=0.0 2023-11-18 19:29:09,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=375493.3333333333, ans=0.2 2023-11-18 19:29:09,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=375493.3333333333, ans=0.125 2023-11-18 19:29:28,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 9.797e+01 1.057e+02 1.238e+02 1.453e+02, threshold=2.115e+02, percent-clipped=0.0 2023-11-18 19:29:29,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=375626.6666666667, ans=0.2 2023-11-18 19:29:30,058 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8250, loss[loss=0.138, simple_loss=0.1507, pruned_loss=0.05471, audio_tagging_loss=0.007924, over 14156.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1152, pruned_loss=0.03137, audio_tagging_loss=0.01137, over 3045178.46 frames. ], batch size: 54, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:29:35,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375626.6666666667, ans=0.1 2023-11-18 19:29:48,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=375693.3333333333, ans=0.1 2023-11-18 19:29:54,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-11-18 19:30:15,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.37 vs. limit=10.0 2023-11-18 19:30:25,318 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8300, loss[loss=0.07192, simple_loss=0.06944, pruned_loss=0.02355, audio_tagging_loss=0.01365, over 14764.00 frames. ], tot_loss[loss=0.09999, simple_loss=0.1148, pruned_loss=0.03123, audio_tagging_loss=0.01137, over 3047447.59 frames. 
], batch size: 58, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:30:40,284 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:31:00,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=376160.0, ans=0.0 2023-11-18 19:31:11,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=376226.6666666667, ans=0.1 2023-11-18 19:31:19,572 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.771e+01 9.264e+01 1.007e+02 1.092e+02 1.530e+02, threshold=2.015e+02, percent-clipped=0.0 2023-11-18 19:31:21,239 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8350, loss[loss=0.1349, simple_loss=0.1586, pruned_loss=0.04633, audio_tagging_loss=0.009274, over 14683.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1156, pruned_loss=0.03164, audio_tagging_loss=0.0112, over 3046581.55 frames. ], batch size: 54, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:31:36,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=376360.0, ans=0.125 2023-11-18 19:31:38,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=376360.0, ans=0.2 2023-11-18 19:31:39,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=376360.0, ans=0.1 2023-11-18 19:31:51,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=376426.6666666667, ans=0.125 2023-11-18 19:31:55,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=376493.3333333333, ans=0.0 2023-11-18 19:32:01,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=376493.3333333333, ans=0.125 2023-11-18 19:32:06,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=376560.0, ans=0.125 2023-11-18 19:32:16,814 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8400, loss[loss=0.07713, simple_loss=0.09106, pruned_loss=0.02064, audio_tagging_loss=0.01095, over 13619.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1154, pruned_loss=0.03158, audio_tagging_loss=0.01124, over 3051033.85 frames. ], batch size: 55, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:32:21,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.90 vs. 
limit=15.0 2023-11-18 19:32:22,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=376626.6666666667, ans=0.1 2023-11-18 19:32:31,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=376693.3333333333, ans=0.1 2023-11-18 19:32:44,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=376760.0, ans=0.125 2023-11-18 19:32:56,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=376826.6666666667, ans=0.5 2023-11-18 19:33:00,502 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=376893.3333333333, ans=0.0 2023-11-18 19:33:06,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=376893.3333333333, ans=10.0 2023-11-18 19:33:11,381 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.828e+01 9.983e+01 1.109e+02 1.398e+02, threshold=1.997e+02, percent-clipped=0.0 2023-11-18 19:33:13,012 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8450, loss[loss=0.08648, simple_loss=0.09226, pruned_loss=0.02782, audio_tagging_loss=0.01253, over 15517.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1159, pruned_loss=0.03174, audio_tagging_loss=0.01126, over 3055921.96 frames. ], batch size: 58, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:33:19,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=376960.0, ans=0.0 2023-11-18 19:33:26,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=377026.6666666667, ans=0.125 2023-11-18 19:33:38,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=377093.3333333333, ans=0.0 2023-11-18 19:33:41,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=377093.3333333333, ans=0.125 2023-11-18 19:34:07,980 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8500, loss[loss=0.09881, simple_loss=0.1091, pruned_loss=0.03153, audio_tagging_loss=0.01274, over 14655.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1164, pruned_loss=0.03194, audio_tagging_loss=0.01126, over 3061473.82 frames. 
], batch size: 56, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:34:26,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=377360.0, ans=0.125 2023-11-18 19:34:27,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=377360.0, ans=0.125 2023-11-18 19:34:37,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=377426.6666666667, ans=0.0 2023-11-18 19:34:43,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=377493.3333333333, ans=0.1 2023-11-18 19:34:49,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=377493.3333333333, ans=0.125 2023-11-18 19:34:54,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=377560.0, ans=0.125 2023-11-18 19:34:54,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=377560.0, ans=0.2 2023-11-18 19:35:03,286 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.332e+01 8.638e+01 9.723e+01 1.079e+02 1.527e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-18 19:35:04,385 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8550, loss[loss=0.09577, simple_loss=0.111, pruned_loss=0.03028, audio_tagging_loss=0.009991, over 15069.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1166, pruned_loss=0.03184, audio_tagging_loss=0.01134, over 3062873.33 frames. ], batch size: 56, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:35:17,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=377693.3333333333, ans=0.125 2023-11-18 19:35:17,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=377693.3333333333, ans=0.07 2023-11-18 19:35:25,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=377760.0, ans=0.125 2023-11-18 19:35:30,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=377760.0, ans=0.125 2023-11-18 19:35:36,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2023-11-18 19:35:42,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=377826.6666666667, ans=0.0 2023-11-18 19:35:54,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=377893.3333333333, ans=0.125 2023-11-18 19:35:59,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=377960.0, ans=0.125 2023-11-18 19:36:00,029 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8600, loss[loss=0.09964, simple_loss=0.1213, pruned_loss=0.02812, audio_tagging_loss=0.01089, over 14730.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1168, pruned_loss=0.03204, audio_tagging_loss=0.01143, over 3050314.13 frames. 
], batch size: 57, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:36:00,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=377960.0, ans=0.0 2023-11-18 19:36:08,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=377960.0, ans=0.0 2023-11-18 19:36:26,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=378093.3333333333, ans=0.0 2023-11-18 19:36:30,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=378093.3333333333, ans=0.0 2023-11-18 19:36:54,546 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.915e+01 9.697e+01 1.106e+02 1.523e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-18 19:36:55,630 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8650, loss[loss=0.09536, simple_loss=0.1092, pruned_loss=0.03005, audio_tagging_loss=0.01071, over 15754.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1163, pruned_loss=0.03173, audio_tagging_loss=0.01153, over 3047032.26 frames. ], batch size: 61, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:37:14,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=378360.0, ans=0.0 2023-11-18 19:37:35,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=378493.3333333333, ans=0.125 2023-11-18 19:37:51,203 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8700, loss[loss=0.09325, simple_loss=0.1055, pruned_loss=0.02933, audio_tagging_loss=0.01119, over 15961.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1171, pruned_loss=0.03227, audio_tagging_loss=0.01158, over 3054164.17 frames. ], batch size: 59, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:37:57,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=378626.6666666667, ans=0.0 2023-11-18 19:38:07,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.46 vs. limit=15.0 2023-11-18 19:38:25,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.06 vs. limit=15.0 2023-11-18 19:38:35,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=378893.3333333333, ans=0.1 2023-11-18 19:38:46,599 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.813e+01 9.292e+01 1.037e+02 1.144e+02 1.707e+02, threshold=2.074e+02, percent-clipped=0.0 2023-11-18 19:38:47,730 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8750, loss[loss=0.114, simple_loss=0.134, pruned_loss=0.03703, audio_tagging_loss=0.00994, over 14149.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1189, pruned_loss=0.0328, audio_tagging_loss=0.01145, over 3051286.19 frames. ], batch size: 53, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:39:05,693 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0 2023-11-18 19:39:24,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. 
limit=15.0 2023-11-18 19:39:28,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=379160.0, ans=0.04949747468305833 2023-11-18 19:39:37,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=379226.6666666667, ans=0.0 2023-11-18 19:39:43,192 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8800, loss[loss=0.0822, simple_loss=0.0998, pruned_loss=0.02217, audio_tagging_loss=0.01013, over 13992.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1188, pruned_loss=0.03266, audio_tagging_loss=0.01149, over 3056702.90 frames. ], batch size: 53, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:39:52,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=379293.3333333333, ans=0.2 2023-11-18 19:40:08,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=379426.6666666667, ans=0.1 2023-11-18 19:40:13,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=379426.6666666667, ans=0.125 2023-11-18 19:40:28,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=379560.0, ans=0.125 2023-11-18 19:40:37,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 9.218e+01 1.050e+02 1.133e+02 1.971e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-18 19:40:38,566 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8850, loss[loss=0.1207, simple_loss=0.1477, pruned_loss=0.03838, audio_tagging_loss=0.0085, over 15492.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1177, pruned_loss=0.03203, audio_tagging_loss=0.01156, over 3054830.36 frames. ], batch size: 56, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:40:45,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=379626.6666666667, ans=10.0 2023-11-18 19:40:47,040 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:41:20,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=379826.6666666667, ans=0.125 2023-11-18 19:41:20,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=379826.6666666667, ans=0.2 2023-11-18 19:41:33,622 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8900, loss[loss=0.09424, simple_loss=0.1133, pruned_loss=0.02665, audio_tagging_loss=0.01092, over 15595.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1179, pruned_loss=0.03188, audio_tagging_loss=0.01133, over 3055154.51 frames. ], batch size: 59, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:41:45,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.75 vs. 
limit=22.5 2023-11-18 19:41:52,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=380026.6666666667, ans=0.0 2023-11-18 19:42:13,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=380160.0, ans=0.0 2023-11-18 19:42:15,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=380160.0, ans=0.125 2023-11-18 19:42:17,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=380226.6666666667, ans=0.1 2023-11-18 19:42:28,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 9.192e+01 1.013e+02 1.118e+02 1.605e+02, threshold=2.026e+02, percent-clipped=0.0 2023-11-18 19:42:29,777 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 8950, loss[loss=0.091, simple_loss=0.09838, pruned_loss=0.03089, audio_tagging_loss=0.01092, over 13915.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1185, pruned_loss=0.03214, audio_tagging_loss=0.01117, over 3045508.72 frames. ], batch size: 54, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:42:42,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.66 vs. limit=22.5 2023-11-18 19:42:47,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.18 vs. limit=15.0 2023-11-18 19:43:10,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=380493.3333333333, ans=0.2 2023-11-18 19:43:20,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=380560.0, ans=0.1 2023-11-18 19:43:25,559 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9000, loss[loss=0.08097, simple_loss=0.08384, pruned_loss=0.02523, audio_tagging_loss=0.01382, over 13407.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1186, pruned_loss=0.03227, audio_tagging_loss=0.01111, over 3046072.46 frames. ], batch size: 53, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:43:25,560 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 19:43:39,172 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9617, 3.5372, 3.9427, 3.7470], device='cuda:1') 2023-11-18 19:43:58,395 INFO [train_asr.py:1147] (1/4) Epoch 5, validation: loss=0.07332, simple_loss=0.06001, pruned_loss=0.008857, audio_tagging_loss=0.03446, over 4681554.00 frames. 
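The component losses in these records combine into the printed totals with fixed weights: in the validation record immediately above, 0.5 × 0.06001 + 0.008857 + 1.0 × 0.03446 ≈ 0.07332, and the per-batch tot_loss entries follow the same pattern (e.g. 0.5 × 0.1186 + 0.03227 + 0.01111 ≈ 0.1027 for batch 9000). A minimal sketch of that combination, with the weights inferred from the records themselves and the function and parameter names hypothetical rather than taken from the training code:

```python
# Illustrative reconstruction of how the printed "loss" relates to its
# logged components. The 0.5 weight on simple_loss and the 1.0 weight on
# audio_tagging_loss are inferred from the numbers in the records above;
# the names below are hypothetical, not icefall's actual API.
def combine_losses(simple_loss: float,
                   pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_scale: float = 0.5,
                   audio_tagging_scale: float = 1.0) -> float:
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

# Check against the validation record above: loss=0.07332,
# simple_loss=0.06001, pruned_loss=0.008857, audio_tagging_loss=0.03446
assert abs(combine_losses(0.06001, 0.008857, 0.03446) - 0.07332) < 1e-4
```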
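The recurring optim.py lines print a five-number summary (min, first quartile, median, third quartile, max) of recent gradient norms; in every record shown here the printed threshold equals Clipping_scale times the median, e.g. 2.0 × 1.013e+02 = 2.026e+02 in the 19:42:28 record above. A sketch of that bookkeeping, assuming a sliding window of recent norms (class and attribute names are hypothetical, not the actual optim.py implementation):

```python
# Hypothetical reconstruction of the "Clipping_scale=2.0, grad-norm
# quartiles ... threshold=..., percent-clipped=..." records, under the
# assumption (consistent with every record above) that
# threshold == clipping_scale * median of recent gradient norms.
from collections import deque
import statistics

class GradNormTracker:
    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent per-step gradient norms
        self.clipped = 0
        self.seen = 0

    def update(self, grad_norm: float) -> float:
        """Record one step's norm; return the factor to scale gradients by."""
        self.norms.append(grad_norm)
        self.seen += 1
        if len(self.norms) < 2:
            return 1.0  # not enough history to estimate quartiles yet
        _q1, median, _q3 = statistics.quantiles(self.norms, n=4)
        threshold = self.clipping_scale * median
        if grad_norm > threshold:
            self.clipped += 1
            return threshold / grad_norm  # shrink the step to the threshold
        return 1.0

    def summary(self) -> str:
        """Format a log line like the ones above (needs >= 2 recorded norms)."""
        q1, median, q3 = statistics.quantiles(self.norms, n=4)
        pct = 100.0 * self.clipped / max(self.seen, 1)
        return (f"grad-norm quartiles {min(self.norms):.3e} {q1:.3e} "
                f"{median:.3e} {q3:.3e} {max(self.norms):.3e}, "
                f"threshold={self.clipping_scale * median:.3e}, "
                f"percent-clipped={pct}")
```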
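The WARNING records that exclude AudioSet placeholder cuts also show the likely criterion: after subsampling, each 100-frame cut keeps only 23 frames, fewer than its 24 BPE tokens, and a transducer loss cannot emit more labels than it has encoder frames. A hypothetical form of that filter (the numbers come from the records; the function is illustrative, not the code's actual check):

```python
# Hypothetical filter matching the "Exclude cut ..." WARNING records:
# a cut with fewer post-subsampling frames than tokens cannot be aligned
# by a transducer loss, so it is presumably dropped from training.
def keep_cut(num_frames_after_subsampling: int, num_tokens: int) -> bool:
    # e.g. the excluded placeholder cuts above: 23 frames < 24 tokens
    return num_frames_after_subsampling >= num_tokens
```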
2023-11-18 19:43:58,395 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 19:44:16,597 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:44:35,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=380826.6666666667, ans=0.125 2023-11-18 19:44:42,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=380893.3333333333, ans=0.1 2023-11-18 19:44:50,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=380893.3333333333, ans=0.125 2023-11-18 19:44:54,106 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 9.240e+01 1.024e+02 1.108e+02 1.437e+02, threshold=2.047e+02, percent-clipped=0.0 2023-11-18 19:44:54,132 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9050, loss[loss=0.09154, simple_loss=0.1129, pruned_loss=0.0243, audio_tagging_loss=0.01078, over 16288.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1188, pruned_loss=0.03227, audio_tagging_loss=0.01116, over 3051823.31 frames. ], batch size: 59, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:44:54,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=380960.0, ans=0.0 2023-11-18 19:45:06,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=381026.6666666667, ans=0.1 2023-11-18 19:45:42,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=381226.6666666667, ans=0.07 2023-11-18 19:45:49,529 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9100, loss[loss=0.08945, simple_loss=0.1034, pruned_loss=0.02479, audio_tagging_loss=0.01296, over 15772.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1191, pruned_loss=0.03239, audio_tagging_loss=0.01113, over 3052132.85 frames. ], batch size: 57, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:45:51,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=381293.3333333333, ans=0.09899494936611666 2023-11-18 19:45:56,340 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:45:56,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=381293.3333333333, ans=0.0 2023-11-18 19:46:02,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=381360.0, ans=0.1 2023-11-18 19:46:09,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=381360.0, ans=0.0 2023-11-18 19:46:11,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=381426.6666666667, ans=0.125 2023-11-18 19:46:34,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.58 vs. 
limit=15.0 2023-11-18 19:46:38,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=381560.0, ans=0.125 2023-11-18 19:46:45,535 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.962e+01 1.000e+02 1.098e+02 1.318e+02, threshold=2.000e+02, percent-clipped=0.0 2023-11-18 19:46:45,562 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9150, loss[loss=0.114, simple_loss=0.1406, pruned_loss=0.03403, audio_tagging_loss=0.009656, over 15501.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1194, pruned_loss=0.03248, audio_tagging_loss=0.01114, over 3054102.35 frames. ], batch size: 59, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:47:30,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2023-11-18 19:47:42,468 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9200, loss[loss=0.09725, simple_loss=0.1184, pruned_loss=0.02845, audio_tagging_loss=0.009592, over 15062.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1181, pruned_loss=0.03222, audio_tagging_loss=0.01115, over 3052317.98 frames. ], batch size: 54, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:47:51,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=381960.0, ans=0.07 2023-11-18 19:47:59,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=382026.6666666667, ans=0.0 2023-11-18 19:48:34,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=382226.6666666667, ans=0.07 2023-11-18 19:48:37,890 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 9.282e+01 1.040e+02 1.122e+02 1.499e+02, threshold=2.080e+02, percent-clipped=0.0 2023-11-18 19:48:37,918 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9250, loss[loss=0.09346, simple_loss=0.1043, pruned_loss=0.03032, audio_tagging_loss=0.01099, over 14193.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1175, pruned_loss=0.03203, audio_tagging_loss=0.01114, over 3057945.02 frames. ], batch size: 53, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:48:39,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2023-11-18 19:48:47,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=382360.0, ans=0.2 2023-11-18 19:48:58,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=382360.0, ans=0.1 2023-11-18 19:49:33,081 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9300, loss[loss=0.1014, simple_loss=0.1198, pruned_loss=0.03128, audio_tagging_loss=0.01021, over 14546.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1177, pruned_loss=0.03209, audio_tagging_loss=0.01118, over 3054055.12 frames. 
], batch size: 55, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:49:47,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=382693.3333333333, ans=0.125 2023-11-18 19:49:53,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=382693.3333333333, ans=0.125 2023-11-18 19:50:02,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=382760.0, ans=0.2 2023-11-18 19:50:05,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=382760.0, ans=0.125 2023-11-18 19:50:13,941 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:50:15,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=382826.6666666667, ans=0.125 2023-11-18 19:50:15,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=382826.6666666667, ans=0.0 2023-11-18 19:50:16,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2023-11-18 19:50:29,715 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.203e+01 9.054e+01 9.801e+01 1.113e+02 1.567e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-18 19:50:29,741 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9350, loss[loss=0.1036, simple_loss=0.1258, pruned_loss=0.02729, audio_tagging_loss=0.01338, over 16219.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.119, pruned_loss=0.03223, audio_tagging_loss=0.01114, over 3059869.80 frames. ], batch size: 59, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:50:37,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2023-11-18 19:50:51,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=383093.3333333333, ans=0.125 2023-11-18 19:51:12,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=383160.0, ans=0.1 2023-11-18 19:51:21,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=12.0 2023-11-18 19:51:23,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=383226.6666666667, ans=0.0 2023-11-18 19:51:25,413 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9400, loss[loss=0.0962, simple_loss=0.1049, pruned_loss=0.02865, audio_tagging_loss=0.01508, over 14535.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1181, pruned_loss=0.0323, audio_tagging_loss=0.01126, over 3055330.90 frames. 
], batch size: 56, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:51:26,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=383293.3333333333, ans=0.05 2023-11-18 19:51:34,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=383293.3333333333, ans=0.0 2023-11-18 19:51:47,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.85 vs. limit=10.0 2023-11-18 19:51:47,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=383426.6666666667, ans=0.125 2023-11-18 19:52:02,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=383493.3333333333, ans=0.125 2023-11-18 19:52:09,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=383560.0, ans=0.125 2023-11-18 19:52:10,519 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.05 vs. limit=15.0 2023-11-18 19:52:17,519 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:52:20,608 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.864e+01 9.867e+01 1.096e+02 1.502e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 19:52:20,636 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9450, loss[loss=0.1122, simple_loss=0.1279, pruned_loss=0.03585, audio_tagging_loss=0.01238, over 15662.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1183, pruned_loss=0.0324, audio_tagging_loss=0.0113, over 3054594.71 frames. ], batch size: 57, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:52:35,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=383693.3333333333, ans=0.0 2023-11-18 19:52:36,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=383693.3333333333, ans=0.0 2023-11-18 19:52:47,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=383760.0, ans=0.125 2023-11-18 19:52:49,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=383760.0, ans=0.125 2023-11-18 19:52:52,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=383760.0, ans=0.125 2023-11-18 19:52:54,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.44 vs. 
limit=10.0 2023-11-18 19:52:56,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=383826.6666666667, ans=0.125 2023-11-18 19:52:58,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=383826.6666666667, ans=0.125 2023-11-18 19:53:03,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=383826.6666666667, ans=0.0 2023-11-18 19:53:07,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=383893.3333333333, ans=0.0 2023-11-18 19:53:11,573 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:53:14,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=383893.3333333333, ans=0.125 2023-11-18 19:53:16,808 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9500, loss[loss=0.1186, simple_loss=0.1323, pruned_loss=0.03762, audio_tagging_loss=0.01484, over 15202.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1179, pruned_loss=0.03238, audio_tagging_loss=0.01144, over 3052569.51 frames. ], batch size: 56, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:53:19,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=383960.0, ans=0.0 2023-11-18 19:53:26,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=383960.0, ans=0.1 2023-11-18 19:53:39,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-18 19:53:42,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=384093.3333333333, ans=0.125 2023-11-18 19:53:57,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=384160.0, ans=0.0 2023-11-18 19:54:11,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0 2023-11-18 19:54:13,475 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 9.208e+01 1.015e+02 1.091e+02 1.477e+02, threshold=2.029e+02, percent-clipped=0.0 2023-11-18 19:54:13,506 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9550, loss[loss=0.1039, simple_loss=0.1159, pruned_loss=0.03556, audio_tagging_loss=0.01034, over 15409.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1193, pruned_loss=0.03272, audio_tagging_loss=0.01136, over 3055245.83 frames. ], batch size: 59, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:54:48,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=384493.3333333333, ans=0.125 2023-11-18 19:54:54,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=384493.3333333333, ans=0.0 2023-11-18 19:55:03,871 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.76 vs. 
limit=6.0 2023-11-18 19:55:05,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=384560.0, ans=0.1 2023-11-18 19:55:08,353 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9600, loss[loss=0.1207, simple_loss=0.139, pruned_loss=0.03984, audio_tagging_loss=0.01136, over 15862.00 frames. ], tot_loss[loss=0.1034, simple_loss=0.1188, pruned_loss=0.0326, audio_tagging_loss=0.01143, over 3057817.01 frames. ], batch size: 57, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:55:22,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=384693.3333333333, ans=0.1 2023-11-18 19:55:24,685 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:55:28,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.97 vs. limit=10.0 2023-11-18 19:55:36,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.01 vs. limit=6.0 2023-11-18 19:55:37,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=384760.0, ans=0.125 2023-11-18 19:55:39,413 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0 2023-11-18 19:55:58,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=384893.3333333333, ans=0.0 2023-11-18 19:56:00,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=384893.3333333333, ans=0.125 2023-11-18 19:56:03,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=384960.0, ans=0.125 2023-11-18 19:56:04,713 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9650, loss[loss=0.1176, simple_loss=0.1425, pruned_loss=0.03605, audio_tagging_loss=0.01027, over 15327.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1192, pruned_loss=0.03277, audio_tagging_loss=0.01133, over 3057009.98 frames. ], batch size: 57, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:56:05,739 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.450e+01 8.741e+01 9.505e+01 1.064e+02 1.391e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-18 19:56:17,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=385026.6666666667, ans=0.07 2023-11-18 19:56:27,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.55 vs. 
limit=15.0 2023-11-18 19:56:29,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=385093.3333333333, ans=0.025 2023-11-18 19:56:31,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=385093.3333333333, ans=0.2 2023-11-18 19:56:55,996 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:57:00,514 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9700, loss[loss=0.1456, simple_loss=0.1661, pruned_loss=0.05259, audio_tagging_loss=0.009912, over 16137.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1187, pruned_loss=0.03255, audio_tagging_loss=0.01125, over 3056125.35 frames. ], batch size: 58, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:57:34,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=385493.3333333333, ans=0.07 2023-11-18 19:57:38,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=385493.3333333333, ans=0.0 2023-11-18 19:57:43,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5 2023-11-18 19:57:47,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=385560.0, ans=0.0 2023-11-18 19:57:56,529 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9750, loss[loss=0.08187, simple_loss=0.09529, pruned_loss=0.019, audio_tagging_loss=0.01523, over 14382.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1186, pruned_loss=0.03237, audio_tagging_loss=0.01123, over 3053534.80 frames. ], batch size: 55, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:57:57,521 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.008e+01 9.015e+01 1.026e+02 1.125e+02 1.667e+02, threshold=2.051e+02, percent-clipped=0.0 2023-11-18 19:58:07,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=385693.3333333333, ans=0.125 2023-11-18 19:58:07,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=385693.3333333333, ans=0.125 2023-11-18 19:58:27,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=385760.0, ans=0.125 2023-11-18 19:58:32,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=385826.6666666667, ans=0.0 2023-11-18 19:58:38,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2023-11-18 19:58:52,970 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9800, loss[loss=0.1117, simple_loss=0.1149, pruned_loss=0.04314, audio_tagging_loss=0.01113, over 15010.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1175, pruned_loss=0.03185, audio_tagging_loss=0.01118, over 3044778.72 frames. ], batch size: 55, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:58:58,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.66 vs. 
limit=10.0 2023-11-18 19:58:59,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.64 vs. limit=10.0 2023-11-18 19:59:00,766 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.535e-03 2023-11-18 19:59:10,780 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:59:20,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=386093.3333333333, ans=0.125 2023-11-18 19:59:23,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=386093.3333333333, ans=0.1 2023-11-18 19:59:34,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386160.0, ans=0.1 2023-11-18 19:59:40,933 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:59:44,662 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.73 vs. limit=12.0 2023-11-18 19:59:48,956 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9850, loss[loss=0.1261, simple_loss=0.1516, pruned_loss=0.03937, audio_tagging_loss=0.01096, over 14660.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1189, pruned_loss=0.03229, audio_tagging_loss=0.01107, over 3044201.71 frames. ], batch size: 53, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:59:49,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 9.044e+01 9.858e+01 1.082e+02 1.412e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-18 20:00:12,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=386426.6666666667, ans=0.0 2023-11-18 20:00:28,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=386493.3333333333, ans=0.125 2023-11-18 20:00:44,514 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9900, loss[loss=0.1116, simple_loss=0.1317, pruned_loss=0.03572, audio_tagging_loss=0.01003, over 15516.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1181, pruned_loss=0.03207, audio_tagging_loss=0.01102, over 3045288.26 frames. ], batch size: 56, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:01:11,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.17 vs. limit=15.0 2023-11-18 20:01:19,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=386826.6666666667, ans=0.0 2023-11-18 20:01:20,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. 
limit=6.0 2023-11-18 20:01:34,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386893.3333333333, ans=0.1 2023-11-18 20:01:38,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=386893.3333333333, ans=0.125 2023-11-18 20:01:41,650 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 9950, loss[loss=0.1011, simple_loss=0.1192, pruned_loss=0.02924, audio_tagging_loss=0.01229, over 16165.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1168, pruned_loss=0.03167, audio_tagging_loss=0.01116, over 3045920.37 frames. ], batch size: 58, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:01:42,671 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.684e+01 9.823e+01 1.146e+02 1.516e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-18 20:01:44,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=386960.0, ans=0.0 2023-11-18 20:01:50,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.08 vs. limit=12.0 2023-11-18 20:01:52,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=387026.6666666667, ans=0.1 2023-11-18 20:02:00,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=387026.6666666667, ans=0.0 2023-11-18 20:02:14,498 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2023-11-18 20:02:19,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=387160.0, ans=0.125 2023-11-18 20:02:19,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=387160.0, ans=0.125 2023-11-18 20:02:36,726 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10000, loss[loss=0.06317, simple_loss=0.0711, pruned_loss=0.01781, audio_tagging_loss=0.009819, over 14730.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1153, pruned_loss=0.03131, audio_tagging_loss=0.01116, over 3049319.04 frames. ], batch size: 58, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:02:38,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.66 vs. limit=15.0 2023-11-18 20:03:03,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=387426.6666666667, ans=0.125 2023-11-18 20:03:21,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=12.0 2023-11-18 20:03:24,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=387560.0, ans=0.035 2023-11-18 20:03:32,498 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10050, loss[loss=0.0901, simple_loss=0.1021, pruned_loss=0.02572, audio_tagging_loss=0.01331, over 16584.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1163, pruned_loss=0.0318, audio_tagging_loss=0.01111, over 3049230.62 frames. 
], batch size: 62, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:03:33,534 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 9.098e+01 9.898e+01 1.122e+02 1.719e+02, threshold=1.980e+02, percent-clipped=0.0 2023-11-18 20:03:38,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=387626.6666666667, ans=0.09899494936611666 2023-11-18 20:03:39,057 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=387626.6666666667, ans=0.0 2023-11-18 20:03:55,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=387760.0, ans=0.125 2023-11-18 20:04:07,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=387826.6666666667, ans=0.0 2023-11-18 20:04:07,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=387826.6666666667, ans=0.0 2023-11-18 20:04:20,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=387893.3333333333, ans=0.0 2023-11-18 20:04:25,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=387893.3333333333, ans=0.1 2023-11-18 20:04:28,971 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10100, loss[loss=0.1164, simple_loss=0.1369, pruned_loss=0.03616, audio_tagging_loss=0.01174, over 15345.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.116, pruned_loss=0.03154, audio_tagging_loss=0.01119, over 3049110.24 frames. ], batch size: 55, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:04:38,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=388026.6666666667, ans=0.125 2023-11-18 20:04:49,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0 2023-11-18 20:05:04,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.88 vs. limit=15.0 2023-11-18 20:05:12,175 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:05:23,845 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10150, loss[loss=0.07976, simple_loss=0.08665, pruned_loss=0.02541, audio_tagging_loss=0.01102, over 15407.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.115, pruned_loss=0.0314, audio_tagging_loss=0.0114, over 3047530.51 frames. ], batch size: 58, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:05:24,856 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.804e+01 9.203e+01 1.000e+02 1.096e+02 2.259e+02, threshold=2.001e+02, percent-clipped=1.0 2023-11-18 20:05:47,624 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:05:49,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=388426.6666666667, ans=0.0 2023-11-18 20:05:58,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.21 vs. limit=15.0 2023-11-18 20:05:59,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=6.0 2023-11-18 20:05:59,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388493.3333333333, ans=0.1 2023-11-18 20:06:16,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=388560.0, ans=0.09899494936611666 2023-11-18 20:06:19,303 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10200, loss[loss=0.1004, simple_loss=0.1188, pruned_loss=0.02858, audio_tagging_loss=0.01236, over 16415.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1159, pruned_loss=0.03124, audio_tagging_loss=0.0113, over 3051749.72 frames. ], batch size: 62, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:06:20,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388626.6666666667, ans=0.1 2023-11-18 20:06:36,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=388693.3333333333, ans=0.125 2023-11-18 20:06:36,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=388693.3333333333, ans=0.1 2023-11-18 20:06:40,047 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 20:06:42,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=388760.0, ans=0.125 2023-11-18 20:06:49,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=388760.0, ans=0.125 2023-11-18 20:07:09,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=388893.3333333333, ans=0.1 2023-11-18 20:07:11,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=388893.3333333333, ans=0.125 2023-11-18 20:07:14,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=388960.0, ans=0.0 2023-11-18 20:07:14,928 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10250, loss[loss=0.07336, simple_loss=0.07625, pruned_loss=0.02286, audio_tagging_loss=0.01238, over 14620.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1164, pruned_loss=0.03169, audio_tagging_loss=0.01143, over 3050069.55 frames. ], batch size: 56, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:07:15,959 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 9.102e+01 9.857e+01 1.065e+02 1.324e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-18 20:07:18,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=388960.0, ans=0.0 2023-11-18 20:07:19,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388960.0, ans=0.1 2023-11-18 20:07:25,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0 2023-11-18 20:07:38,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.77 vs. limit=15.0 2023-11-18 20:07:57,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=389160.0, ans=0.125 2023-11-18 20:08:03,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2023-11-18 20:08:08,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=389226.6666666667, ans=0.95 2023-11-18 20:08:11,209 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10300, loss[loss=0.08165, simple_loss=0.0999, pruned_loss=0.02208, audio_tagging_loss=0.00962, over 15293.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1158, pruned_loss=0.03166, audio_tagging_loss=0.0116, over 3048864.66 frames. ], batch size: 59, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:08:37,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. 
limit=15.0 2023-11-18 20:08:46,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=389493.3333333333, ans=0.125 2023-11-18 20:08:47,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=389493.3333333333, ans=0.125 2023-11-18 20:09:07,684 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10350, loss[loss=0.1142, simple_loss=0.1362, pruned_loss=0.03894, audio_tagging_loss=0.007171, over 16454.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1165, pruned_loss=0.03183, audio_tagging_loss=0.01162, over 3057841.72 frames. ], batch size: 60, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:09:08,730 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 9.314e+01 1.056e+02 1.157e+02 1.834e+02, threshold=2.113e+02, percent-clipped=0.0 2023-11-18 20:09:21,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=389693.3333333333, ans=0.125 2023-11-18 20:09:23,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=389693.3333333333, ans=0.0 2023-11-18 20:09:28,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=389693.3333333333, ans=0.125 2023-11-18 20:09:54,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=389893.3333333333, ans=10.0 2023-11-18 20:10:02,905 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10400, loss[loss=0.08543, simple_loss=0.099, pruned_loss=0.02453, audio_tagging_loss=0.01139, over 15398.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1154, pruned_loss=0.03111, audio_tagging_loss=0.01175, over 3052023.28 frames. ], batch size: 59, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:10:16,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=390026.6666666667, ans=0.2 2023-11-18 20:10:24,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=390026.6666666667, ans=0.125 2023-11-18 20:10:25,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=390093.3333333333, ans=0.1 2023-11-18 20:10:28,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=390093.3333333333, ans=0.0 2023-11-18 20:10:42,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=390160.0, ans=0.1 2023-11-18 20:10:59,427 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10450, loss[loss=0.1034, simple_loss=0.1272, pruned_loss=0.03171, audio_tagging_loss=0.008096, over 14292.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1158, pruned_loss=0.03124, audio_tagging_loss=0.01161, over 3045577.47 frames. 
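Each Clipping_scale=2.0 line prints five quartiles (min / 25% / median / 75% / max) of recent gradient norms, and in every instance the printed threshold is exactly 2.0 times the median: in the line just above, 2 * 1.056e+02 = 2.113e+02. So the clipping threshold appears to track a running median of total grad norms scaled by clipping_scale, with percent-clipped counting how often a step exceeds it. A sketch of that bookkeeping (illustrative only, not the actual optim.py implementation):

import torch

class GradNormClipper:
    """Median-based gradient clipping sketch, matching the logged numbers."""
    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []        # recent total gradient norms
        self.clipped = 0
        self.steps = 0

    def __call__(self, params):
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.detach().norm() for g in grads]))
        self.norms = (self.norms + [norm.item()])[-self.window:]
        quartiles = torch.quantile(torch.tensor(self.norms),
                                   torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * quartiles[2].item()  # 2.0 x median
        self.steps += 1
        if norm > threshold:
            self.clipped += 1
            for g in grads:
                g.mul_(threshold / norm)   # rescale grads down to threshold
        return quartiles, threshold, 100.0 * self.clipped / self.steps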
], batch size: 55, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:11:00,428 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.809e+01 9.608e+01 1.086e+02 1.646e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-18 20:11:07,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=390293.3333333333, ans=0.0 2023-11-18 20:11:31,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=390426.6666666667, ans=0.1 2023-11-18 20:11:35,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=390493.3333333333, ans=0.125 2023-11-18 20:11:44,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=390560.0, ans=0.125 2023-11-18 20:11:55,859 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10500, loss[loss=0.1082, simple_loss=0.1266, pruned_loss=0.03449, audio_tagging_loss=0.01036, over 15479.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1163, pruned_loss=0.03136, audio_tagging_loss=0.0114, over 3047076.41 frames. ], batch size: 60, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:12:10,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=390693.3333333333, ans=0.125 2023-11-18 20:12:14,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.12 vs. limit=12.0 2023-11-18 20:12:22,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=390760.0, ans=0.05 2023-11-18 20:12:23,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=390760.0, ans=0.1 2023-11-18 20:12:26,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=390760.0, ans=0.0 2023-11-18 20:12:27,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=390760.0, ans=0.125 2023-11-18 20:12:51,592 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10550, loss[loss=0.1257, simple_loss=0.1522, pruned_loss=0.04072, audio_tagging_loss=0.008929, over 15610.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1159, pruned_loss=0.0313, audio_tagging_loss=0.01129, over 3046636.65 frames. ], batch size: 55, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:12:52,611 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.618e+01 8.716e+01 9.677e+01 1.046e+02 1.546e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-18 20:12:54,936 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.62 vs. limit=15.0 2023-11-18 20:13:00,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=390960.0, ans=0.0 2023-11-18 20:13:05,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=391026.6666666667, ans=0.125 2023-11-18 20:13:26,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.85 vs. 
limit=22.5 2023-11-18 20:13:33,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=391160.0, ans=0.125 2023-11-18 20:13:45,373 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:13:47,311 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10600, loss[loss=0.09944, simple_loss=0.1106, pruned_loss=0.03433, audio_tagging_loss=0.009823, over 15691.00 frames. ], tot_loss[loss=0.09961, simple_loss=0.1146, pruned_loss=0.03099, audio_tagging_loss=0.01131, over 3042384.39 frames. ], batch size: 60, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:13:49,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391293.3333333333, ans=0.1 2023-11-18 20:13:59,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=391360.0, ans=0.125 2023-11-18 20:14:00,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=391360.0, ans=0.2 2023-11-18 20:14:22,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=391493.3333333333, ans=0.0 2023-11-18 20:14:24,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=391493.3333333333, ans=0.0 2023-11-18 20:14:43,650 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10650, loss[loss=0.1179, simple_loss=0.1383, pruned_loss=0.03882, audio_tagging_loss=0.009874, over 15776.00 frames. ], tot_loss[loss=0.1, simple_loss=0.1152, pruned_loss=0.03117, audio_tagging_loss=0.01128, over 3046863.48 frames. ], batch size: 58, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:14:44,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 9.141e+01 1.015e+02 1.176e+02 1.580e+02, threshold=2.030e+02, percent-clipped=0.0 2023-11-18 20:14:52,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=391626.6666666667, ans=0.125 2023-11-18 20:14:59,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=391693.3333333333, ans=0.0 2023-11-18 20:15:08,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=391760.0, ans=10.0 2023-11-18 20:15:16,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=391826.6666666667, ans=0.04949747468305833 2023-11-18 20:15:18,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=391826.6666666667, ans=0.0 2023-11-18 20:15:31,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2023-11-18 20:15:37,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=391960.0, ans=0.125 2023-11-18 20:15:38,737 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10700, loss[loss=0.1036, simple_loss=0.139, pruned_loss=0.02664, audio_tagging_loss=0.007473, over 16076.00 frames. 
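The ScheduledFloat lines report module hyper-parameters (dropout_p, balancer probs, skip rates) that are functions of batch_count rather than constants. A piecewise-linear schedule over (batch_count, value) breakpoints reproduces the behaviour; the breakpoint values below are made up for illustration, but by batch_count ~3.9e5 every schedule in this log has flattened onto its final value, which is why each name prints the same ans= repeatedly (0.1, 0.125, 0.0, ...):

def scheduled_float(batch_count, points):
    """Piecewise-linear schedule over (batch_count, value) breakpoints.

    E.g. points=[(0.0, 0.3), (20000.0, 0.1)] decays a dropout_p from 0.3
    to 0.1 over the first 20k batches, then holds it at 0.1 forever.
    """
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]

# Deep into training the schedule is saturated, as in the lines above:
assert scheduled_float(386893.33, [(0.0, 0.3), (20000.0, 0.1)]) == 0.1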
], tot_loss[loss=0.09971, simple_loss=0.1147, pruned_loss=0.03106, audio_tagging_loss=0.01131, over 3049132.65 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:15:50,263 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.41 vs. limit=15.0 2023-11-18 20:15:55,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=392026.6666666667, ans=0.1 2023-11-18 20:16:01,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.92 vs. limit=22.5 2023-11-18 20:16:07,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=392093.3333333333, ans=0.0 2023-11-18 20:16:12,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=392160.0, ans=0.2 2023-11-18 20:16:17,141 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2023-11-18 20:16:35,674 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10750, loss[loss=0.0901, simple_loss=0.108, pruned_loss=0.0273, audio_tagging_loss=0.008774, over 14574.00 frames. ], tot_loss[loss=0.09979, simple_loss=0.1148, pruned_loss=0.03111, audio_tagging_loss=0.01127, over 3049057.97 frames. ], batch size: 55, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:16:36,726 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 9.086e+01 9.851e+01 1.129e+02 1.490e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-18 20:16:43,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=392293.3333333333, ans=0.2 2023-11-18 20:16:43,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=392293.3333333333, ans=0.0 2023-11-18 20:16:52,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.92 vs. limit=22.5 2023-11-18 20:16:58,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.73 vs. limit=22.5 2023-11-18 20:17:11,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=392493.3333333333, ans=0.125 2023-11-18 20:17:14,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=392493.3333333333, ans=0.07 2023-11-18 20:17:31,498 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10800, loss[loss=0.109, simple_loss=0.1256, pruned_loss=0.03422, audio_tagging_loss=0.01198, over 16029.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1153, pruned_loss=0.03123, audio_tagging_loss=0.01125, over 3048893.95 frames. ], batch size: 59, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:17:46,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=392693.3333333333, ans=0.125 2023-11-18 20:17:47,641 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.44 vs. 
limit=15.0 2023-11-18 20:17:51,978 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.03 vs. limit=22.5 2023-11-18 20:17:55,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2023-11-18 20:18:04,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=392826.6666666667, ans=0.125 2023-11-18 20:18:17,625 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:18:17,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=392893.3333333333, ans=0.0 2023-11-18 20:18:24,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=392893.3333333333, ans=0.125 2023-11-18 20:18:26,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=392960.0, ans=0.05 2023-11-18 20:18:27,574 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10850, loss[loss=0.09113, simple_loss=0.1112, pruned_loss=0.02652, audio_tagging_loss=0.008982, over 14197.00 frames. ], tot_loss[loss=0.09966, simple_loss=0.1148, pruned_loss=0.03108, audio_tagging_loss=0.0112, over 3049559.80 frames. ], batch size: 55, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:18:28,582 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 9.217e+01 1.010e+02 1.123e+02 1.956e+02, threshold=2.020e+02, percent-clipped=0.0 2023-11-18 20:18:33,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=392960.0, ans=0.0 2023-11-18 20:18:46,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=393026.6666666667, ans=0.015 2023-11-18 20:18:47,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=12.0 2023-11-18 20:18:49,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=393093.3333333333, ans=0.09899494936611666 2023-11-18 20:19:11,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=393226.6666666667, ans=0.0 2023-11-18 20:19:14,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=393226.6666666667, ans=0.05 2023-11-18 20:19:19,164 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 20:19:20,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=393226.6666666667, ans=0.0 2023-11-18 20:19:24,008 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10900, loss[loss=0.1159, simple_loss=0.1463, pruned_loss=0.03485, audio_tagging_loss=0.007911, over 15572.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1162, pruned_loss=0.03132, audio_tagging_loss=0.01127, over 3049928.03 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:19:44,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=393360.0, ans=0.1 2023-11-18 20:19:56,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=393493.3333333333, ans=0.07 2023-11-18 20:20:04,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=393493.3333333333, ans=0.0 2023-11-18 20:20:13,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=393560.0, ans=0.0 2023-11-18 20:20:15,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=393560.0, ans=0.1 2023-11-18 20:20:20,066 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 10950, loss[loss=0.1086, simple_loss=0.1214, pruned_loss=0.03568, audio_tagging_loss=0.01226, over 15441.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1153, pruned_loss=0.03109, audio_tagging_loss=0.01138, over 3051326.62 frames. ], batch size: 58, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:20:21,116 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 9.174e+01 1.016e+02 1.114e+02 1.629e+02, threshold=2.031e+02, percent-clipped=0.0 2023-11-18 20:20:22,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=393626.6666666667, ans=0.125 2023-11-18 20:20:24,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=393626.6666666667, ans=0.2 2023-11-18 20:20:49,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0 2023-11-18 20:20:54,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=15.0 2023-11-18 20:20:54,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.84 vs. limit=15.0 2023-11-18 20:21:09,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=16.07 vs. 
limit=15.0 2023-11-18 20:21:11,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=393893.3333333333, ans=0.125 2023-11-18 20:21:11,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=393893.3333333333, ans=0.125 2023-11-18 20:21:13,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.83 vs. limit=22.5 2023-11-18 20:21:15,286 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11000, loss[loss=0.1171, simple_loss=0.1369, pruned_loss=0.03977, audio_tagging_loss=0.00889, over 14976.00 frames. ], tot_loss[loss=0.1004, simple_loss=0.1157, pruned_loss=0.03114, audio_tagging_loss=0.0114, over 3046781.22 frames. ], batch size: 56, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:21:23,343 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:22:01,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=394226.6666666667, ans=0.0 2023-11-18 20:22:12,149 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11050, loss[loss=0.09855, simple_loss=0.1092, pruned_loss=0.02993, audio_tagging_loss=0.01403, over 14603.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1161, pruned_loss=0.03136, audio_tagging_loss=0.01146, over 3038313.60 frames. ], batch size: 56, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:22:13,193 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 9.478e+01 1.012e+02 1.085e+02 1.543e+02, threshold=2.025e+02, percent-clipped=0.0 2023-11-18 20:22:20,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=394293.3333333333, ans=0.0 2023-11-18 20:22:44,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=394493.3333333333, ans=0.1 2023-11-18 20:23:07,218 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11100, loss[loss=0.1053, simple_loss=0.1173, pruned_loss=0.03438, audio_tagging_loss=0.01225, over 15303.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1175, pruned_loss=0.03195, audio_tagging_loss=0.01155, over 3044021.12 frames. ], batch size: 60, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:23:38,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.35 vs. limit=15.0 2023-11-18 20:23:51,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=394893.3333333333, ans=0.04949747468305833 2023-11-18 20:23:59,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=394893.3333333333, ans=0.07 2023-11-18 20:24:03,468 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11150, loss[loss=0.09388, simple_loss=0.1021, pruned_loss=0.03023, audio_tagging_loss=0.01262, over 15530.00 frames. 
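The Exclude-cut warnings are a transducer length check: a 1-second AudioSet clip has 100 input frames, the roughly 4x convolutional front end reduces that to the logged 23, but the placeholder transcript tokenizes to 24 BPE pieces, and an RNN-T alignment needs at least one frame per emitted token. A filter in the spirit of that check (the exact subsampling arithmetic is an assumption; it does reproduce the logged 100 -> 23):

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Frames after the ~4x subsampling front end (assumed formula,
    # consistent with "before subsampling: 100 / after subsampling: 23").
    t = ((num_frames - 7) // 2 + 1) // 2
    return t >= num_tokens  # need at least one frame per output token

assert keep_cut(100, 24) is False   # the excluded AudioSet dummy-text cuts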
], tot_loss[loss=0.1023, simple_loss=0.1173, pruned_loss=0.03197, audio_tagging_loss=0.01167, over 3049119.26 frames. ], batch size: 59, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:24:04,473 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 9.395e+01 1.022e+02 1.169e+02 1.423e+02, threshold=2.044e+02, percent-clipped=0.0 2023-11-18 20:24:18,634 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2023-11-18 20:24:26,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=395093.3333333333, ans=0.125 2023-11-18 20:24:45,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.37 vs. limit=15.0 2023-11-18 20:24:51,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=395226.6666666667, ans=0.125 2023-11-18 20:24:52,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.63 vs. limit=15.0 2023-11-18 20:24:59,126 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11200, loss[loss=0.1153, simple_loss=0.1383, pruned_loss=0.03475, audio_tagging_loss=0.01143, over 15261.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1169, pruned_loss=0.03158, audio_tagging_loss=0.01169, over 3050208.70 frames. ], batch size: 55, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:25:24,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=395426.6666666667, ans=0.2 2023-11-18 20:25:27,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=395426.6666666667, ans=0.125 2023-11-18 20:25:29,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=395426.6666666667, ans=0.125 2023-11-18 20:25:37,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=395493.3333333333, ans=0.125 2023-11-18 20:25:51,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=395560.0, ans=0.1 2023-11-18 20:25:55,415 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11250, loss[loss=0.1094, simple_loss=0.1261, pruned_loss=0.03665, audio_tagging_loss=0.009713, over 15739.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1165, pruned_loss=0.03148, audio_tagging_loss=0.0116, over 3049971.31 frames. 
], batch size: 58, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:25:56,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.620e+01 9.426e+01 1.024e+02 1.146e+02 1.822e+02, threshold=2.048e+02, percent-clipped=0.0 2023-11-18 20:26:06,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=395693.3333333333, ans=0.1 2023-11-18 20:26:34,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=395826.6666666667, ans=0.125 2023-11-18 20:26:38,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=395893.3333333333, ans=0.125 2023-11-18 20:26:47,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2023-11-18 20:26:50,727 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11300, loss[loss=0.101, simple_loss=0.1205, pruned_loss=0.03124, audio_tagging_loss=0.009535, over 14440.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1164, pruned_loss=0.03148, audio_tagging_loss=0.01139, over 3050290.23 frames. ], batch size: 54, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:26:54,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=395960.0, ans=0.125 2023-11-18 20:26:54,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=395960.0, ans=0.125 2023-11-18 20:26:58,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=395960.0, ans=0.0 2023-11-18 20:27:29,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=396160.0, ans=0.125 2023-11-18 20:27:40,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=15.0 2023-11-18 20:27:45,784 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11350, loss[loss=0.1243, simple_loss=0.1487, pruned_loss=0.04007, audio_tagging_loss=0.009815, over 16145.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1164, pruned_loss=0.03125, audio_tagging_loss=0.01124, over 3063188.75 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:27:46,823 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 9.361e+01 1.045e+02 1.135e+02 1.699e+02, threshold=2.091e+02, percent-clipped=0.0 2023-11-18 20:28:15,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=396426.6666666667, ans=0.0 2023-11-18 20:28:19,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=396493.3333333333, ans=0.125 2023-11-18 20:28:21,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=396493.3333333333, ans=0.1 2023-11-18 20:28:22,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=396493.3333333333, ans=0.1 2023-11-18 20:28:42,021 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11400, loss[loss=0.09514, simple_loss=0.1124, pruned_loss=0.02994, audio_tagging_loss=0.00898, over 15354.00 frames. 
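The Whitening lines fire when a layer's feature covariance drifts too far from isotropic: a metric of 1 corresponds to a perfectly white covariance, and the constraint only engages when the metric exceeds the printed limit (here 13.74 vs. limit=15.0 stays inside, unlike the early-training values in the hundreds). One plausible formulation of such a metric, shown for illustration rather than as the exact scaling.py computation:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns >= 1, with 1 iff cov = c*I."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    num_channels = x.shape[1]
    # mean squared eigenvalue over squared mean eigenvalue of cov
    return (cov ** 2).sum() * num_channels / cov.diag().sum() ** 2

white = torch.randn(10000, 384)
assert whitening_metric(white).item() < 1.2   # close to 1 for white input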
], tot_loss[loss=0.1007, simple_loss=0.1167, pruned_loss=0.03125, audio_tagging_loss=0.01116, over 3044004.59 frames. ], batch size: 56, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:28:44,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=396626.6666666667, ans=0.0 2023-11-18 20:28:50,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2023-11-18 20:29:00,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=396693.3333333333, ans=0.125 2023-11-18 20:29:01,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=396693.3333333333, ans=0.1 2023-11-18 20:29:15,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=396826.6666666667, ans=0.2 2023-11-18 20:29:29,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2023-11-18 20:29:31,134 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:29:37,107 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11450, loss[loss=0.09832, simple_loss=0.1115, pruned_loss=0.03081, audio_tagging_loss=0.01176, over 15741.00 frames. ], tot_loss[loss=0.1004, simple_loss=0.1163, pruned_loss=0.03117, audio_tagging_loss=0.01106, over 3045361.20 frames. ], batch size: 59, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:29:38,112 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.945e+01 1.000e+02 1.081e+02 1.401e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-18 20:29:38,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=12.0 2023-11-18 20:29:43,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2023-11-18 20:29:59,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=397093.3333333333, ans=0.125 2023-11-18 20:30:13,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397160.0, ans=0.1 2023-11-18 20:30:15,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=397160.0, ans=0.125 2023-11-18 20:30:32,397 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11500, loss[loss=0.07462, simple_loss=0.08591, pruned_loss=0.02066, audio_tagging_loss=0.011, over 16398.00 frames. ], tot_loss[loss=0.0997, simple_loss=0.1153, pruned_loss=0.03086, audio_tagging_loss=0.01119, over 3046836.21 frames. 
], batch size: 62, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:31:02,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=397426.6666666667, ans=0.125 2023-11-18 20:31:13,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=397493.3333333333, ans=0.2 2023-11-18 20:31:19,950 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:31:26,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=397560.0, ans=0.125 2023-11-18 20:31:29,299 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11550, loss[loss=0.1119, simple_loss=0.1317, pruned_loss=0.03574, audio_tagging_loss=0.01033, over 16054.00 frames. ], tot_loss[loss=0.101, simple_loss=0.1169, pruned_loss=0.03155, audio_tagging_loss=0.011, over 3051062.23 frames. ], batch size: 59, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:31:30,301 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 8.927e+01 9.792e+01 1.098e+02 1.308e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-18 20:31:36,884 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.816e-01 2023-11-18 20:31:45,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=397693.3333333333, ans=0.0 2023-11-18 20:31:49,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=12.0 2023-11-18 20:32:00,540 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:32:09,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397826.6666666667, ans=0.1 2023-11-18 20:32:12,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=397893.3333333333, ans=0.2 2023-11-18 20:32:17,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=397893.3333333333, ans=0.1 2023-11-18 20:32:24,908 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11600, loss[loss=0.0982, simple_loss=0.1209, pruned_loss=0.028, audio_tagging_loss=0.009755, over 15148.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1166, pruned_loss=0.03154, audio_tagging_loss=0.01112, over 3046155.15 frames. 
], batch size: 56, lr: 1.29e-02, grad_scale: 64.0 2023-11-18 20:32:28,317 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:32:40,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=398026.6666666667, ans=0.125 2023-11-18 20:32:56,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=398093.3333333333, ans=0.1 2023-11-18 20:33:05,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=398160.0, ans=0.035 2023-11-18 20:33:08,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=15.0 2023-11-18 20:33:10,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=398226.6666666667, ans=0.05 2023-11-18 20:33:12,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=398226.6666666667, ans=0.125 2023-11-18 20:33:15,350 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.50 vs. limit=22.5 2023-11-18 20:33:20,115 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11650, loss[loss=0.1085, simple_loss=0.123, pruned_loss=0.0333, audio_tagging_loss=0.01368, over 14937.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1158, pruned_loss=0.03113, audio_tagging_loss=0.01107, over 3049717.27 frames. ], batch size: 58, lr: 1.29e-02, grad_scale: 64.0 2023-11-18 20:33:20,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=398293.3333333333, ans=0.0 2023-11-18 20:33:21,155 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.987e+01 1.026e+02 1.150e+02 1.533e+02, threshold=2.053e+02, percent-clipped=0.0 2023-11-18 20:33:22,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=398293.3333333333, ans=0.0 2023-11-18 20:33:29,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=398293.3333333333, ans=0.125 2023-11-18 20:33:29,988 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.818e-01 2023-11-18 20:34:01,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=398493.3333333333, ans=0.0 2023-11-18 20:34:16,068 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11700, loss[loss=0.1095, simple_loss=0.1209, pruned_loss=0.03828, audio_tagging_loss=0.01077, over 14506.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.116, pruned_loss=0.03115, audio_tagging_loss=0.01112, over 3048229.36 frames. 
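The grad_scale field is the fp16 loss-scaling factor, and its movement here is the standard dynamic-loss-scaling loop: it doubled from 32.0 to 64.0 around batch 11600 after a long run of clean steps, and it halves back to 32.0 by batch 11700 just below, which indicates a step whose gradients overflowed and was skipped. The same behaviour can be had with PyTorch's GradScaler; the model and optimizer below are placeholders, not this run's (CUDA assumed, as here):

import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()  # growth_factor=2.0, backoff_factor=0.5

for _ in range(10):
    x = torch.randn(8, 80, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    scaler.scale(loss).backward()  # scale the loss so fp16 grads stay finite
    scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
    scaler.update()                # doubles the scale after a clean streak,
                                   # halves it whenever a step overflowed
print(scaler.get_scale())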
], batch size: 55, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:34:22,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=398626.6666666667, ans=0.125 2023-11-18 20:34:30,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=398693.3333333333, ans=0.125 2023-11-18 20:34:37,048 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=12.0 2023-11-18 20:34:39,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=398760.0, ans=0.09899494936611666 2023-11-18 20:34:52,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=398826.6666666667, ans=0.0 2023-11-18 20:34:57,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.99 vs. limit=22.5 2023-11-18 20:35:03,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=398893.3333333333, ans=0.125 2023-11-18 20:35:06,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.28 vs. limit=10.0 2023-11-18 20:35:12,949 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11750, loss[loss=0.07763, simple_loss=0.0848, pruned_loss=0.02431, audio_tagging_loss=0.01091, over 15095.00 frames. ], tot_loss[loss=0.101, simple_loss=0.1167, pruned_loss=0.03161, audio_tagging_loss=0.01106, over 3052687.30 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:35:15,031 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.870e+01 9.922e+01 1.106e+02 1.477e+02, threshold=1.984e+02, percent-clipped=0.0 2023-11-18 20:35:21,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=398960.0, ans=0.05 2023-11-18 20:35:25,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=399026.6666666667, ans=0.2 2023-11-18 20:35:27,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=399026.6666666667, ans=0.0 2023-11-18 20:35:28,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=399026.6666666667, ans=0.2 2023-11-18 20:35:35,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=399093.3333333333, ans=0.1 2023-11-18 20:35:46,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.34 vs. 
limit=15.0 2023-11-18 20:35:51,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=399160.0, ans=0.5 2023-11-18 20:35:58,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=399226.6666666667, ans=0.125 2023-11-18 20:36:05,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=399226.6666666667, ans=0.125 2023-11-18 20:36:08,090 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11800, loss[loss=0.1111, simple_loss=0.1325, pruned_loss=0.03357, audio_tagging_loss=0.01127, over 14076.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.116, pruned_loss=0.03134, audio_tagging_loss=0.01117, over 3045586.00 frames. ], batch size: 52, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:36:10,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=12.0 2023-11-18 20:36:18,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=399360.0, ans=0.0 2023-11-18 20:36:26,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=399360.0, ans=0.0 2023-11-18 20:36:31,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.79 vs. limit=10.0 2023-11-18 20:36:38,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.33 vs. limit=15.0 2023-11-18 20:36:44,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=399493.3333333333, ans=0.2 2023-11-18 20:37:04,186 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11850, loss[loss=0.09804, simple_loss=0.1161, pruned_loss=0.02778, audio_tagging_loss=0.01222, over 14983.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.117, pruned_loss=0.03154, audio_tagging_loss=0.01118, over 3046481.91 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:37:05,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=399626.6666666667, ans=0.125 2023-11-18 20:37:06,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.830e+01 9.778e+01 1.086e+02 1.428e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-18 20:37:13,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=399693.3333333333, ans=0.125 2023-11-18 20:37:21,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=399693.3333333333, ans=0.0 2023-11-18 20:37:28,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=399760.0, ans=0.125 2023-11-18 20:37:44,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=399826.6666666667, ans=0.1 2023-11-18 20:37:49,380 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.34 vs. 
limit=15.0 2023-11-18 20:37:58,852 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11900, loss[loss=0.1253, simple_loss=0.1537, pruned_loss=0.03991, audio_tagging_loss=0.008516, over 14656.00 frames. ], tot_loss[loss=0.102, simple_loss=0.118, pruned_loss=0.03176, audio_tagging_loss=0.01124, over 3049645.93 frames. ], batch size: 55, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:38:31,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=400093.3333333333, ans=0.0 2023-11-18 20:38:31,993 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:38:37,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=400160.0, ans=0.2 2023-11-18 20:38:56,542 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 11950, loss[loss=0.113, simple_loss=0.1323, pruned_loss=0.03761, audio_tagging_loss=0.009245, over 15345.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.1168, pruned_loss=0.03169, audio_tagging_loss=0.01152, over 3047559.46 frames. ], batch size: 56, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:38:58,616 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.829e+01 9.865e+01 1.129e+02 1.573e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 20:39:08,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.47 vs. limit=10.0 2023-11-18 20:39:14,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=400360.0, ans=0.0 2023-11-18 20:39:17,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=400426.6666666667, ans=0.125 2023-11-18 20:39:18,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=400426.6666666667, ans=0.07 2023-11-18 20:39:50,214 INFO [train_asr.py:1115] (1/4) Epoch 5, batch 12000, loss[loss=0.1279, simple_loss=0.151, pruned_loss=0.04272, audio_tagging_loss=0.009695, over 16124.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1172, pruned_loss=0.03184, audio_tagging_loss=0.01155, over 3044854.64 frames. ], batch size: 56, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:39:50,215 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 20:40:18,374 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([0.6315, 3.3819, 2.5909, 2.7819, 3.6691, 3.4267, 2.9866, 3.4809], device='cuda:1') 2023-11-18 20:40:23,255 INFO [train_asr.py:1147] (1/4) Epoch 5, validation: loss=0.07195, simple_loss=0.05986, pruned_loss=0.008725, audio_tagging_loss=0.0333, over 4681554.00 frames. 2023-11-18 20:40:23,256 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 20:41:23,806 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 0, loss[loss=0.139, simple_loss=0.1579, pruned_loss=0.03751, audio_tagging_loss=0.02251, over 14704.00 frames. ], tot_loss[loss=0.139, simple_loss=0.1579, pruned_loss=0.03751, audio_tagging_loss=0.02251, over 14704.00 frames. 
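The drop from lr 1.29e-02 late in epoch 5 to 1.20e-02 at the epoch-6 lines below is consistent with icefall's Eden schedule, which decays in both the optimizer-step count and the epoch count (ignoring its short warm-up factor, long saturated here). With this run's base_lr 0.045, lr_batches 7500 and lr_epochs 3.5, and a rough 52.5k steps accumulated over five epochs of ~10.5k batches each:

def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # Eden schedule as used by icefall's zipformer recipes.
    return (base_lr
            * ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
            * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

# ~5 epochs x ~10.5k batches completed when epoch 6 starts:
print(eden_lr(0.045, step=52500, epoch=6))  # ~1.20e-02, matching the log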
], batch size: 53, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:41:23,807 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 20:41:41,404 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8694, 4.9842, 5.0324, 5.0426], device='cuda:1') 2023-11-18 20:41:55,532 INFO [train_asr.py:1147] (1/4) Epoch 6, validation: loss=0.07069, simple_loss=0.05989, pruned_loss=0.008764, audio_tagging_loss=0.03198, over 4681554.00 frames. 2023-11-18 20:41:55,533 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 20:41:59,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=400780.0, ans=0.0 2023-11-18 20:42:01,410 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2023-11-18 20:42:09,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400846.6666666667, ans=0.1 2023-11-18 20:42:10,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=400846.6666666667, ans=0.125 2023-11-18 20:42:10,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2023-11-18 20:42:12,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.81 vs. limit=15.0 2023-11-18 20:42:16,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=12.0 2023-11-18 20:42:25,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=400913.3333333333, ans=0.125 2023-11-18 20:42:26,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=400913.3333333333, ans=0.04949747468305833 2023-11-18 20:42:27,001 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 9.356e+01 1.020e+02 1.152e+02 1.600e+02, threshold=2.040e+02, percent-clipped=0.0 2023-11-18 20:42:46,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=401046.6666666667, ans=0.125 2023-11-18 20:42:50,313 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 50, loss[loss=0.1241, simple_loss=0.1417, pruned_loss=0.03663, audio_tagging_loss=0.01658, over 15775.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1145, pruned_loss=0.03041, audio_tagging_loss=0.02168, over 681213.44 frames. 
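The attn_weights_entropy tensors printed during validation (one value per attention head, e.g. the 4-head tensor just above) diagnose how peaked each head is: values near log(sequence length) mean nearly uniform attention, values near 0 mean one-hot. A plausible way to compute the diagnostic, shown for illustration:

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, tgt_len, src_len) with rows summing to 1.
    Returns the mean Shannon entropy (nats) per head."""
    ent = -(attn.clamp_min(1e-20).log() * attn).sum(dim=-1)
    return ent.mean(dim=-1)

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # a bit below log(50) ~ 3.9: diffuse heads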
], batch size: 58, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:42:59,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=401113.3333333333, ans=0.0 2023-11-18 20:43:04,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=401180.0, ans=0.0 2023-11-18 20:43:07,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=401180.0, ans=0.125 2023-11-18 20:43:09,000 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0 2023-11-18 20:43:12,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.02 vs. limit=6.0 2023-11-18 20:43:20,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=401246.6666666667, ans=0.125 2023-11-18 20:43:47,271 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 100, loss[loss=0.09783, simple_loss=0.1163, pruned_loss=0.02273, audio_tagging_loss=0.01696, over 15607.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1149, pruned_loss=0.03001, audio_tagging_loss=0.02103, over 1198605.34 frames. ], batch size: 55, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:43:51,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=401446.6666666667, ans=0.2 2023-11-18 20:44:13,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=401580.0, ans=0.0 2023-11-18 20:44:15,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=401580.0, ans=0.1 2023-11-18 20:44:19,279 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 9.166e+01 9.950e+01 1.092e+02 1.419e+02, threshold=1.990e+02, percent-clipped=0.0 2023-11-18 20:44:28,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2023-11-18 20:44:31,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.40 vs. limit=22.5 2023-11-18 20:44:33,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=401713.3333333333, ans=0.125 2023-11-18 20:44:34,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=401713.3333333333, ans=0.125 2023-11-18 20:44:43,065 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 150, loss[loss=0.1101, simple_loss=0.1302, pruned_loss=0.03419, audio_tagging_loss=0.01083, over 15859.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1166, pruned_loss=0.03057, audio_tagging_loss=0.0188, over 1606791.17 frames. ], batch size: 57, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:44:52,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.94 vs. 
limit=12.0 2023-11-18 20:45:07,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=401913.3333333333, ans=0.0 2023-11-18 20:45:23,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=401980.0, ans=0.125 2023-11-18 20:45:27,279 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.19 vs. limit=15.0 2023-11-18 20:45:39,163 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 200, loss[loss=0.09162, simple_loss=0.11, pruned_loss=0.02308, audio_tagging_loss=0.01352, over 15369.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1186, pruned_loss=0.03143, audio_tagging_loss=0.01636, over 1925925.14 frames. ], batch size: 58, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:45:42,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=402113.3333333333, ans=0.125 2023-11-18 20:45:50,376 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.42 vs. limit=12.0 2023-11-18 20:45:53,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=402180.0, ans=0.0 2023-11-18 20:45:59,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=402180.0, ans=0.05 2023-11-18 20:46:11,545 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 9.007e+01 1.004e+02 1.088e+02 1.464e+02, threshold=2.009e+02, percent-clipped=0.0 2023-11-18 20:46:13,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=402313.3333333333, ans=0.0 2023-11-18 20:46:32,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=402380.0, ans=0.125 2023-11-18 20:46:35,567 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 250, loss[loss=0.1212, simple_loss=0.1433, pruned_loss=0.03729, audio_tagging_loss=0.01229, over 16373.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.1197, pruned_loss=0.03178, audio_tagging_loss=0.0147, over 2183242.75 frames. ], batch size: 60, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:46:51,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=402513.3333333333, ans=0.0 2023-11-18 20:46:52,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=402513.3333333333, ans=0.125 2023-11-18 20:46:56,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=402580.0, ans=0.125 2023-11-18 20:46:57,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=402580.0, ans=0.5 2023-11-18 20:47:03,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=402580.0, ans=0.0 2023-11-18 20:47:31,865 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 300, loss[loss=0.1149, simple_loss=0.1296, pruned_loss=0.03833, audio_tagging_loss=0.01178, over 16545.00 frames. 
], tot_loss[loss=0.1059, simple_loss=0.1204, pruned_loss=0.03214, audio_tagging_loss=0.01354, over 2378683.15 frames. ], batch size: 59, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:47:33,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.65 vs. limit=15.0 2023-11-18 20:47:34,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=402780.0, ans=0.1 2023-11-18 20:47:54,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=402913.3333333333, ans=0.125 2023-11-18 20:47:59,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=402913.3333333333, ans=0.125 2023-11-18 20:48:03,988 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.203e+01 9.372e+01 1.051e+02 1.173e+02 1.706e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 20:48:06,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=402980.0, ans=0.125 2023-11-18 20:48:10,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=402980.0, ans=0.125 2023-11-18 20:48:13,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-18 20:48:20,243 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=15.0 2023-11-18 20:48:24,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=403046.6666666667, ans=0.125 2023-11-18 20:48:27,619 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 350, loss[loss=0.09966, simple_loss=0.1152, pruned_loss=0.03256, audio_tagging_loss=0.009477, over 14992.00 frames. ], tot_loss[loss=0.1042, simple_loss=0.119, pruned_loss=0.03173, audio_tagging_loss=0.01296, over 2530203.31 frames. ], batch size: 57, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:48:28,261 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2023-11-18 20:48:31,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=403113.3333333333, ans=0.125 2023-11-18 20:48:45,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=403180.0, ans=0.2 2023-11-18 20:48:55,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=403246.6666666667, ans=0.0 2023-11-18 20:49:20,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=403380.0, ans=0.125 2023-11-18 20:49:23,918 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 400, loss[loss=0.109, simple_loss=0.1204, pruned_loss=0.0375, audio_tagging_loss=0.01132, over 15244.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1173, pruned_loss=0.03105, audio_tagging_loss=0.01249, over 2648836.30 frames. 
], batch size: 57, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:49:34,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=403513.3333333333, ans=0.125 2023-11-18 20:49:36,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2023-11-18 20:49:45,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-11-18 20:49:48,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=12.0 2023-11-18 20:49:55,726 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.946e+01 9.366e+01 1.079e+02 1.287e+02 1.849e+02, threshold=2.157e+02, percent-clipped=0.0 2023-11-18 20:49:59,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=403646.6666666667, ans=0.1 2023-11-18 20:50:14,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=403713.3333333333, ans=0.125 2023-11-18 20:50:19,393 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 450, loss[loss=0.08153, simple_loss=0.09184, pruned_loss=0.0213, audio_tagging_loss=0.01432, over 16348.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1168, pruned_loss=0.03076, audio_tagging_loss=0.0122, over 2734615.80 frames. ], batch size: 62, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:50:22,136 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2023-11-18 20:50:36,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=403846.6666666667, ans=0.2 2023-11-18 20:50:42,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=403913.3333333333, ans=0.95 2023-11-18 20:51:05,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=404046.6666666667, ans=0.2 2023-11-18 20:51:12,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2023-11-18 20:51:15,174 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 500, loss[loss=0.08053, simple_loss=0.09473, pruned_loss=0.0218, audio_tagging_loss=0.01137, over 14887.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1158, pruned_loss=0.03068, audio_tagging_loss=0.012, over 2799264.43 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:51:20,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=404113.3333333333, ans=0.025 2023-11-18 20:51:20,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.38 vs. 
limit=15.0 2023-11-18 20:51:42,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=404246.6666666667, ans=0.0 2023-11-18 20:51:47,904 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.724e+01 9.545e+01 1.075e+02 1.901e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-18 20:51:48,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=404313.3333333333, ans=0.125 2023-11-18 20:51:50,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=404313.3333333333, ans=0.0 2023-11-18 20:52:11,358 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 550, loss[loss=0.123, simple_loss=0.1491, pruned_loss=0.03849, audio_tagging_loss=0.009955, over 15737.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.116, pruned_loss=0.03092, audio_tagging_loss=0.01181, over 2859017.70 frames. ], batch size: 57, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:52:30,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=404513.3333333333, ans=0.125 2023-11-18 20:52:43,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=404646.6666666667, ans=0.0 2023-11-18 20:52:51,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=404646.6666666667, ans=0.0 2023-11-18 20:52:54,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=404646.6666666667, ans=0.125 2023-11-18 20:53:00,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=404713.3333333333, ans=15.0 2023-11-18 20:53:02,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=404713.3333333333, ans=0.125 2023-11-18 20:53:07,246 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 600, loss[loss=0.07913, simple_loss=0.08941, pruned_loss=0.01968, audio_tagging_loss=0.01475, over 15594.00 frames. ], tot_loss[loss=0.09986, simple_loss=0.1153, pruned_loss=0.03056, audio_tagging_loss=0.01166, over 2893910.37 frames. ], batch size: 59, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:53:09,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.64 vs. 
limit=22.5 2023-11-18 20:53:15,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=404780.0, ans=0.0 2023-11-18 20:53:30,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=404913.3333333333, ans=0.125 2023-11-18 20:53:30,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=404913.3333333333, ans=0.5 2023-11-18 20:53:39,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404913.3333333333, ans=0.1 2023-11-18 20:53:40,229 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.597e+01 9.522e+01 1.046e+02 1.696e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-18 20:53:43,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.81 vs. limit=22.5 2023-11-18 20:53:48,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=404980.0, ans=0.0 2023-11-18 20:53:51,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=405046.6666666667, ans=0.125 2023-11-18 20:54:02,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.76 vs. limit=22.5 2023-11-18 20:54:03,254 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 650, loss[loss=0.1382, simple_loss=0.1695, pruned_loss=0.04555, audio_tagging_loss=0.007941, over 15756.00 frames. ], tot_loss[loss=0.09965, simple_loss=0.1151, pruned_loss=0.03053, audio_tagging_loss=0.01158, over 2924150.02 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:54:03,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.30 vs. limit=10.0 2023-11-18 20:54:32,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=405246.6666666667, ans=0.2 2023-11-18 20:54:40,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=405313.3333333333, ans=0.125 2023-11-18 20:54:59,361 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 700, loss[loss=0.1003, simple_loss=0.1199, pruned_loss=0.03088, audio_tagging_loss=0.00951, over 16924.00 frames. ], tot_loss[loss=0.09896, simple_loss=0.1146, pruned_loss=0.03026, audio_tagging_loss=0.01139, over 2959322.57 frames. 
], batch size: 60, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:55:06,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405446.6666666667, ans=0.1 2023-11-18 20:55:31,766 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.367e+01 9.330e+01 1.028e+02 1.121e+02 2.477e+02, threshold=2.056e+02, percent-clipped=1.0 2023-11-18 20:55:36,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=405646.6666666667, ans=0.0 2023-11-18 20:55:45,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=405713.3333333333, ans=0.125 2023-11-18 20:55:55,650 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 750, loss[loss=0.1107, simple_loss=0.1254, pruned_loss=0.03515, audio_tagging_loss=0.01291, over 15059.00 frames. ], tot_loss[loss=0.09897, simple_loss=0.1141, pruned_loss=0.03031, audio_tagging_loss=0.0116, over 2980337.46 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:56:07,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405846.6666666667, ans=0.1 2023-11-18 20:56:18,997 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.85 vs. limit=10.0 2023-11-18 20:56:22,148 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=12.0 2023-11-18 20:56:28,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=405980.0, ans=0.125 2023-11-18 20:56:37,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2023-11-18 20:56:40,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=406046.6666666667, ans=0.0 2023-11-18 20:56:51,381 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 800, loss[loss=0.1287, simple_loss=0.1539, pruned_loss=0.03934, audio_tagging_loss=0.01235, over 15135.00 frames. ], tot_loss[loss=0.09888, simple_loss=0.114, pruned_loss=0.03021, audio_tagging_loss=0.01169, over 2994695.10 frames. ], batch size: 53, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:57:24,276 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 9.553e+01 1.008e+02 1.085e+02 1.896e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-18 20:57:30,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=406313.3333333333, ans=0.2 2023-11-18 20:57:46,586 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 850, loss[loss=0.08739, simple_loss=0.1057, pruned_loss=0.02315, audio_tagging_loss=0.01137, over 14994.00 frames. ], tot_loss[loss=0.09914, simple_loss=0.1145, pruned_loss=0.03026, audio_tagging_loss=0.01166, over 3004269.80 frames. ], batch size: 57, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:58:03,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.92 vs. 
limit=15.0 2023-11-18 20:58:10,078 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=406580.0, ans=0.125 2023-11-18 20:58:13,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=406580.0, ans=0.125 2023-11-18 20:58:16,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=406580.0, ans=0.0 2023-11-18 20:58:20,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0 2023-11-18 20:58:25,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=406646.6666666667, ans=0.1 2023-11-18 20:58:37,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=406713.3333333333, ans=0.0 2023-11-18 20:58:43,503 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 900, loss[loss=0.09965, simple_loss=0.1183, pruned_loss=0.03061, audio_tagging_loss=0.009896, over 13980.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1166, pruned_loss=0.03079, audio_tagging_loss=0.01158, over 3008581.54 frames. ], batch size: 52, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:58:51,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=406780.0, ans=0.125 2023-11-18 20:58:54,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=406846.6666666667, ans=0.125 2023-11-18 20:58:57,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=406846.6666666667, ans=0.2 2023-11-18 20:59:15,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.061e+01 8.915e+01 9.624e+01 1.067e+02 1.384e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-18 20:59:20,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=406980.0, ans=0.0 2023-11-18 20:59:39,123 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 950, loss[loss=0.1098, simple_loss=0.127, pruned_loss=0.03772, audio_tagging_loss=0.008558, over 15063.00 frames. ], tot_loss[loss=0.1, simple_loss=0.1159, pruned_loss=0.03064, audio_tagging_loss=0.01141, over 3015917.76 frames. 
], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:59:44,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=407113.3333333333, ans=0.2 2023-11-18 20:59:47,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=407113.3333333333, ans=0.0 2023-11-18 21:00:02,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=407246.6666666667, ans=0.125 2023-11-18 21:00:02,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=407246.6666666667, ans=0.0 2023-11-18 21:00:05,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=407246.6666666667, ans=0.1 2023-11-18 21:00:16,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=407313.3333333333, ans=0.0 2023-11-18 21:00:25,378 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2023-11-18 21:00:34,302 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1000, loss[loss=0.08073, simple_loss=0.08867, pruned_loss=0.02244, audio_tagging_loss=0.01395, over 15401.00 frames. ], tot_loss[loss=0.09946, simple_loss=0.1151, pruned_loss=0.03053, audio_tagging_loss=0.01137, over 3020456.58 frames. ], batch size: 60, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:00:37,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=407446.6666666667, ans=0.125 2023-11-18 21:00:44,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=407446.6666666667, ans=0.125 2023-11-18 21:00:45,475 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.19 vs. limit=15.0 2023-11-18 21:00:49,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=407513.3333333333, ans=0.025 2023-11-18 21:00:50,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=407513.3333333333, ans=0.125 2023-11-18 21:00:58,794 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 21:01:07,167 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 8.744e+01 1.004e+02 1.144e+02 1.885e+02, threshold=2.008e+02, percent-clipped=0.0 2023-11-18 21:01:11,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=407646.6666666667, ans=0.1 2023-11-18 21:01:15,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=407646.6666666667, ans=0.125 2023-11-18 21:01:27,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2023-11-18 21:01:27,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=407713.3333333333, ans=0.1 2023-11-18 21:01:30,884 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1050, loss[loss=0.1085, simple_loss=0.1339, pruned_loss=0.03323, audio_tagging_loss=0.008302, over 14742.00 frames. ], tot_loss[loss=0.09816, simple_loss=0.1138, pruned_loss=0.02994, audio_tagging_loss=0.0113, over 3021973.09 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:01:40,186 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=15.0 2023-11-18 21:01:44,319 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.76 vs. limit=15.0 2023-11-18 21:01:44,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=407846.6666666667, ans=0.125 2023-11-18 21:01:58,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=407913.3333333333, ans=0.0 2023-11-18 21:02:04,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=407980.0, ans=0.0 2023-11-18 21:02:15,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=408046.6666666667, ans=0.125 2023-11-18 21:02:17,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=408046.6666666667, ans=0.2 2023-11-18 21:02:27,543 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1100, loss[loss=0.1064, simple_loss=0.1278, pruned_loss=0.03471, audio_tagging_loss=0.007802, over 15316.00 frames. ], tot_loss[loss=0.09781, simple_loss=0.1133, pruned_loss=0.02994, audio_tagging_loss=0.01123, over 3025106.20 frames. ], batch size: 57, lr: 1.19e-02, grad_scale: 16.0 2023-11-18 21:02:29,725 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 21:02:41,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=408180.0, ans=0.0 2023-11-18 21:02:42,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.44 vs. limit=22.5 2023-11-18 21:02:44,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=408180.0, ans=0.05 2023-11-18 21:02:54,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=408246.6666666667, ans=0.0 2023-11-18 21:03:00,441 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.573e+01 9.716e+01 1.058e+02 1.424e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-18 21:03:08,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=408313.3333333333, ans=0.125 2023-11-18 21:03:08,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=408313.3333333333, ans=0.0 2023-11-18 21:03:22,697 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1150, loss[loss=0.1293, simple_loss=0.1537, pruned_loss=0.04116, audio_tagging_loss=0.01134, over 14694.00 frames. ], tot_loss[loss=0.0989, simple_loss=0.1151, pruned_loss=0.03023, audio_tagging_loss=0.01113, over 3028797.23 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 16.0 2023-11-18 21:03:24,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.49 vs. limit=12.0 2023-11-18 21:03:25,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=408446.6666666667, ans=0.05 2023-11-18 21:03:55,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=408646.6666666667, ans=0.1 2023-11-18 21:03:58,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=408646.6666666667, ans=0.0 2023-11-18 21:04:00,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=408646.6666666667, ans=0.2 2023-11-18 21:04:18,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=408780.0, ans=22.5 2023-11-18 21:04:19,254 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1200, loss[loss=0.118, simple_loss=0.1403, pruned_loss=0.03791, audio_tagging_loss=0.009921, over 14312.00 frames. ], tot_loss[loss=0.09859, simple_loss=0.1146, pruned_loss=0.03022, audio_tagging_loss=0.01105, over 3027339.02 frames. 
], batch size: 54, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:04:26,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=408780.0, ans=0.035 2023-11-18 21:04:29,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=408846.6666666667, ans=0.125 2023-11-18 21:04:52,322 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 9.018e+01 9.709e+01 1.057e+02 1.336e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-18 21:05:15,226 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1250, loss[loss=0.0879, simple_loss=0.1053, pruned_loss=0.02696, audio_tagging_loss=0.008266, over 15922.00 frames. ], tot_loss[loss=0.09805, simple_loss=0.1139, pruned_loss=0.03005, audio_tagging_loss=0.01105, over 3026843.42 frames. ], batch size: 58, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:05:20,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=409113.3333333333, ans=0.0 2023-11-18 21:05:25,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=409180.0, ans=0.125 2023-11-18 21:05:28,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=409180.0, ans=0.2 2023-11-18 21:05:58,578 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5 2023-11-18 21:05:58,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=12.0 2023-11-18 21:06:07,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=409380.0, ans=0.1 2023-11-18 21:06:11,361 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1300, loss[loss=0.07016, simple_loss=0.07441, pruned_loss=0.01927, audio_tagging_loss=0.01367, over 14708.00 frames. ], tot_loss[loss=0.09767, simple_loss=0.1132, pruned_loss=0.02998, audio_tagging_loss=0.01109, over 3019359.09 frames. ], batch size: 58, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:06:24,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=409513.3333333333, ans=0.125 2023-11-18 21:06:28,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.52 vs. 
limit=15.0 2023-11-18 21:06:29,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=409513.3333333333, ans=0.125 2023-11-18 21:06:31,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=409513.3333333333, ans=0.1 2023-11-18 21:06:44,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=409646.6666666667, ans=0.2 2023-11-18 21:06:45,373 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.886e+01 9.349e+01 1.016e+02 1.502e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-18 21:06:52,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=409646.6666666667, ans=0.05 2023-11-18 21:06:55,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.66 vs. limit=22.5 2023-11-18 21:07:00,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=12.0 2023-11-18 21:07:05,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=409713.3333333333, ans=0.04949747468305833 2023-11-18 21:07:07,805 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1350, loss[loss=0.1045, simple_loss=0.1249, pruned_loss=0.02979, audio_tagging_loss=0.01228, over 15982.00 frames. ], tot_loss[loss=0.09793, simple_loss=0.1135, pruned_loss=0.03001, audio_tagging_loss=0.01116, over 3030829.79 frames. ], batch size: 57, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:07:10,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=409780.0, ans=0.125 2023-11-18 21:07:11,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=409780.0, ans=0.04949747468305833 2023-11-18 21:07:11,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=409780.0, ans=0.0 2023-11-18 21:07:17,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=409846.6666666667, ans=0.0 2023-11-18 21:07:19,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.58 vs. limit=22.5 2023-11-18 21:07:20,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2023-11-18 21:07:31,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=409913.3333333333, ans=0.2 2023-11-18 21:07:42,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=409980.0, ans=0.125 2023-11-18 21:07:47,877 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:08:03,680 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1400, loss[loss=0.1051, simple_loss=0.1273, pruned_loss=0.03134, audio_tagging_loss=0.01013, over 15310.00 frames. ], tot_loss[loss=0.09748, simple_loss=0.1125, pruned_loss=0.0298, audio_tagging_loss=0.01142, over 3035335.17 frames. ], batch size: 57, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:08:05,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=410113.3333333333, ans=0.07 2023-11-18 21:08:13,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=410180.0, ans=0.125 2023-11-18 21:08:26,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=410246.6666666667, ans=0.1 2023-11-18 21:08:37,130 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.960e+01 8.879e+01 9.810e+01 1.048e+02 1.417e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-18 21:08:38,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=410313.3333333333, ans=0.125 2023-11-18 21:08:51,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=410380.0, ans=0.2 2023-11-18 21:08:59,549 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1450, loss[loss=0.08984, simple_loss=0.1039, pruned_loss=0.02511, audio_tagging_loss=0.01279, over 16384.00 frames. ], tot_loss[loss=0.09825, simple_loss=0.1136, pruned_loss=0.03005, audio_tagging_loss=0.01142, over 3035127.97 frames. ], batch size: 63, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:09:13,877 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:09:16,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=410513.3333333333, ans=0.07 2023-11-18 21:09:20,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=410513.3333333333, ans=0.0 2023-11-18 21:09:34,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=410646.6666666667, ans=0.125 2023-11-18 21:09:38,542 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.23 vs. limit=15.0 2023-11-18 21:09:56,065 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1500, loss[loss=0.09483, simple_loss=0.1125, pruned_loss=0.0277, audio_tagging_loss=0.0109, over 14798.00 frames. ], tot_loss[loss=0.09856, simple_loss=0.1139, pruned_loss=0.0302, audio_tagging_loss=0.01141, over 3033373.65 frames. ], batch size: 55, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:10:06,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.81 vs. 
limit=15.0 2023-11-18 21:10:06,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=410846.6666666667, ans=0.1 2023-11-18 21:10:12,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=410846.6666666667, ans=0.125 2023-11-18 21:10:13,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=410846.6666666667, ans=0.1 2023-11-18 21:10:29,752 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.050e+01 8.852e+01 9.763e+01 1.053e+02 1.656e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-18 21:10:50,109 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:10:51,952 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1550, loss[loss=0.09116, simple_loss=0.1102, pruned_loss=0.02422, audio_tagging_loss=0.01184, over 16634.00 frames. ], tot_loss[loss=0.09825, simple_loss=0.1139, pruned_loss=0.02985, audio_tagging_loss=0.01145, over 3043509.67 frames. ], batch size: 62, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:10:58,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0 2023-11-18 21:11:02,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=411180.0, ans=0.125 2023-11-18 21:11:06,897 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2023-11-18 21:11:26,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=411313.3333333333, ans=0.0 2023-11-18 21:11:26,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=12.0 2023-11-18 21:11:28,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=411313.3333333333, ans=0.125 2023-11-18 21:11:30,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=12.0 2023-11-18 21:11:30,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.84 vs. limit=15.0 2023-11-18 21:11:31,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=411313.3333333333, ans=0.125 2023-11-18 21:11:39,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=411380.0, ans=0.125 2023-11-18 21:11:45,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-11-18 21:11:47,817 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1600, loss[loss=0.09303, simple_loss=0.1113, pruned_loss=0.02578, audio_tagging_loss=0.0116, over 15191.00 frames. ], tot_loss[loss=0.09887, simple_loss=0.1145, pruned_loss=0.03012, audio_tagging_loss=0.0115, over 3047222.98 frames. 
], batch size: 57, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:11:56,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.15 vs. limit=15.0 2023-11-18 21:12:10,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.83 vs. limit=15.0 2023-11-18 21:12:11,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=411580.0, ans=0.125 2023-11-18 21:12:12,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=411580.0, ans=0.125 2023-11-18 21:12:21,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.910e+01 9.772e+01 1.109e+02 1.512e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-18 21:12:34,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0 2023-11-18 21:12:44,078 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1650, loss[loss=0.09978, simple_loss=0.1148, pruned_loss=0.03369, audio_tagging_loss=0.008689, over 16113.00 frames. ], tot_loss[loss=0.09923, simple_loss=0.1152, pruned_loss=0.03025, audio_tagging_loss=0.0114, over 3055196.49 frames. ], batch size: 62, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:12:48,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411780.0, ans=0.1 2023-11-18 21:12:58,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=411846.6666666667, ans=0.0 2023-11-18 21:13:05,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=411913.3333333333, ans=10.0 2023-11-18 21:13:08,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=411913.3333333333, ans=0.125 2023-11-18 21:13:39,906 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1700, loss[loss=0.1193, simple_loss=0.1566, pruned_loss=0.03447, audio_tagging_loss=0.006582, over 15909.00 frames. ], tot_loss[loss=0.09955, simple_loss=0.1155, pruned_loss=0.0304, audio_tagging_loss=0.01138, over 3058628.81 frames. ], batch size: 56, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:13:53,939 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.59 vs. 
limit=15.0 2023-11-18 21:14:03,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=412246.6666666667, ans=0.125 2023-11-18 21:14:05,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=412246.6666666667, ans=0.125 2023-11-18 21:14:06,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=412246.6666666667, ans=0.0 2023-11-18 21:14:13,791 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 9.349e+01 1.070e+02 1.315e+02 2.031e+02, threshold=2.140e+02, percent-clipped=2.0 2023-11-18 21:14:15,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=412313.3333333333, ans=0.125 2023-11-18 21:14:17,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2023-11-18 21:14:35,717 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1750, loss[loss=0.1357, simple_loss=0.1605, pruned_loss=0.0471, audio_tagging_loss=0.008354, over 15423.00 frames. ], tot_loss[loss=0.09924, simple_loss=0.1151, pruned_loss=0.03039, audio_tagging_loss=0.0113, over 3057285.12 frames. ], batch size: 54, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:14:38,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.16 vs. limit=15.0 2023-11-18 21:14:48,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=412513.3333333333, ans=0.125 2023-11-18 21:14:58,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.13 vs. limit=6.0 2023-11-18 21:15:16,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=412646.6666666667, ans=0.125 2023-11-18 21:15:22,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=412713.3333333333, ans=0.0 2023-11-18 21:15:31,635 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1800, loss[loss=0.08608, simple_loss=0.09425, pruned_loss=0.02713, audio_tagging_loss=0.01182, over 14852.00 frames. ], tot_loss[loss=0.09913, simple_loss=0.1149, pruned_loss=0.03041, audio_tagging_loss=0.01126, over 3059786.11 frames. ], batch size: 57, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:15:36,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0 2023-11-18 21:15:40,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=12.0 2023-11-18 21:15:44,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.48 vs. 
limit=22.5 2023-11-18 21:15:50,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=412846.6666666667, ans=0.0 2023-11-18 21:16:06,198 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.676e+01 9.052e+01 1.017e+02 1.096e+02 2.007e+02, threshold=2.033e+02, percent-clipped=0.0 2023-11-18 21:16:27,504 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1850, loss[loss=0.09949, simple_loss=0.1198, pruned_loss=0.02909, audio_tagging_loss=0.01048, over 14639.00 frames. ], tot_loss[loss=0.09884, simple_loss=0.1146, pruned_loss=0.03031, audio_tagging_loss=0.01123, over 3055017.88 frames. ], batch size: 53, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:16:58,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0 2023-11-18 21:16:59,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.54 vs. limit=22.5 2023-11-18 21:17:00,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=413313.3333333333, ans=0.025 2023-11-18 21:17:01,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=413313.3333333333, ans=0.125 2023-11-18 21:17:09,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2023-11-18 21:17:14,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=413380.0, ans=0.125 2023-11-18 21:17:14,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=413380.0, ans=0.2 2023-11-18 21:17:17,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=413380.0, ans=0.09899494936611666 2023-11-18 21:17:23,591 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1900, loss[loss=0.1085, simple_loss=0.1344, pruned_loss=0.03364, audio_tagging_loss=0.00766, over 16002.00 frames. ], tot_loss[loss=0.09934, simple_loss=0.1155, pruned_loss=0.03045, audio_tagging_loss=0.01115, over 3056453.26 frames. ], batch size: 60, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:17:58,662 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.936e+01 9.160e+01 9.941e+01 1.091e+02 1.656e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-18 21:17:58,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=413646.6666666667, ans=0.0 2023-11-18 21:18:16,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=413713.3333333333, ans=0.2 2023-11-18 21:18:16,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=413713.3333333333, ans=0.125 2023-11-18 21:18:18,273 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:18:19,655 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 1950, loss[loss=0.09963, simple_loss=0.1199, pruned_loss=0.02932, audio_tagging_loss=0.01037, over 15036.00 frames. 
], tot_loss[loss=0.09837, simple_loss=0.1143, pruned_loss=0.03009, audio_tagging_loss=0.01114, over 3048972.25 frames. ], batch size: 56, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:18:35,895 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:18:45,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=413913.3333333333, ans=0.1 2023-11-18 21:19:03,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2023-11-18 21:19:14,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=414046.6666666667, ans=0.125 2023-11-18 21:19:15,973 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2000, loss[loss=0.07417, simple_loss=0.08275, pruned_loss=0.01953, audio_tagging_loss=0.01326, over 16606.00 frames. ], tot_loss[loss=0.09839, simple_loss=0.1144, pruned_loss=0.03012, audio_tagging_loss=0.01105, over 3053816.80 frames. ], batch size: 63, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:19:16,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=414113.3333333333, ans=0.0 2023-11-18 21:19:18,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=414113.3333333333, ans=0.125 2023-11-18 21:19:22,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=414113.3333333333, ans=0.0 2023-11-18 21:19:23,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=22.5 2023-11-18 21:19:29,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=414180.0, ans=0.1 2023-11-18 21:19:50,562 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.514e+01 8.645e+01 9.576e+01 1.020e+02 1.190e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-18 21:19:50,879 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:20:11,743 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2050, loss[loss=0.1192, simple_loss=0.1358, pruned_loss=0.04134, audio_tagging_loss=0.009963, over 14725.00 frames. ], tot_loss[loss=0.09789, simple_loss=0.1137, pruned_loss=0.03, audio_tagging_loss=0.01104, over 3044270.92 frames. ], batch size: 54, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:20:19,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. 
limit=15.0 2023-11-18 21:20:42,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=414580.0, ans=22.5 2023-11-18 21:20:53,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=414646.6666666667, ans=0.0 2023-11-18 21:21:04,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=414713.3333333333, ans=0.1 2023-11-18 21:21:07,344 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2100, loss[loss=0.1146, simple_loss=0.1292, pruned_loss=0.03965, audio_tagging_loss=0.01032, over 14981.00 frames. ], tot_loss[loss=0.09819, simple_loss=0.1144, pruned_loss=0.03, audio_tagging_loss=0.011, over 3044320.40 frames. ], batch size: 56, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:21:11,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=414780.0, ans=0.0 2023-11-18 21:21:13,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=414780.0, ans=0.0 2023-11-18 21:21:19,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=414846.6666666667, ans=0.0 2023-11-18 21:21:19,582 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.44 vs. limit=22.5 2023-11-18 21:21:26,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=414846.6666666667, ans=0.1 2023-11-18 21:21:26,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=414846.6666666667, ans=0.125 2023-11-18 21:21:31,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=414913.3333333333, ans=0.125 2023-11-18 21:21:41,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=414980.0, ans=0.0 2023-11-18 21:21:42,501 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 9.113e+01 9.926e+01 1.128e+02 1.703e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-18 21:21:45,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=414980.0, ans=0.0 2023-11-18 21:21:47,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=414980.0, ans=0.125 2023-11-18 21:22:03,816 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2150, loss[loss=0.08484, simple_loss=0.1029, pruned_loss=0.02206, audio_tagging_loss=0.01131, over 15910.00 frames. ], tot_loss[loss=0.09869, simple_loss=0.1147, pruned_loss=0.03027, audio_tagging_loss=0.01105, over 3047212.92 frames. ], batch size: 59, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:22:09,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=415113.3333333333, ans=0.1 2023-11-18 21:22:13,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.85 vs. 
limit=15.0 2023-11-18 21:22:36,318 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:22:36,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0 2023-11-18 21:22:38,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=415313.3333333333, ans=0.125 2023-11-18 21:22:46,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=415313.3333333333, ans=0.1 2023-11-18 21:22:53,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=415380.0, ans=0.2 2023-11-18 21:22:54,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=415380.0, ans=0.2 2023-11-18 21:22:59,972 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2200, loss[loss=0.1084, simple_loss=0.137, pruned_loss=0.02969, audio_tagging_loss=0.01022, over 14785.00 frames. ], tot_loss[loss=0.09859, simple_loss=0.1147, pruned_loss=0.03024, audio_tagging_loss=0.01099, over 3049983.55 frames. ], batch size: 54, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:23:03,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=415446.6666666667, ans=0.2 2023-11-18 21:23:12,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.71 vs. limit=10.0 2023-11-18 21:23:27,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=415580.0, ans=0.1 2023-11-18 21:23:34,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=415646.6666666667, ans=0.025 2023-11-18 21:23:34,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=415646.6666666667, ans=0.05 2023-11-18 21:23:35,840 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.875e+01 9.823e+01 1.124e+02 2.816e+02, threshold=1.965e+02, percent-clipped=1.0 2023-11-18 21:23:41,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=415646.6666666667, ans=0.0 2023-11-18 21:23:43,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=415713.3333333333, ans=0.2 2023-11-18 21:23:54,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=415780.0, ans=0.125 2023-11-18 21:23:55,603 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2250, loss[loss=0.08658, simple_loss=0.08641, pruned_loss=0.03036, audio_tagging_loss=0.01301, over 14815.00 frames. 
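
The WARNING above shows why the AudioSet placeholder cuts are dropped: a 1-second clip keeps only 23 encoder frames after subsampling, fewer than its 24 BPE tokens, and a transducer alignment needs at least one frame per token. A sketch of the implied filter; the frame formula is an assumption chosen to reproduce the logged 100 -> 23 mapping, and the function names are illustrative:

    def frames_after_subsampling(num_frames: int) -> int:
        # Reproduces the logged 100 -> 23; assumed to mirror the
        # encoder-embed convolution arithmetic.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least one encoder frame per output token.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> excluded, as logged
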
], tot_loss[loss=0.09798, simple_loss=0.1141, pruned_loss=0.0299, audio_tagging_loss=0.01105, over 3048108.57 frames. ], batch size: 56, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:24:02,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.90 vs. limit=10.0 2023-11-18 21:24:08,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=415846.6666666667, ans=0.125 2023-11-18 21:24:38,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.21 vs. limit=15.0 2023-11-18 21:24:51,861 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2300, loss[loss=0.1183, simple_loss=0.1275, pruned_loss=0.04761, audio_tagging_loss=0.006967, over 14449.00 frames. ], tot_loss[loss=0.09756, simple_loss=0.1135, pruned_loss=0.0296, audio_tagging_loss=0.01122, over 3051230.89 frames. ], batch size: 53, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:25:03,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=416180.0, ans=0.125 2023-11-18 21:25:22,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=416246.6666666667, ans=0.125 2023-11-18 21:25:27,504 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 8.781e+01 9.557e+01 1.037e+02 1.979e+02, threshold=1.911e+02, percent-clipped=1.0 2023-11-18 21:25:29,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=416313.3333333333, ans=0.125 2023-11-18 21:25:31,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.95 vs. limit=15.0 2023-11-18 21:25:37,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=416380.0, ans=0.1 2023-11-18 21:25:39,373 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:25:40,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=416380.0, ans=0.0 2023-11-18 21:25:45,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.54 vs. limit=10.0 2023-11-18 21:25:47,863 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2350, loss[loss=0.1263, simple_loss=0.1557, pruned_loss=0.0403, audio_tagging_loss=0.00814, over 16814.00 frames. ], tot_loss[loss=0.09756, simple_loss=0.1136, pruned_loss=0.02955, audio_tagging_loss=0.01121, over 3047393.62 frames. 
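
Each optim.py:476 record lists min/25%/median/75%/max of recently observed gradient norms, and the printed threshold consistently equals Clipping_scale (2.0) times the median; percent-clipped then reports how often a batch exceeded it (above, the max 1.979e+02 is over threshold=1.911e+02, hence percent-clipped=1.0). A toy reconstruction of that relation, inferred from the logged numbers rather than taken from the optimizer code:

    import numpy as np

    def clip_threshold(recent_grad_norms, clipping_scale=2.0):
        # threshold = clipping_scale * median of the recent grad-norm window
        return clipping_scale * float(np.median(recent_grad_norms))

    print(clip_threshold([95.57]))  # 191.14 ~ the logged threshold=1.911e+02
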
], batch size: 57, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:25:49,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=416446.6666666667, ans=0.125 2023-11-18 21:26:15,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=416580.0, ans=0.125 2023-11-18 21:26:41,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=416713.3333333333, ans=0.125 2023-11-18 21:26:43,289 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2400, loss[loss=0.09914, simple_loss=0.1095, pruned_loss=0.03161, audio_tagging_loss=0.0128, over 14533.00 frames. ], tot_loss[loss=0.09851, simple_loss=0.1146, pruned_loss=0.02993, audio_tagging_loss=0.01129, over 3045789.39 frames. ], batch size: 56, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:26:48,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=416780.0, ans=0.2 2023-11-18 21:26:52,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=416780.0, ans=0.125 2023-11-18 21:27:20,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.680e+01 9.662e+01 1.129e+02 1.566e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-18 21:27:24,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=416980.0, ans=0.0 2023-11-18 21:27:26,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=416980.0, ans=0.2 2023-11-18 21:27:38,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0 2023-11-18 21:27:39,627 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2450, loss[loss=0.1211, simple_loss=0.1387, pruned_loss=0.04076, audio_tagging_loss=0.01096, over 14750.00 frames. ], tot_loss[loss=0.09901, simple_loss=0.1152, pruned_loss=0.03004, audio_tagging_loss=0.01137, over 3045464.05 frames. 
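
The scaling.py:1022 Whitening records compare a covariance statistic of a module's output against a limit (sometimes itself scheduled, see the whitening_limit entries nearby) and only apply pressure while metric exceeds limit. A simplified stand-in for the metric, assuming it measures how uneven the eigenvalue spectrum of the channel covariance is, so that 1.0 means perfectly "white" features; this mirrors the spirit of the module, not its exact code:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels); returns ~1.0 for identity-like
        # covariance and grows as the spectrum becomes lopsided.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / eigs.mean() ** 2

    feats = torch.randn(1000, 384)         # near-white input
    print(float(whitening_metric(feats)))  # close to 1.0, well under limits like 15.0
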
], batch size: 57, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:27:40,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=417113.3333333333, ans=0.04949747468305833 2023-11-18 21:27:51,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=417180.0, ans=0.125 2023-11-18 21:28:04,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=417246.6666666667, ans=0.125 2023-11-18 21:28:08,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=417246.6666666667, ans=0.1 2023-11-18 21:28:19,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=417313.3333333333, ans=0.125 2023-11-18 21:28:21,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=417313.3333333333, ans=0.5 2023-11-18 21:28:22,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=417313.3333333333, ans=0.0 2023-11-18 21:28:26,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=417380.0, ans=0.2 2023-11-18 21:28:35,689 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2500, loss[loss=0.1015, simple_loss=0.1191, pruned_loss=0.03269, audio_tagging_loss=0.009218, over 14928.00 frames. ], tot_loss[loss=0.09845, simple_loss=0.1143, pruned_loss=0.02989, audio_tagging_loss=0.0114, over 3056981.35 frames. ], batch size: 55, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:28:38,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417446.6666666667, ans=0.1 2023-11-18 21:29:01,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=417580.0, ans=0.125 2023-11-18 21:29:06,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=417580.0, ans=15.0 2023-11-18 21:29:08,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=417646.6666666667, ans=0.125 2023-11-18 21:29:12,269 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.909e+01 1.012e+02 1.108e+02 1.409e+02, threshold=2.024e+02, percent-clipped=0.0 2023-11-18 21:29:31,819 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2550, loss[loss=0.09701, simple_loss=0.1192, pruned_loss=0.0295, audio_tagging_loss=0.00791, over 14317.00 frames. ], tot_loss[loss=0.0984, simple_loss=0.1144, pruned_loss=0.02993, audio_tagging_loss=0.01126, over 3052559.64 frames. ], batch size: 55, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:29:35,642 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.50 vs. 
limit=15.0 2023-11-18 21:29:42,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=417846.6666666667, ans=0.025 2023-11-18 21:29:55,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=417913.3333333333, ans=0.125 2023-11-18 21:30:08,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2023-11-18 21:30:17,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=418046.6666666667, ans=0.125 2023-11-18 21:30:28,035 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2600, loss[loss=0.1311, simple_loss=0.1651, pruned_loss=0.04336, audio_tagging_loss=0.005239, over 15308.00 frames. ], tot_loss[loss=0.09778, simple_loss=0.1138, pruned_loss=0.0297, audio_tagging_loss=0.0112, over 3050533.21 frames. ], batch size: 54, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:30:45,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=418180.0, ans=0.1 2023-11-18 21:30:47,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=418180.0, ans=0.1 2023-11-18 21:30:56,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=418246.6666666667, ans=0.0 2023-11-18 21:31:02,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=418313.3333333333, ans=0.125 2023-11-18 21:31:04,866 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.916e+01 1.001e+02 1.139e+02 1.588e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-18 21:31:13,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=418380.0, ans=0.0 2023-11-18 21:31:14,188 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:31:24,009 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2650, loss[loss=0.08327, simple_loss=0.09145, pruned_loss=0.02433, audio_tagging_loss=0.01322, over 15174.00 frames. ], tot_loss[loss=0.09814, simple_loss=0.1141, pruned_loss=0.02991, audio_tagging_loss=0.01121, over 3049933.24 frames. ], batch size: 59, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:31:26,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=418446.6666666667, ans=0.125 2023-11-18 21:31:38,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=418513.3333333333, ans=0.1 2023-11-18 21:31:53,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.79 vs. limit=10.0 2023-11-18 21:32:19,365 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.86 vs. limit=10.0 2023-11-18 21:32:19,689 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2700, loss[loss=0.1223, simple_loss=0.1349, pruned_loss=0.04417, audio_tagging_loss=0.01067, over 16223.00 frames. 
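
The scaling.py:213 ScheduledFloat records print, for each regularization knob (skip rates, balancer probabilities, dropout), the value its batch-count-indexed schedule currently yields; by batch_count ~ 4.2e5 most of them have settled at their final constants (ans=0.125, 0.1, 0.0, ...). A sketch of such a piecewise-linear schedule; the breakpoints below are made up for illustration and are not the recipe's:

    def scheduled_float(batch_count: float, schedule) -> float:
        # schedule: sorted (batch_count, value) breakpoints; linear between
        # breakpoints, clamped to the end values outside them.
        x0, y0 = schedule[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in schedule[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return y0

    # Far past the last (hypothetical) breakpoint, the final value is returned:
    print(scheduled_float(417113.3, [(0.0, 0.5), (20000.0, 0.1)]))  # 0.1
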
], tot_loss[loss=0.09848, simple_loss=0.1146, pruned_loss=0.03009, audio_tagging_loss=0.01109, over 3045097.41 frames. ], batch size: 60, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:32:45,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=418913.3333333333, ans=0.0 2023-11-18 21:32:56,911 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.943e+01 9.076e+01 9.942e+01 1.068e+02 1.459e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-18 21:33:07,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=419046.6666666667, ans=0.125 2023-11-18 21:33:16,779 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2750, loss[loss=0.1105, simple_loss=0.1267, pruned_loss=0.03582, audio_tagging_loss=0.01134, over 15702.00 frames. ], tot_loss[loss=0.09797, simple_loss=0.1139, pruned_loss=0.02988, audio_tagging_loss=0.01113, over 3043945.93 frames. ], batch size: 60, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:33:32,810 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.02 vs. limit=15.0 2023-11-18 21:33:35,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=419180.0, ans=0.0 2023-11-18 21:33:40,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2023-11-18 21:33:40,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=419246.6666666667, ans=0.1 2023-11-18 21:34:03,604 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:34:12,564 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2800, loss[loss=0.1272, simple_loss=0.1601, pruned_loss=0.03996, audio_tagging_loss=0.007208, over 15852.00 frames. ], tot_loss[loss=0.09724, simple_loss=0.1132, pruned_loss=0.02953, audio_tagging_loss=0.01111, over 3045178.49 frames. ], batch size: 57, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:34:24,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=419513.3333333333, ans=0.2 2023-11-18 21:34:30,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=419513.3333333333, ans=0.0 2023-11-18 21:34:49,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.663e+01 8.954e+01 9.859e+01 1.088e+02 1.629e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-18 21:35:07,673 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2850, loss[loss=0.07547, simple_loss=0.07867, pruned_loss=0.0223, audio_tagging_loss=0.01384, over 15567.00 frames. ], tot_loss[loss=0.09691, simple_loss=0.1127, pruned_loss=0.02946, audio_tagging_loss=0.01109, over 3044116.81 frames. 
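
The grad_scale field behaves like fp16 dynamic loss scaling: it sits at 16.0 through batch 1950, runs at 32.0 from batch 2000, and reaches 64.0 by batch 4400 further down, i.e. the scale doubles after a long enough run of overflow-free steps and would be halved on overflow. A sketch with PyTorch's AMP scaler; the growth_interval value is a guess, not the recipe's setting:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000
    )
    print(scaler.get_scale())  # 16.0 on a CUDA machine; doubles after enough clean steps
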
], batch size: 61, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:35:13,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=419780.0, ans=0.125 2023-11-18 21:35:13,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=419780.0, ans=0.0 2023-11-18 21:35:13,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=419780.0, ans=0.0 2023-11-18 21:35:22,573 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.24 vs. limit=22.5 2023-11-18 21:35:28,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=419846.6666666667, ans=0.125 2023-11-18 21:35:46,767 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=419980.0, ans=0.0 2023-11-18 21:35:59,564 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=420046.6666666667, ans=0.0 2023-11-18 21:36:02,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=420046.6666666667, ans=0.125 2023-11-18 21:36:05,319 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2900, loss[loss=0.09612, simple_loss=0.1054, pruned_loss=0.03088, audio_tagging_loss=0.01256, over 15247.00 frames. ], tot_loss[loss=0.09746, simple_loss=0.1134, pruned_loss=0.02964, audio_tagging_loss=0.01114, over 3042196.73 frames. ], batch size: 58, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:36:20,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=420180.0, ans=0.125 2023-11-18 21:36:27,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.58 vs. limit=22.5 2023-11-18 21:36:32,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=420246.6666666667, ans=0.125 2023-11-18 21:36:39,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.08 vs. limit=22.5 2023-11-18 21:36:40,900 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.758e+01 9.574e+01 1.055e+02 1.297e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-18 21:36:44,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=420313.3333333333, ans=0.0 2023-11-18 21:36:53,384 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.14 vs. limit=15.0 2023-11-18 21:36:53,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420380.0, ans=0.1 2023-11-18 21:37:00,114 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 2950, loss[loss=0.09251, simple_loss=0.1123, pruned_loss=0.02525, audio_tagging_loss=0.0111, over 14545.00 frames. ], tot_loss[loss=0.09772, simple_loss=0.114, pruned_loss=0.02962, audio_tagging_loss=0.01112, over 3045977.84 frames. 
], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:37:18,549 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.83 vs. limit=15.0 2023-11-18 21:37:23,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=420580.0, ans=12.0 2023-11-18 21:37:32,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=420646.6666666667, ans=0.0 2023-11-18 21:37:51,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=420713.3333333333, ans=0.125 2023-11-18 21:37:53,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.92 vs. limit=22.5 2023-11-18 21:37:55,425 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3000, loss[loss=0.1194, simple_loss=0.1483, pruned_loss=0.03833, audio_tagging_loss=0.00693, over 14858.00 frames. ], tot_loss[loss=0.09916, simple_loss=0.1156, pruned_loss=0.03026, audio_tagging_loss=0.0111, over 3049120.87 frames. ], batch size: 53, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:37:55,425 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 21:38:19,085 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.6991, 5.7505, 5.7914, 5.8458], device='cuda:1') 2023-11-18 21:38:24,151 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([0.8619, 3.4417, 2.8833, 2.9434, 3.6900, 3.2476, 3.2362, 3.5286], device='cuda:1') 2023-11-18 21:38:28,439 INFO [train_asr.py:1147] (1/4) Epoch 6, validation: loss=0.07003, simple_loss=0.05914, pruned_loss=0.008279, audio_tagging_loss=0.03218, over 4681554.00 frames. 2023-11-18 21:38:28,440 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 21:38:49,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=420913.3333333333, ans=0.125 2023-11-18 21:38:58,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=420913.3333333333, ans=0.0 2023-11-18 21:39:03,792 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 9.190e+01 1.009e+02 1.131e+02 1.432e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-18 21:39:06,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=420980.0, ans=0.125 2023-11-18 21:39:19,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=421046.6666666667, ans=6.0 2023-11-18 21:39:23,545 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3050, loss[loss=0.1124, simple_loss=0.1385, pruned_loss=0.03345, audio_tagging_loss=0.009688, over 14821.00 frames. ], tot_loss[loss=0.09959, simple_loss=0.1163, pruned_loss=0.03031, audio_tagging_loss=0.01114, over 3048473.52 frames. ], batch size: 54, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:39:46,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. 
limit=6.0 2023-11-18 21:39:55,065 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:40:01,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=421313.3333333333, ans=0.05 2023-11-18 21:40:11,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=421380.0, ans=0.125 2023-11-18 21:40:19,089 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3100, loss[loss=0.1016, simple_loss=0.1222, pruned_loss=0.02665, audio_tagging_loss=0.01387, over 15569.00 frames. ], tot_loss[loss=0.101, simple_loss=0.1178, pruned_loss=0.03089, audio_tagging_loss=0.01122, over 3044743.37 frames. ], batch size: 57, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:40:34,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=421513.3333333333, ans=0.1 2023-11-18 21:40:38,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=421513.3333333333, ans=0.125 2023-11-18 21:40:49,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=421580.0, ans=0.125 2023-11-18 21:40:55,634 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.959e+01 9.848e+01 1.091e+02 1.372e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-18 21:40:58,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=421646.6666666667, ans=0.0 2023-11-18 21:41:06,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2023-11-18 21:41:10,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=421713.3333333333, ans=0.125 2023-11-18 21:41:14,376 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3150, loss[loss=0.1329, simple_loss=0.1718, pruned_loss=0.04045, audio_tagging_loss=0.006496, over 15317.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.118, pruned_loss=0.03064, audio_tagging_loss=0.01123, over 3044633.87 frames. 
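
During the validation pass logged just above, the script also prints attn_weights_entropy tensors, one value per attention head (the tensor lengths match the head counts of the named layers): a diagnostic for heads collapsing onto single positions (low entropy) versus staying diffuse (high entropy). A hedged sketch of that statistic, assuming it is the mean entropy of each head's attention distribution; the function name is illustrative:

    import torch

    def attn_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, tgt_len, src_len); each row is a softmax distribution.
        p = attn.clamp_min(1e-20)
        return -(p * p.log()).sum(dim=-1).mean(dim=-1)  # entropy in nats, per head

    attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
    print(attn_entropy(attn))  # a bit below log(50) ~ 3.9 for near-uniform weights
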
], batch size: 54, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:41:15,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=421780.0, ans=0.125 2023-11-18 21:41:24,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=421846.6666666667, ans=0.125 2023-11-18 21:41:30,518 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=421846.6666666667, ans=0.04949747468305833 2023-11-18 21:41:31,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=421846.6666666667, ans=0.0 2023-11-18 21:41:37,804 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:41:57,095 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=12.0 2023-11-18 21:41:59,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=422046.6666666667, ans=0.1 2023-11-18 21:42:05,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=422046.6666666667, ans=0.0 2023-11-18 21:42:10,392 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3200, loss[loss=0.08568, simple_loss=0.09587, pruned_loss=0.02563, audio_tagging_loss=0.01212, over 14933.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1175, pruned_loss=0.03065, audio_tagging_loss=0.01148, over 3043109.68 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:42:20,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=422180.0, ans=0.0 2023-11-18 21:42:41,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=422246.6666666667, ans=0.2 2023-11-18 21:42:41,933 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2023-11-18 21:42:46,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.629e+01 9.130e+01 9.753e+01 1.111e+02 1.645e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-18 21:43:05,605 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3250, loss[loss=0.114, simple_loss=0.1264, pruned_loss=0.03908, audio_tagging_loss=0.01175, over 14878.00 frames. ], tot_loss[loss=0.09951, simple_loss=0.1157, pruned_loss=0.03002, audio_tagging_loss=0.01163, over 3049813.74 frames. ], batch size: 55, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:43:59,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=422713.3333333333, ans=0.09899494936611666 2023-11-18 21:43:59,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.49 vs. limit=6.0 2023-11-18 21:44:01,575 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3300, loss[loss=0.09795, simple_loss=0.1182, pruned_loss=0.0304, audio_tagging_loss=0.008433, over 15331.00 frames. ], tot_loss[loss=0.0993, simple_loss=0.1154, pruned_loss=0.03005, audio_tagging_loss=0.01155, over 3051797.60 frames. 
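
tot_loss[...] is not an epoch total: the frame count it is reported "over" hovers near 3.05e6 however far the epoch has progressed, which is what a frame-weighted exponential moving average with a decay of roughly 1 - 1/200 would settle to at ~15k frames per batch. A sketch under that assumption (class name and decay constant are illustrative):

    class RunningLoss:
        def __init__(self, decay: float = 1.0 - 1.0 / 200):
            self.decay = decay
            self.weighted_loss = 0.0  # decayed sum of loss * frames
            self.frames = 0.0         # decayed sum of frames

        def update(self, batch_loss: float, batch_frames: float) -> float:
            self.weighted_loss = self.decay * self.weighted_loss + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames
            return self.weighted_loss / self.frames

    tracker = RunningLoss()
    for _ in range(2000):
        tracker.update(0.1, 15000.0)
    print(tracker.frames)  # ~3.0e6 at steady state, matching the "over ..." counts
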
], batch size: 58, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:44:06,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=15.0 2023-11-18 21:44:11,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=422846.6666666667, ans=0.125 2023-11-18 21:44:28,949 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2023-11-18 21:44:38,003 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.927e+01 9.126e+01 1.034e+02 1.155e+02 1.977e+02, threshold=2.069e+02, percent-clipped=1.0 2023-11-18 21:44:57,141 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3350, loss[loss=0.09511, simple_loss=0.1172, pruned_loss=0.02739, audio_tagging_loss=0.009108, over 15060.00 frames. ], tot_loss[loss=0.09916, simple_loss=0.1152, pruned_loss=0.03019, audio_tagging_loss=0.01138, over 3054876.64 frames. ], batch size: 58, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:45:02,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=423113.3333333333, ans=0.125 2023-11-18 21:45:16,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=423180.0, ans=0.0 2023-11-18 21:45:20,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=423246.6666666667, ans=0.1 2023-11-18 21:45:52,880 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3400, loss[loss=0.112, simple_loss=0.1461, pruned_loss=0.03136, audio_tagging_loss=0.007618, over 15657.00 frames. ], tot_loss[loss=0.09867, simple_loss=0.1148, pruned_loss=0.03004, audio_tagging_loss=0.01124, over 3053441.14 frames. ], batch size: 55, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:46:01,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=423446.6666666667, ans=0.125 2023-11-18 21:46:03,171 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2023-11-18 21:46:08,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=423513.3333333333, ans=0.0 2023-11-18 21:46:09,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=15.0 2023-11-18 21:46:12,587 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.17 vs. limit=15.0 2023-11-18 21:46:17,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. 
limit=15.0 2023-11-18 21:46:29,914 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.592e+01 8.844e+01 9.699e+01 1.055e+02 1.387e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-18 21:46:41,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=423713.3333333333, ans=0.125 2023-11-18 21:46:47,949 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3450, loss[loss=0.0923, simple_loss=0.1163, pruned_loss=0.02553, audio_tagging_loss=0.008644, over 15388.00 frames. ], tot_loss[loss=0.09813, simple_loss=0.1144, pruned_loss=0.02978, audio_tagging_loss=0.01113, over 3055720.36 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:46:53,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=423780.0, ans=0.04949747468305833 2023-11-18 21:46:59,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=423846.6666666667, ans=0.125 2023-11-18 21:47:28,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=423980.0, ans=0.125 2023-11-18 21:47:33,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=22.5 2023-11-18 21:47:42,032 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5 2023-11-18 21:47:44,696 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3500, loss[loss=0.1229, simple_loss=0.1472, pruned_loss=0.03863, audio_tagging_loss=0.01069, over 15244.00 frames. ], tot_loss[loss=0.09827, simple_loss=0.1148, pruned_loss=0.0299, audio_tagging_loss=0.01097, over 3054527.50 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:47:52,334 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.75 vs. limit=15.0 2023-11-18 21:48:11,272 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:48:15,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=424246.6666666667, ans=0.0 2023-11-18 21:48:21,879 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 9.190e+01 1.062e+02 1.231e+02 1.599e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-18 21:48:40,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=424446.6666666667, ans=0.125 2023-11-18 21:48:41,143 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3550, loss[loss=0.1058, simple_loss=0.127, pruned_loss=0.03189, audio_tagging_loss=0.01042, over 14866.00 frames. ], tot_loss[loss=0.09789, simple_loss=0.1144, pruned_loss=0.0298, audio_tagging_loss=0.01091, over 3056525.22 frames. 
], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:49:11,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2023-11-18 21:49:12,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=424580.0, ans=0.1 2023-11-18 21:49:14,199 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=22.5 2023-11-18 21:49:27,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=424713.3333333333, ans=0.125 2023-11-18 21:49:29,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=424713.3333333333, ans=0.125 2023-11-18 21:49:30,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=424713.3333333333, ans=0.125 2023-11-18 21:49:36,613 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3600, loss[loss=0.09513, simple_loss=0.1043, pruned_loss=0.03087, audio_tagging_loss=0.01213, over 14816.00 frames. ], tot_loss[loss=0.0988, simple_loss=0.1156, pruned_loss=0.03005, audio_tagging_loss=0.01097, over 3060050.88 frames. ], batch size: 57, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:49:49,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=424846.6666666667, ans=0.125 2023-11-18 21:50:03,284 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.53 vs. limit=10.0 2023-11-18 21:50:04,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424913.3333333333, ans=0.1 2023-11-18 21:50:12,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=424980.0, ans=0.0 2023-11-18 21:50:13,743 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 9.290e+01 1.027e+02 1.176e+02 1.572e+02, threshold=2.055e+02, percent-clipped=0.0 2023-11-18 21:50:19,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=424980.0, ans=0.0 2023-11-18 21:50:30,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=12.0 2023-11-18 21:50:33,094 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3650, loss[loss=0.09857, simple_loss=0.1182, pruned_loss=0.02915, audio_tagging_loss=0.01033, over 15001.00 frames. ], tot_loss[loss=0.09906, simple_loss=0.116, pruned_loss=0.03019, audio_tagging_loss=0.01088, over 3058688.57 frames. 
], batch size: 56, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:50:39,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=425113.3333333333, ans=0.1 2023-11-18 21:51:17,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=425380.0, ans=0.0 2023-11-18 21:51:29,223 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3700, loss[loss=0.07844, simple_loss=0.09393, pruned_loss=0.02224, audio_tagging_loss=0.009228, over 16441.00 frames. ], tot_loss[loss=0.09834, simple_loss=0.1148, pruned_loss=0.02997, audio_tagging_loss=0.01099, over 3059288.71 frames. ], batch size: 61, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:51:29,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=425446.6666666667, ans=0.125 2023-11-18 21:51:31,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=425446.6666666667, ans=0.125 2023-11-18 21:52:06,437 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.585e+01 8.991e+01 9.791e+01 1.095e+02 1.443e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-18 21:52:17,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=425713.3333333333, ans=0.0 2023-11-18 21:52:25,160 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3750, loss[loss=0.09688, simple_loss=0.09629, pruned_loss=0.03635, audio_tagging_loss=0.01238, over 15829.00 frames. ], tot_loss[loss=0.09855, simple_loss=0.1148, pruned_loss=0.0301, audio_tagging_loss=0.01105, over 3060926.37 frames. ], batch size: 59, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:52:40,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=425846.6666666667, ans=0.0 2023-11-18 21:52:51,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=425913.3333333333, ans=0.125 2023-11-18 21:53:02,240 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:53:02,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=425980.0, ans=0.125 2023-11-18 21:53:21,306 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3800, loss[loss=0.05477, simple_loss=0.05908, pruned_loss=0.01151, audio_tagging_loss=0.01372, over 14936.00 frames. ], tot_loss[loss=0.0991, simple_loss=0.1156, pruned_loss=0.03018, audio_tagging_loss=0.0111, over 3062257.41 frames. ], batch size: 58, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:53:29,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=426113.3333333333, ans=0.125 2023-11-18 21:53:31,920 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. 
limit=15.0 2023-11-18 21:53:32,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=426180.0, ans=0.125 2023-11-18 21:53:57,909 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.759e+01 9.502e+01 1.058e+02 1.503e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-18 21:54:01,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=426313.3333333333, ans=0.125 2023-11-18 21:54:03,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=426313.3333333333, ans=0.125 2023-11-18 21:54:16,855 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3850, loss[loss=0.1244, simple_loss=0.1568, pruned_loss=0.03711, audio_tagging_loss=0.00889, over 15080.00 frames. ], tot_loss[loss=0.1004, simple_loss=0.1172, pruned_loss=0.03064, audio_tagging_loss=0.01113, over 3056280.17 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:54:28,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.89 vs. limit=22.5 2023-11-18 21:54:32,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=426513.3333333333, ans=0.0 2023-11-18 21:54:38,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=426580.0, ans=0.2 2023-11-18 21:54:38,888 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2023-11-18 21:54:45,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=426580.0, ans=0.0 2023-11-18 21:54:48,892 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0 2023-11-18 21:54:59,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.73 vs. limit=10.0 2023-11-18 21:55:14,860 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3900, loss[loss=0.1074, simple_loss=0.121, pruned_loss=0.03572, audio_tagging_loss=0.01123, over 15416.00 frames. ], tot_loss[loss=0.09989, simple_loss=0.1161, pruned_loss=0.03063, audio_tagging_loss=0.01121, over 3044564.47 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:55:18,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=426780.0, ans=0.0 2023-11-18 21:55:48,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=426980.0, ans=0.1 2023-11-18 21:55:51,458 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.999e+01 9.434e+01 1.040e+02 1.132e+02 1.500e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-18 21:55:53,099 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.91 vs. 
limit=15.0 2023-11-18 21:56:04,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=427046.6666666667, ans=0.125 2023-11-18 21:56:05,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=427046.6666666667, ans=0.125 2023-11-18 21:56:10,953 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 3950, loss[loss=0.1142, simple_loss=0.1334, pruned_loss=0.03583, audio_tagging_loss=0.01166, over 15413.00 frames. ], tot_loss[loss=0.09989, simple_loss=0.1158, pruned_loss=0.03062, audio_tagging_loss=0.01136, over 3037070.49 frames. ], batch size: 58, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:56:27,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=427180.0, ans=0.125 2023-11-18 21:57:07,301 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4000, loss[loss=0.1133, simple_loss=0.1228, pruned_loss=0.04183, audio_tagging_loss=0.01001, over 13990.00 frames. ], tot_loss[loss=0.09961, simple_loss=0.1154, pruned_loss=0.03048, audio_tagging_loss=0.01141, over 3037027.02 frames. ], batch size: 53, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:57:11,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2023-11-18 21:57:16,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=427446.6666666667, ans=0.1 2023-11-18 21:57:25,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=427513.3333333333, ans=0.2 2023-11-18 21:57:27,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=427513.3333333333, ans=0.1 2023-11-18 21:57:43,816 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.662e+01 9.312e+01 1.008e+02 1.147e+02 1.511e+02, threshold=2.016e+02, percent-clipped=0.0 2023-11-18 21:57:51,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.04 vs. limit=15.0 2023-11-18 21:58:01,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=427780.0, ans=0.125 2023-11-18 21:58:02,414 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4050, loss[loss=0.1323, simple_loss=0.1513, pruned_loss=0.04647, audio_tagging_loss=0.01017, over 15532.00 frames. ], tot_loss[loss=0.09948, simple_loss=0.1156, pruned_loss=0.03023, audio_tagging_loss=0.01146, over 3042990.77 frames. ], batch size: 58, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:58:03,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.28 vs. limit=15.0 2023-11-18 21:58:03,491 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 21:58:17,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=427846.6666666667, ans=0.125 2023-11-18 21:58:27,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=427913.3333333333, ans=0.2 2023-11-18 21:58:37,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=427980.0, ans=0.2 2023-11-18 21:58:39,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=22.5 2023-11-18 21:58:39,440 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.16 vs. limit=15.0 2023-11-18 21:58:59,665 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4100, loss[loss=0.1266, simple_loss=0.1566, pruned_loss=0.04204, audio_tagging_loss=0.006232, over 15718.00 frames. ], tot_loss[loss=0.09901, simple_loss=0.115, pruned_loss=0.03008, audio_tagging_loss=0.01142, over 3039772.44 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:59:08,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=428113.3333333333, ans=0.125 2023-11-18 21:59:20,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=428246.6666666667, ans=0.2 2023-11-18 21:59:20,687 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=428246.6666666667, ans=0.125 2023-11-18 21:59:25,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0 2023-11-18 21:59:35,731 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 8.951e+01 9.681e+01 1.090e+02 3.452e+02, threshold=1.936e+02, percent-clipped=1.0 2023-11-18 21:59:42,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=428313.3333333333, ans=0.1 2023-11-18 21:59:55,347 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4150, loss[loss=0.09646, simple_loss=0.1096, pruned_loss=0.02814, audio_tagging_loss=0.01353, over 15931.00 frames. ], tot_loss[loss=0.09919, simple_loss=0.1155, pruned_loss=0.03014, audio_tagging_loss=0.0113, over 3049222.80 frames. ], batch size: 59, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:59:58,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=428446.6666666667, ans=0.125 2023-11-18 22:00:01,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.96 vs. limit=15.0 2023-11-18 22:00:21,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=428580.0, ans=0.0 2023-11-18 22:00:34,704 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:00:35,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2023-11-18 22:00:50,539 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4200, loss[loss=0.06037, simple_loss=0.05728, pruned_loss=0.01493, audio_tagging_loss=0.01681, over 14175.00 frames. ], tot_loss[loss=0.09879, simple_loss=0.1151, pruned_loss=0.03011, audio_tagging_loss=0.01116, over 3049002.71 frames. ], batch size: 59, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 22:01:12,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=428913.3333333333, ans=0.0 2023-11-18 22:01:27,562 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 8.609e+01 9.394e+01 1.081e+02 1.374e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-18 22:01:46,214 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4250, loss[loss=0.09519, simple_loss=0.1189, pruned_loss=0.0273, audio_tagging_loss=0.00845, over 14775.00 frames. ], tot_loss[loss=0.09815, simple_loss=0.1142, pruned_loss=0.02988, audio_tagging_loss=0.01118, over 3047368.89 frames. ], batch size: 54, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 22:01:59,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=429180.0, ans=0.125 2023-11-18 22:02:03,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=429180.0, ans=0.125 2023-11-18 22:02:25,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=429313.3333333333, ans=0.1 2023-11-18 22:02:32,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=429380.0, ans=0.07 2023-11-18 22:02:34,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=429380.0, ans=0.125 2023-11-18 22:02:43,269 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4300, loss[loss=0.1209, simple_loss=0.1604, pruned_loss=0.03476, audio_tagging_loss=0.005946, over 14803.00 frames. ], tot_loss[loss=0.09749, simple_loss=0.1136, pruned_loss=0.02956, audio_tagging_loss=0.01112, over 3048267.54 frames. 
], batch size: 54, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 22:02:58,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=429513.3333333333, ans=0.125 2023-11-18 22:03:02,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=429513.3333333333, ans=0.1 2023-11-18 22:03:06,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=429580.0, ans=0.1 2023-11-18 22:03:20,038 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 9.239e+01 1.003e+02 1.122e+02 1.597e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-18 22:03:20,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=429646.6666666667, ans=0.125 2023-11-18 22:03:38,791 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4350, loss[loss=0.09177, simple_loss=0.1092, pruned_loss=0.02689, audio_tagging_loss=0.01031, over 15524.00 frames. ], tot_loss[loss=0.09887, simple_loss=0.1153, pruned_loss=0.03004, audio_tagging_loss=0.01117, over 3050174.29 frames. ], batch size: 61, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 22:03:41,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=429780.0, ans=0.125 2023-11-18 22:03:59,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=429846.6666666667, ans=0.125 2023-11-18 22:04:14,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=429980.0, ans=0.1 2023-11-18 22:04:18,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=429980.0, ans=0.09899494936611666 2023-11-18 22:04:34,586 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4400, loss[loss=0.1021, simple_loss=0.1238, pruned_loss=0.02886, audio_tagging_loss=0.01135, over 14795.00 frames. ], tot_loss[loss=0.09902, simple_loss=0.116, pruned_loss=0.02997, audio_tagging_loss=0.01106, over 3059320.22 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:04:38,731 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.21 vs. limit=15.0 2023-11-18 22:04:43,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0 2023-11-18 22:05:04,600 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.63 vs. limit=15.0 2023-11-18 22:05:11,341 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.875e+01 9.886e+01 1.073e+02 1.418e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-18 22:05:30,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=430446.6666666667, ans=0.0 2023-11-18 22:05:31,701 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4450, loss[loss=0.07551, simple_loss=0.08564, pruned_loss=0.02012, audio_tagging_loss=0.01257, over 14771.00 frames. ], tot_loss[loss=0.09857, simple_loss=0.1154, pruned_loss=0.02986, audio_tagging_loss=0.011, over 3052576.76 frames. 
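In every Clipping_scale line the reported threshold is exactly twice the middle quartile of the grad-norm distribution (here 2 x 1.003e+02 = 2.006e+02), so the threshold is evidently clipping_scale times a running median of recent gradient norms, and percent-clipped reports how often a norm exceeded it. A sketch of that relationship:

    import torch

    def clip_threshold(recent_grad_norms: torch.Tensor,
                       clipping_scale: float = 2.0) -> torch.Tensor:
        # threshold = clipping_scale * median of recent grad norms, i.e. the
        # "2 x middle quartile" relationship visible in these log lines.
        return clipping_scale * recent_grad_norms.median()

    quartiles = torch.tensor([73.43, 92.39, 100.3, 112.2, 159.7])  # from the log
    print(clip_threshold(quartiles))  # tensor(200.6000) ~ the logged 2.006e+02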
], batch size: 57, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:05:32,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=430446.6666666667, ans=0.125 2023-11-18 22:05:35,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=430446.6666666667, ans=0.125 2023-11-18 22:05:41,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=15.0 2023-11-18 22:06:01,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=430580.0, ans=0.2 2023-11-18 22:06:04,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.87 vs. limit=10.0 2023-11-18 22:06:08,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=430646.6666666667, ans=0.0 2023-11-18 22:06:15,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=430713.3333333333, ans=0.125 2023-11-18 22:06:26,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=430780.0, ans=0.2 2023-11-18 22:06:26,827 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4500, loss[loss=0.08815, simple_loss=0.1034, pruned_loss=0.02538, audio_tagging_loss=0.01106, over 15149.00 frames. ], tot_loss[loss=0.0985, simple_loss=0.1156, pruned_loss=0.02973, audio_tagging_loss=0.01097, over 3053216.09 frames. ], batch size: 58, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:06:29,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=15.0 2023-11-18 22:06:46,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=430846.6666666667, ans=0.0 2023-11-18 22:06:50,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=430913.3333333333, ans=0.1 2023-11-18 22:06:56,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=430913.3333333333, ans=0.07 2023-11-18 22:07:03,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.610e+01 9.076e+01 9.936e+01 1.124e+02 1.630e+02, threshold=1.987e+02, percent-clipped=0.0 2023-11-18 22:07:05,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=430980.0, ans=0.0 2023-11-18 22:07:16,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=431046.6666666667, ans=0.125 2023-11-18 22:07:18,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.67 vs. limit=22.5 2023-11-18 22:07:22,444 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4550, loss[loss=0.0876, simple_loss=0.109, pruned_loss=0.02347, audio_tagging_loss=0.009627, over 16300.00 frames. ], tot_loss[loss=0.09898, simple_loss=0.1159, pruned_loss=0.02995, audio_tagging_loss=0.01106, over 3050629.76 frames. 
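The ubiquitous ScheduledFloat lines record hyper-parameters (dropout probabilities, skip rates, balancer probabilities, scale_min values) that are functions of batch_count rather than fixed constants, which is why each one is re-logged as training advances. A plausible minimal form is piecewise-linear interpolation over batch_count; the breakpoints below are illustrative only, not values recovered from this run:

    def scheduled_float(batch_count: float, points) -> float:
        # points: [(batch_count, value), ...], sorted by batch_count.
        # Linear interpolation between breakpoints, clamped at both ends.
        if batch_count <= points[0][0]:
            return points[0][1]
        for (b0, v0), (b1, v1) in zip(points, points[1:]):
            if batch_count <= b1:
                return v0 + (batch_count - b0) / (b1 - b0) * (v1 - v0)
        return points[-1][1]

    # Hypothetical schedule: a skip rate annealed from 0.1 down to 0.0.
    print(scheduled_float(427513.3, [(0.0, 0.1), (20000.0, 0.0)]))  # 0.0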
], batch size: 60, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:07:35,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=12.0 2023-11-18 22:07:36,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=431180.0, ans=0.2 2023-11-18 22:07:50,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=431246.6666666667, ans=0.1 2023-11-18 22:07:53,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0 2023-11-18 22:08:02,176 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:08:18,468 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4600, loss[loss=0.109, simple_loss=0.1406, pruned_loss=0.03232, audio_tagging_loss=0.006419, over 15056.00 frames. ], tot_loss[loss=0.09795, simple_loss=0.1142, pruned_loss=0.02966, audio_tagging_loss=0.01117, over 3045688.17 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:08:22,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=431446.6666666667, ans=0.1 2023-11-18 22:08:55,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.959e+01 9.865e+01 1.112e+02 1.512e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 22:09:12,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=431713.3333333333, ans=0.1 2023-11-18 22:09:14,279 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4650, loss[loss=0.09101, simple_loss=0.1082, pruned_loss=0.02717, audio_tagging_loss=0.009751, over 14993.00 frames. ], tot_loss[loss=0.09873, simple_loss=0.1152, pruned_loss=0.02997, audio_tagging_loss=0.01117, over 3042079.20 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:09:35,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=431913.3333333333, ans=0.035 2023-11-18 22:09:38,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=431913.3333333333, ans=0.1 2023-11-18 22:09:52,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=431980.0, ans=0.2 2023-11-18 22:09:58,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=432046.6666666667, ans=0.0 2023-11-18 22:10:03,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. 
limit=15.0 2023-11-18 22:10:09,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=432113.3333333333, ans=0.125 2023-11-18 22:10:09,897 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4700, loss[loss=0.09745, simple_loss=0.1215, pruned_loss=0.02766, audio_tagging_loss=0.00904, over 16388.00 frames. ], tot_loss[loss=0.09913, simple_loss=0.1153, pruned_loss=0.03018, audio_tagging_loss=0.0113, over 3042818.79 frames. ], batch size: 61, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:10:10,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=432113.3333333333, ans=0.0 2023-11-18 22:10:11,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=432113.3333333333, ans=0.125 2023-11-18 22:10:21,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=432180.0, ans=0.1 2023-11-18 22:10:31,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=432246.6666666667, ans=0.0 2023-11-18 22:10:32,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=432246.6666666667, ans=0.0 2023-11-18 22:10:36,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=432246.6666666667, ans=0.125 2023-11-18 22:10:36,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=432246.6666666667, ans=0.125 2023-11-18 22:10:39,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=432246.6666666667, ans=0.125 2023-11-18 22:10:45,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=432313.3333333333, ans=0.125 2023-11-18 22:10:45,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=432313.3333333333, ans=0.125 2023-11-18 22:10:47,010 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.805e+01 9.825e+01 1.121e+02 1.529e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-18 22:10:48,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=432313.3333333333, ans=0.125 2023-11-18 22:11:06,127 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4750, loss[loss=0.0937, simple_loss=0.1113, pruned_loss=0.02712, audio_tagging_loss=0.01092, over 15010.00 frames. ], tot_loss[loss=0.09965, simple_loss=0.116, pruned_loss=0.03034, audio_tagging_loss=0.01133, over 3043570.47 frames. ], batch size: 57, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:11:15,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.78 vs. 
limit=15.0 2023-11-18 22:11:27,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=432580.0, ans=0.125 2023-11-18 22:11:36,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=432580.0, ans=0.1 2023-11-18 22:11:48,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=15.0 2023-11-18 22:11:54,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=15.0 2023-11-18 22:12:02,266 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4800, loss[loss=0.06911, simple_loss=0.07403, pruned_loss=0.01813, audio_tagging_loss=0.01396, over 13977.00 frames. ], tot_loss[loss=0.0995, simple_loss=0.1154, pruned_loss=0.03033, audio_tagging_loss=0.01148, over 3046674.74 frames. ], batch size: 53, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:12:03,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.86 vs. limit=15.0 2023-11-18 22:12:29,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=432913.3333333333, ans=0.0 2023-11-18 22:12:39,893 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.787e+01 9.594e+01 1.036e+02 1.388e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-18 22:12:53,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=433046.6666666667, ans=0.0 2023-11-18 22:12:57,424 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4850, loss[loss=0.09049, simple_loss=0.1011, pruned_loss=0.02843, audio_tagging_loss=0.01153, over 15039.00 frames. ], tot_loss[loss=0.09846, simple_loss=0.114, pruned_loss=0.02982, audio_tagging_loss=0.01165, over 3040275.11 frames. ], batch size: 58, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:12:57,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=433113.3333333333, ans=0.0 2023-11-18 22:13:12,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=433180.0, ans=0.025 2023-11-18 22:13:45,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2023-11-18 22:13:49,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=433380.0, ans=0.125 2023-11-18 22:13:53,997 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4900, loss[loss=0.07401, simple_loss=0.08193, pruned_loss=0.0214, audio_tagging_loss=0.01165, over 15701.00 frames. ], tot_loss[loss=0.09812, simple_loss=0.1135, pruned_loss=0.02969, audio_tagging_loss=0.01166, over 3037800.76 frames. 
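Each Whitening line compares a per-module whiteness metric against a limit; entries like "metric=12.78 vs. limit=15.0" track the statistic as it approaches the point where a corrective penalty would presumably engage. A standard whiteness proxy with the right behaviour is sketched below: it equals 1.0 when the channel covariance is isotropic and grows as variance concentrates in a few directions. Whether scaling.py computes exactly this quantity is an assumption:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels). Returns d * tr(C @ C) / tr(C)^2 for
        # the channel covariance C; equals 1.0 iff all eigenvalues are equal.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        d = cov.shape[0]
        return d * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2

    print(whitening_metric(torch.randn(10000, 256)))  # ~1.0 for white noise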
], batch size: 62, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:13:54,258 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.144e-03 2023-11-18 22:14:30,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433646.6666666667, ans=0.1 2023-11-18 22:14:32,279 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.759e+01 9.245e+01 1.025e+02 1.316e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-18 22:14:43,709 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:14:47,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=433713.3333333333, ans=0.025 2023-11-18 22:14:48,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=433713.3333333333, ans=0.1 2023-11-18 22:14:49,981 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 4950, loss[loss=0.1094, simple_loss=0.1267, pruned_loss=0.03467, audio_tagging_loss=0.01139, over 16466.00 frames. ], tot_loss[loss=0.09809, simple_loss=0.1137, pruned_loss=0.02983, audio_tagging_loss=0.01142, over 3039344.12 frames. ], batch size: 60, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:14:51,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=433780.0, ans=0.125 2023-11-18 22:15:04,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=433846.6666666667, ans=0.1 2023-11-18 22:15:45,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.04 vs. limit=22.5 2023-11-18 22:15:45,756 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5000, loss[loss=0.08677, simple_loss=0.1067, pruned_loss=0.02457, audio_tagging_loss=0.008832, over 15386.00 frames. ], tot_loss[loss=0.09707, simple_loss=0.1126, pruned_loss=0.02948, audio_tagging_loss=0.01128, over 3047189.71 frames. ], batch size: 58, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:15:48,433 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2023-11-18 22:15:53,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.87 vs. limit=15.0 2023-11-18 22:16:16,677 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.43 vs. 
limit=15.0 2023-11-18 22:16:17,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=434246.6666666667, ans=0.125 2023-11-18 22:16:23,432 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.790e+01 9.696e+01 1.074e+02 1.675e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-18 22:16:29,494 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:16:42,007 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5050, loss[loss=0.126, simple_loss=0.1463, pruned_loss=0.04285, audio_tagging_loss=0.009946, over 14839.00 frames. ], tot_loss[loss=0.09729, simple_loss=0.113, pruned_loss=0.02961, audio_tagging_loss=0.01119, over 3047022.12 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:16:47,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=434446.6666666667, ans=0.125 2023-11-18 22:16:55,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=434513.3333333333, ans=0.0 2023-11-18 22:17:15,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=434646.6666666667, ans=0.2 2023-11-18 22:17:17,460 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.28 vs. limit=6.0 2023-11-18 22:17:28,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=434713.3333333333, ans=0.0 2023-11-18 22:17:38,175 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5100, loss[loss=0.09483, simple_loss=0.1141, pruned_loss=0.02681, audio_tagging_loss=0.01098, over 14809.00 frames. ], tot_loss[loss=0.09727, simple_loss=0.1132, pruned_loss=0.02956, audio_tagging_loss=0.01113, over 3047817.30 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:17:39,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=434780.0, ans=0.125 2023-11-18 22:17:43,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=434780.0, ans=0.09899494936611666 2023-11-18 22:17:47,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.42 vs. limit=15.0 2023-11-18 22:18:16,619 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.735e+01 9.607e+01 1.051e+02 1.879e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-18 22:18:30,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=435046.6666666667, ans=0.125 2023-11-18 22:18:30,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435046.6666666667, ans=0.1 2023-11-18 22:18:30,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2023-11-18 22:18:33,470 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5150, loss[loss=0.05189, simple_loss=0.05646, pruned_loss=0.01045, audio_tagging_loss=0.01321, over 14450.00 frames. 
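Names such as balancer1.prob, balancer1.min_positive (ans=0.025) and balancer1.max_abs (ans=10.0) suggest an activation balancer that keeps per-channel statistics in a healthy range, applied stochastically with probability prob. The semantics below are assumed from the parameter names alone; the real module would act on gradients rather than merely reporting violations:

    import torch

    def balancer_violations(x: torch.Tensor,
                            min_positive: float = 0.025,
                            max_abs: float = 10.0) -> torch.Tensor:
        # x: (..., num_channels). Flags channels where too few activations
        # are positive or the mean magnitude is too large.
        reduce_dims = tuple(range(x.dim() - 1))
        pos_frac = (x > 0).float().mean(dim=reduce_dims)
        mean_abs = x.abs().mean(dim=reduce_dims)
        return (pos_frac < min_positive) | (mean_abs > max_abs)

    x = torch.randn(16, 100, 256)
    print(int(balancer_violations(x).sum()))  # 0 for well-behaved activations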
], tot_loss[loss=0.09653, simple_loss=0.1125, pruned_loss=0.02921, audio_tagging_loss=0.01109, over 3047028.68 frames. ], batch size: 57, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:18:51,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=435180.0, ans=0.2 2023-11-18 22:19:12,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=435313.3333333333, ans=0.2 2023-11-18 22:19:20,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=435380.0, ans=0.125 2023-11-18 22:19:26,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=435380.0, ans=0.125 2023-11-18 22:19:30,433 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5200, loss[loss=0.1074, simple_loss=0.1217, pruned_loss=0.03538, audio_tagging_loss=0.01115, over 15536.00 frames. ], tot_loss[loss=0.09728, simple_loss=0.1136, pruned_loss=0.02947, audio_tagging_loss=0.01099, over 3046183.41 frames. ], batch size: 59, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:19:38,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.34 vs. limit=15.0 2023-11-18 22:19:51,802 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.44 vs. limit=10.0 2023-11-18 22:19:52,983 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.03 vs. limit=22.5 2023-11-18 22:20:04,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=435646.6666666667, ans=0.125 2023-11-18 22:20:05,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=435646.6666666667, ans=0.125 2023-11-18 22:20:09,695 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.828e+01 8.948e+01 9.784e+01 1.083e+02 1.629e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-18 22:20:10,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0 2023-11-18 22:20:14,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=435713.3333333333, ans=0.125 2023-11-18 22:20:23,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=435713.3333333333, ans=0.0 2023-11-18 22:20:25,530 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5250, loss[loss=0.1175, simple_loss=0.1337, pruned_loss=0.03891, audio_tagging_loss=0.01179, over 14409.00 frames. ], tot_loss[loss=0.098, simple_loss=0.1144, pruned_loss=0.02976, audio_tagging_loss=0.01102, over 3046721.51 frames. 
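grad_scale in the progress lines drifts between 16.0, 32.0 and 64.0 (it halves to 16.0 at batch 5250 just below, and is back at 32.0 by batch 5600): the classic signature of dynamic loss scaling on the fp16 training path, where the scale doubles after a stretch of overflow-free steps and halves on overflow. PyTorch's stock scaler reproduces this behaviour; whether this run uses it or an in-house variant is not visible from the log:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,     # the scale seen through most of these lines
        growth_factor=2.0,   # 32 -> 64 after enough clean steps
        backoff_factor=0.5,  # 32 -> 16 when a gradient overflows
    )
    # Typical use inside the loop (model, optimizer, batch are placeholders):
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()  # grows or backs off the scale, as seen in the log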
], batch size: 55, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:20:41,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=435846.6666666667, ans=0.125 2023-11-18 22:20:42,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=435846.6666666667, ans=0.0 2023-11-18 22:20:47,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0 2023-11-18 22:20:56,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435913.3333333333, ans=0.1 2023-11-18 22:21:01,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=435980.0, ans=0.125 2023-11-18 22:21:12,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.84 vs. limit=15.0 2023-11-18 22:21:21,105 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5300, loss[loss=0.07527, simple_loss=0.08783, pruned_loss=0.02139, audio_tagging_loss=0.009973, over 15005.00 frames. ], tot_loss[loss=0.09823, simple_loss=0.1149, pruned_loss=0.02986, audio_tagging_loss=0.0109, over 3044706.16 frames. ], batch size: 55, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:21:24,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=436113.3333333333, ans=0.125 2023-11-18 22:21:30,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=436113.3333333333, ans=0.125 2023-11-18 22:21:33,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=436180.0, ans=0.07 2023-11-18 22:21:53,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=436246.6666666667, ans=0.1 2023-11-18 22:21:54,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=436313.3333333333, ans=0.1 2023-11-18 22:22:01,695 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 8.659e+01 9.451e+01 1.050e+02 1.358e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-18 22:22:17,480 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5350, loss[loss=0.1006, simple_loss=0.109, pruned_loss=0.03536, audio_tagging_loss=0.01075, over 14741.00 frames. ], tot_loss[loss=0.0982, simple_loss=0.115, pruned_loss=0.02989, audio_tagging_loss=0.01082, over 3041112.07 frames. ], batch size: 57, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:22:26,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=436446.6666666667, ans=0.1 2023-11-18 22:22:29,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=436513.3333333333, ans=0.0 2023-11-18 22:22:47,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. 
limit=15.0 2023-11-18 22:22:49,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=436646.6666666667, ans=0.05 2023-11-18 22:22:49,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=436646.6666666667, ans=0.2 2023-11-18 22:23:13,372 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5400, loss[loss=0.07719, simple_loss=0.08623, pruned_loss=0.02283, audio_tagging_loss=0.01125, over 14566.00 frames. ], tot_loss[loss=0.09827, simple_loss=0.1148, pruned_loss=0.02987, audio_tagging_loss=0.011, over 3044453.55 frames. ], batch size: 57, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:23:26,291 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=436846.6666666667, ans=0.125 2023-11-18 22:23:27,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=436846.6666666667, ans=0.125 2023-11-18 22:23:34,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=12.0 2023-11-18 22:23:53,554 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.845e+01 9.112e+01 1.017e+02 1.141e+02 1.585e+02, threshold=2.034e+02, percent-clipped=0.0 2023-11-18 22:24:08,343 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5450, loss[loss=0.0779, simple_loss=0.0904, pruned_loss=0.02034, audio_tagging_loss=0.01237, over 13987.00 frames. ], tot_loss[loss=0.0988, simple_loss=0.1153, pruned_loss=0.03015, audio_tagging_loss=0.01102, over 3041576.23 frames. ], batch size: 52, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:24:36,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=437246.6666666667, ans=0.0 2023-11-18 22:24:40,561 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:24:59,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.95 vs. limit=15.0 2023-11-18 22:25:02,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=437380.0, ans=0.0 2023-11-18 22:25:04,300 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5500, loss[loss=0.1056, simple_loss=0.1226, pruned_loss=0.0344, audio_tagging_loss=0.009863, over 16307.00 frames. ], tot_loss[loss=0.09905, simple_loss=0.1155, pruned_loss=0.03015, audio_tagging_loss=0.01117, over 3048427.31 frames. ], batch size: 59, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:25:13,616 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=8.0 2023-11-18 22:25:23,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.48 vs. 
limit=15.0 2023-11-18 22:25:27,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=437580.0, ans=0.05 2023-11-18 22:25:35,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437580.0, ans=0.1 2023-11-18 22:25:41,386 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.85 vs. limit=10.0 2023-11-18 22:25:43,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=437646.6666666667, ans=0.0 2023-11-18 22:25:44,447 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.778e+01 9.490e+01 1.043e+02 1.354e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-18 22:25:47,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=437713.3333333333, ans=0.125 2023-11-18 22:25:53,677 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:26:00,858 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5550, loss[loss=0.08316, simple_loss=0.09756, pruned_loss=0.0237, audio_tagging_loss=0.01068, over 15876.00 frames. ], tot_loss[loss=0.09876, simple_loss=0.1153, pruned_loss=0.02992, audio_tagging_loss=0.01118, over 3044845.83 frames. ], batch size: 62, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:26:03,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=437780.0, ans=0.0 2023-11-18 22:26:16,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=437846.6666666667, ans=0.0 2023-11-18 22:26:17,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=437846.6666666667, ans=0.125 2023-11-18 22:26:45,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=438046.6666666667, ans=10.0 2023-11-18 22:26:47,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2023-11-18 22:26:49,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=438046.6666666667, ans=0.07 2023-11-18 22:26:49,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.54 vs. limit=6.0 2023-11-18 22:26:55,685 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5600, loss[loss=0.1167, simple_loss=0.1432, pruned_loss=0.03624, audio_tagging_loss=0.0088, over 15174.00 frames. ], tot_loss[loss=0.0986, simple_loss=0.115, pruned_loss=0.02982, audio_tagging_loss=0.01126, over 3046216.07 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:27:34,842 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:27:35,842 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.410e+01 9.319e+01 9.867e+01 1.109e+02 1.388e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 22:27:42,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=438380.0, ans=0.125 2023-11-18 22:27:51,226 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5650, loss[loss=0.09489, simple_loss=0.0964, pruned_loss=0.0304, audio_tagging_loss=0.01629, over 15936.00 frames. ], tot_loss[loss=0.09846, simple_loss=0.1149, pruned_loss=0.02967, audio_tagging_loss=0.01133, over 3046197.53 frames. ], batch size: 62, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:27:52,559 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:28:02,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=438513.3333333333, ans=0.0 2023-11-18 22:28:19,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=438580.0, ans=0.125 2023-11-18 22:28:26,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=438646.6666666667, ans=0.125 2023-11-18 22:28:41,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=438713.3333333333, ans=0.125 2023-11-18 22:28:46,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=438780.0, ans=0.125 2023-11-18 22:28:46,902 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5700, loss[loss=0.1089, simple_loss=0.1241, pruned_loss=0.03531, audio_tagging_loss=0.01152, over 15184.00 frames. ], tot_loss[loss=0.09766, simple_loss=0.1139, pruned_loss=0.02937, audio_tagging_loss=0.01136, over 3043367.54 frames. ], batch size: 58, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:28:55,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.41 vs. limit=15.0 2023-11-18 22:29:08,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5 2023-11-18 22:29:13,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=438913.3333333333, ans=0.2 2023-11-18 22:29:27,085 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 8.890e+01 9.546e+01 1.053e+02 1.340e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-18 22:29:30,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=438980.0, ans=0.2 2023-11-18 22:29:43,000 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5750, loss[loss=0.08201, simple_loss=0.0995, pruned_loss=0.02149, audio_tagging_loss=0.01078, over 14592.00 frames. ], tot_loss[loss=0.09609, simple_loss=0.112, pruned_loss=0.02879, audio_tagging_loss=0.01128, over 3045751.31 frames. 
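The bypass.scale_min entries (ans=0.2) and bypass.skip_rate entries point to learned residual bypasses around encoder sub-modules: the module output is blended with its input by a learned scale clamped to at least scale_min, so a layer can never be entirely shut off. The forward rule below is an assumed reading of those names:

    import torch
    import torch.nn as nn

    class Bypass(nn.Module):
        # Assumed semantics: y = x + s * (layer(x) - x), with the learned
        # per-channel s clamped to [scale_min, 1.0].
        def __init__(self, layer: nn.Module, num_channels: int,
                     scale_min: float = 0.2):
            super().__init__()
            self.layer = layer
            self.scale = nn.Parameter(torch.full((num_channels,), 0.5))
            self.scale_min = scale_min

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            s = self.scale.clamp(self.scale_min, 1.0)
            return x + s * (self.layer(x) - x)

    print(Bypass(nn.Linear(256, 256), 256)(torch.randn(4, 256)).shape)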
], batch size: 54, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:30:18,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=439313.3333333333, ans=0.125 2023-11-18 22:30:20,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.44 vs. limit=10.0 2023-11-18 22:30:31,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=439380.0, ans=0.04949747468305833 2023-11-18 22:30:33,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=439380.0, ans=0.125 2023-11-18 22:30:37,466 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5800, loss[loss=0.07555, simple_loss=0.08749, pruned_loss=0.01873, audio_tagging_loss=0.01307, over 14953.00 frames. ], tot_loss[loss=0.09631, simple_loss=0.112, pruned_loss=0.02905, audio_tagging_loss=0.01126, over 3049604.23 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:30:44,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=439446.6666666667, ans=0.125 2023-11-18 22:30:46,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=439446.6666666667, ans=0.125 2023-11-18 22:30:50,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=439513.3333333333, ans=0.125 2023-11-18 22:31:11,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=439646.6666666667, ans=0.125 2023-11-18 22:31:17,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=439646.6666666667, ans=0.2 2023-11-18 22:31:17,839 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.421e+01 9.751e+01 1.061e+02 1.528e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-18 22:31:33,824 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5850, loss[loss=0.09454, simple_loss=0.1066, pruned_loss=0.03052, audio_tagging_loss=0.01073, over 15924.00 frames. ], tot_loss[loss=0.09606, simple_loss=0.1118, pruned_loss=0.02903, audio_tagging_loss=0.01114, over 3047123.79 frames. ], batch size: 60, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:31:50,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=439846.6666666667, ans=0.125 2023-11-18 22:31:57,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.52 vs. limit=22.5 2023-11-18 22:32:06,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2023-11-18 22:32:15,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.11 vs. 
limit=15.0 2023-11-18 22:32:22,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=440046.6666666667, ans=0.2 2023-11-18 22:32:29,695 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5900, loss[loss=0.08349, simple_loss=0.09844, pruned_loss=0.02237, audio_tagging_loss=0.0119, over 16093.00 frames. ], tot_loss[loss=0.09656, simple_loss=0.1127, pruned_loss=0.0292, audio_tagging_loss=0.01101, over 3049053.79 frames. ], batch size: 61, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:32:49,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=440180.0, ans=0.125 2023-11-18 22:33:09,415 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.910e+01 8.634e+01 9.404e+01 1.031e+02 1.635e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-18 22:33:14,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.51 vs. limit=10.0 2023-11-18 22:33:17,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=440380.0, ans=0.2 2023-11-18 22:33:25,046 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 5950, loss[loss=0.102, simple_loss=0.1089, pruned_loss=0.03524, audio_tagging_loss=0.01231, over 14784.00 frames. ], tot_loss[loss=0.09589, simple_loss=0.1118, pruned_loss=0.02883, audio_tagging_loss=0.01119, over 3052782.55 frames. ], batch size: 57, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:33:28,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=440446.6666666667, ans=0.125 2023-11-18 22:33:29,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=440446.6666666667, ans=0.125 2023-11-18 22:34:00,823 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2023-11-18 22:34:02,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=440646.6666666667, ans=0.125 2023-11-18 22:34:05,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=15.0 2023-11-18 22:34:08,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=440713.3333333333, ans=0.09899494936611666 2023-11-18 22:34:11,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=440713.3333333333, ans=0.2 2023-11-18 22:34:21,294 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6000, loss[loss=0.1149, simple_loss=0.1371, pruned_loss=0.03696, audio_tagging_loss=0.009384, over 16219.00 frames. ], tot_loss[loss=0.09737, simple_loss=0.1138, pruned_loss=0.02947, audio_tagging_loss=0.01102, over 3062439.19 frames. 
], batch size: 58, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:34:21,295 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-18 22:34:36,960 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.2289, 3.9911, 3.9857, 3.5925, 4.1316, 4.2096, 4.4170, 4.2542], device='cuda:1') 2023-11-18 22:34:45,076 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1746, 3.9526, 4.2734, 4.3860], device='cuda:1') 2023-11-18 22:34:54,495 INFO [train_asr.py:1147] (1/4) Epoch 6, validation: loss=0.07034, simple_loss=0.0589, pruned_loss=0.008199, audio_tagging_loss=0.03269, over 4681554.00 frames. 2023-11-18 22:34:54,496 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-18 22:35:02,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=440780.0, ans=22.5 2023-11-18 22:35:33,138 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:35:34,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.702e+01 9.372e+01 1.021e+02 1.628e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-18 22:35:50,037 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6050, loss[loss=0.1242, simple_loss=0.1469, pruned_loss=0.03987, audio_tagging_loss=0.01091, over 15095.00 frames. ], tot_loss[loss=0.0974, simple_loss=0.1138, pruned_loss=0.02954, audio_tagging_loss=0.01095, over 3059726.45 frames. ], batch size: 55, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:36:15,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=441246.6666666667, ans=0.125 2023-11-18 22:36:17,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=441246.6666666667, ans=0.1 2023-11-18 22:36:33,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=441380.0, ans=0.125 2023-11-18 22:36:43,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=441380.0, ans=0.2 2023-11-18 22:36:46,620 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6100, loss[loss=0.07532, simple_loss=0.08175, pruned_loss=0.02199, audio_tagging_loss=0.01245, over 16245.00 frames. ], tot_loss[loss=0.09772, simple_loss=0.1144, pruned_loss=0.02967, audio_tagging_loss=0.01088, over 3058429.96 frames. ], batch size: 62, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:36:52,403 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.25 vs. 
limit=22.5 2023-11-18 22:36:53,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=441446.6666666667, ans=0.2 2023-11-18 22:36:59,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.36 vs. limit=15.0 2023-11-18 22:37:10,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=441580.0, ans=0.2 2023-11-18 22:37:13,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.16 vs. limit=22.5 2023-11-18 22:37:26,318 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.924e+01 9.623e+01 1.090e+02 1.421e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-18 22:37:41,749 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6150, loss[loss=0.1138, simple_loss=0.1267, pruned_loss=0.0353, audio_tagging_loss=0.01511, over 16010.00 frames. ], tot_loss[loss=0.09754, simple_loss=0.1143, pruned_loss=0.02955, audio_tagging_loss=0.01085, over 3047432.99 frames. ], batch size: 59, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:37:51,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=441780.0, ans=0.5 2023-11-18 22:37:58,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=441846.6666666667, ans=0.125 2023-11-18 22:38:01,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=441846.6666666667, ans=0.0 2023-11-18 22:38:15,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.26 vs. limit=15.0 2023-11-18 22:38:16,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=441980.0, ans=0.125 2023-11-18 22:38:37,185 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6200, loss[loss=0.08344, simple_loss=0.09405, pruned_loss=0.02515, audio_tagging_loss=0.01127, over 15156.00 frames. ], tot_loss[loss=0.09708, simple_loss=0.1137, pruned_loss=0.02931, audio_tagging_loss=0.01093, over 3048127.48 frames. ], batch size: 57, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:39:10,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442313.3333333333, ans=0.1 2023-11-18 22:39:17,407 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.454e+01 8.898e+01 9.637e+01 1.062e+02 1.709e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-18 22:39:33,418 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6250, loss[loss=0.0993, simple_loss=0.1186, pruned_loss=0.02818, audio_tagging_loss=0.01182, over 15553.00 frames. ], tot_loss[loss=0.0975, simple_loss=0.1139, pruned_loss=0.02941, audio_tagging_loss=0.01112, over 3046511.38 frames. 
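The attn_weights_entropy tensors printed during the batch 6000 validation pass above (values mostly around 4) read naturally as the mean entropy, in nats, of each attention head's weight distribution: a cheap diagnostic for heads collapsing onto single frames (entropy near 0) or staying uselessly uniform. A sketch under that assumption:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, tgt_len, src_len), each row a softmax distribution.
        # Returns the per-head mean entropy in nats.
        ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)
        return ent.mean(dim=-1)

    attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
    print(attn_weights_entropy(attn))  # ~3.4 nats/head; uniform would be ln 50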
], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:39:36,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=442446.6666666667, ans=0.125 2023-11-18 22:39:38,492 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:39:54,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=442580.0, ans=0.0 2023-11-18 22:40:02,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442580.0, ans=0.1 2023-11-18 22:40:11,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=442646.6666666667, ans=0.1 2023-11-18 22:40:29,553 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6300, loss[loss=0.07246, simple_loss=0.08049, pruned_loss=0.01951, audio_tagging_loss=0.0127, over 14819.00 frames. ], tot_loss[loss=0.09746, simple_loss=0.1136, pruned_loss=0.02937, audio_tagging_loss=0.01128, over 3044689.94 frames. ], batch size: 57, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:40:42,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=442846.6666666667, ans=0.125 2023-11-18 22:40:43,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=442846.6666666667, ans=0.125 2023-11-18 22:40:48,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=442846.6666666667, ans=0.1 2023-11-18 22:40:50,863 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.147e-02 2023-11-18 22:40:57,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=442913.3333333333, ans=0.1 2023-11-18 22:41:07,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=442980.0, ans=0.125 2023-11-18 22:41:08,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=442980.0, ans=0.025 2023-11-18 22:41:09,551 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.985e+01 9.817e+01 1.039e+02 1.348e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-18 22:41:24,954 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6350, loss[loss=0.08836, simple_loss=0.1029, pruned_loss=0.0263, audio_tagging_loss=0.01059, over 14852.00 frames. ], tot_loss[loss=0.09782, simple_loss=0.1135, pruned_loss=0.0296, audio_tagging_loss=0.01147, over 3050841.52 frames. 
], batch size: 54, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:41:54,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=443246.6666666667, ans=0.125 2023-11-18 22:41:55,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=443246.6666666667, ans=0.125 2023-11-18 22:41:58,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443313.3333333333, ans=0.1 2023-11-18 22:42:02,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0 2023-11-18 22:42:04,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=443313.3333333333, ans=0.125 2023-11-18 22:42:11,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.82 vs. limit=15.0 2023-11-18 22:42:17,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=443380.0, ans=0.0 2023-11-18 22:42:21,102 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6400, loss[loss=0.07686, simple_loss=0.08067, pruned_loss=0.02378, audio_tagging_loss=0.01274, over 14766.00 frames. ], tot_loss[loss=0.09661, simple_loss=0.1116, pruned_loss=0.0292, audio_tagging_loss=0.01164, over 3044724.67 frames. ], batch size: 59, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:42:26,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=443446.6666666667, ans=0.125 2023-11-18 22:42:56,091 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2023-11-18 22:42:56,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=443646.6666666667, ans=0.07 2023-11-18 22:43:01,452 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.672e+01 9.519e+01 1.064e+02 1.432e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-18 22:43:05,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=443713.3333333333, ans=0.0 2023-11-18 22:43:08,772 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:43:12,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=443713.3333333333, ans=10.0 2023-11-18 22:43:16,888 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6450, loss[loss=0.118, simple_loss=0.1428, pruned_loss=0.03506, audio_tagging_loss=0.01156, over 15808.00 frames. ], tot_loss[loss=0.09779, simple_loss=0.113, pruned_loss=0.0296, audio_tagging_loss=0.01167, over 3044365.40 frames. 
], batch size: 57, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:43:17,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=443780.0, ans=0.125 2023-11-18 22:43:20,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=443780.0, ans=0.125 2023-11-18 22:43:33,541 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:43:38,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=443913.3333333333, ans=0.125 2023-11-18 22:43:42,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443913.3333333333, ans=0.1 2023-11-18 22:44:12,490 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6500, loss[loss=0.1119, simple_loss=0.123, pruned_loss=0.03617, audio_tagging_loss=0.01424, over 15186.00 frames. ], tot_loss[loss=0.09782, simple_loss=0.1133, pruned_loss=0.02954, audio_tagging_loss=0.01164, over 3046112.96 frames. ], batch size: 58, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:44:33,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=444180.0, ans=0.2 2023-11-18 22:44:43,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=444246.6666666667, ans=0.1 2023-11-18 22:44:45,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=444313.3333333333, ans=0.0 2023-11-18 22:44:52,547 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.587e+01 9.452e+01 1.044e+02 1.613e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-18 22:44:54,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.85 vs. limit=22.5 2023-11-18 22:44:57,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=444380.0, ans=0.125 2023-11-18 22:45:09,096 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6550, loss[loss=0.132, simple_loss=0.1455, pruned_loss=0.04887, audio_tagging_loss=0.01034, over 14045.00 frames. ], tot_loss[loss=0.09804, simple_loss=0.1141, pruned_loss=0.02963, audio_tagging_loss=0.01136, over 3048807.67 frames. ], batch size: 53, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:45:10,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=444446.6666666667, ans=0.05 2023-11-18 22:45:22,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=444513.3333333333, ans=0.125 2023-11-18 22:45:38,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=444580.0, ans=0.0 2023-11-18 22:46:04,536 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6600, loss[loss=0.0876, simple_loss=0.101, pruned_loss=0.02531, audio_tagging_loss=0.0118, over 14671.00 frames. ], tot_loss[loss=0.09708, simple_loss=0.1132, pruned_loss=0.02925, audio_tagging_loss=0.01124, over 3042238.60 frames. 
], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:46:10,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2023-11-18 22:46:44,723 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 9.179e+01 9.898e+01 1.109e+02 1.412e+02, threshold=1.980e+02, percent-clipped=0.0 2023-11-18 22:46:49,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=445046.6666666667, ans=0.125 2023-11-18 22:46:59,490 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6650, loss[loss=0.1126, simple_loss=0.1244, pruned_loss=0.03765, audio_tagging_loss=0.0127, over 16593.00 frames. ], tot_loss[loss=0.09733, simple_loss=0.1135, pruned_loss=0.02939, audio_tagging_loss=0.01117, over 3048496.84 frames. ], batch size: 62, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:47:23,883 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.07 vs. limit=10.0 2023-11-18 22:47:54,893 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6700, loss[loss=0.07439, simple_loss=0.08789, pruned_loss=0.01972, audio_tagging_loss=0.01072, over 13288.00 frames. ], tot_loss[loss=0.09798, simple_loss=0.1147, pruned_loss=0.02964, audio_tagging_loss=0.011, over 3040425.01 frames. ], batch size: 54, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:47:57,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=445446.6666666667, ans=0.125 2023-11-18 22:47:58,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=445446.6666666667, ans=0.1 2023-11-18 22:48:07,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=445513.3333333333, ans=0.2 2023-11-18 22:48:30,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.20 vs. limit=15.0 2023-11-18 22:48:34,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=445646.6666666667, ans=0.0 2023-11-18 22:48:36,163 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 9.188e+01 9.958e+01 1.118e+02 1.458e+02, threshold=1.992e+02, percent-clipped=0.0 2023-11-18 22:48:45,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=445713.3333333333, ans=0.125 2023-11-18 22:48:51,578 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6750, loss[loss=0.1039, simple_loss=0.1251, pruned_loss=0.0307, audio_tagging_loss=0.01062, over 15600.00 frames. ], tot_loss[loss=0.09717, simple_loss=0.1137, pruned_loss=0.02939, audio_tagging_loss=0.01095, over 3043100.30 frames. 
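For reference, every tot_loss record in this log decomposes as a weighted sum of its parts: the numbers are consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, where 0.5 and 1.0 are inferred from the logged values and match the recipe's simple_loss_scale and audio_tagging_loss_scale settings. A quick check against Epoch 6, batch 6700:

def combined_loss(simple, pruned, audio_tagging,
                  simple_scale=0.5, at_scale=1.0):
    # weights inferred from the logged numbers, not read from the code
    return simple_scale * simple + pruned + at_scale * audio_tagging

print(combined_loss(0.1147, 0.02964, 0.011))   # 0.09799 vs. logged 0.09798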
], batch size: 59, lr: 1.14e-02, grad_scale: 16.0 2023-11-18 22:49:01,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=445846.6666666667, ans=0.2 2023-11-18 22:49:18,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=445913.3333333333, ans=0.125 2023-11-18 22:49:22,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=445913.3333333333, ans=0.125 2023-11-18 22:49:24,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=445980.0, ans=0.125 2023-11-18 22:49:39,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=446046.6666666667, ans=0.1 2023-11-18 22:49:46,735 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6800, loss[loss=0.1217, simple_loss=0.1435, pruned_loss=0.04156, audio_tagging_loss=0.008416, over 15962.00 frames. ], tot_loss[loss=0.09821, simple_loss=0.1147, pruned_loss=0.02987, audio_tagging_loss=0.01097, over 3044760.68 frames. ], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:49:59,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2023-11-18 22:50:06,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=446180.0, ans=0.125 2023-11-18 22:50:19,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=446313.3333333333, ans=0.04949747468305833 2023-11-18 22:50:25,446 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0 2023-11-18 22:50:27,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.920e+01 9.995e+01 1.137e+02 1.788e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-18 22:50:39,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=446380.0, ans=0.125 2023-11-18 22:50:40,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.32 vs. limit=22.5 2023-11-18 22:50:42,349 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6850, loss[loss=0.09074, simple_loss=0.1004, pruned_loss=0.02789, audio_tagging_loss=0.01263, over 15704.00 frames. ], tot_loss[loss=0.09828, simple_loss=0.1148, pruned_loss=0.0299, audio_tagging_loss=0.01096, over 3048912.19 frames. ], batch size: 58, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:50:53,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.03 vs. 
limit=15.0 2023-11-18 22:51:01,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=446513.3333333333, ans=0.125 2023-11-18 22:51:03,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=446513.3333333333, ans=0.125 2023-11-18 22:51:11,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=446580.0, ans=0.1 2023-11-18 22:51:39,209 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6900, loss[loss=0.08575, simple_loss=0.09878, pruned_loss=0.02562, audio_tagging_loss=0.01074, over 15560.00 frames. ], tot_loss[loss=0.09823, simple_loss=0.1147, pruned_loss=0.02993, audio_tagging_loss=0.01097, over 3048040.59 frames. ], batch size: 58, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:51:41,429 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:51:45,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2023-11-18 22:51:54,777 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.67 vs. limit=22.5 2023-11-18 22:52:05,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=446913.3333333333, ans=0.0 2023-11-18 22:52:08,742 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.61 vs. limit=10.0 2023-11-18 22:52:10,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=446980.0, ans=0.2 2023-11-18 22:52:19,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=446980.0, ans=0.125 2023-11-18 22:52:19,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=446980.0, ans=0.0 2023-11-18 22:52:19,851 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 9.182e+01 9.955e+01 1.058e+02 1.430e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-18 22:52:20,940 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:52:21,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=446980.0, ans=0.2 2023-11-18 22:52:34,141 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 6950, loss[loss=0.0845, simple_loss=0.1034, pruned_loss=0.02378, audio_tagging_loss=0.009037, over 15099.00 frames. ], tot_loss[loss=0.09873, simple_loss=0.1154, pruned_loss=0.02999, audio_tagging_loss=0.01103, over 3049192.69 frames. 
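The WARNING just above is the recipe's sanity filter at work: this 1-second AudioSet clip yields only 23 encoder frames after subsampling, fewer than its 24 placeholder BPE tokens, and a transducer loss cannot align fewer frames than tokens. A hedged sketch of such a filter follows; the frame arithmetic approximates a subsampling-factor-4 convolutional frontend and is not the exact frontend math.

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # approximation of the conv frontend: ~4x subsampling with edge loss
    frames_after_subsampling = (num_frames - 7) // 4
    return frames_after_subsampling >= num_tokens

print(keep_cut(100, 24))   # False: 23 frames < 24 tokens, cut is excluded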
], batch size: 55, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:52:48,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=447180.0, ans=0.125 2023-11-18 22:53:02,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=447246.6666666667, ans=0.2 2023-11-18 22:53:09,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.51 vs. limit=15.0 2023-11-18 22:53:29,857 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7000, loss[loss=0.09668, simple_loss=0.1188, pruned_loss=0.02847, audio_tagging_loss=0.008789, over 15002.00 frames. ], tot_loss[loss=0.09804, simple_loss=0.1145, pruned_loss=0.02972, audio_tagging_loss=0.01106, over 3044672.23 frames. ], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:54:11,107 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.734e+01 9.498e+01 1.045e+02 1.881e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-18 22:54:12,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2023-11-18 22:54:25,869 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7050, loss[loss=0.1061, simple_loss=0.1252, pruned_loss=0.0303, audio_tagging_loss=0.01318, over 15876.00 frames. ], tot_loss[loss=0.09776, simple_loss=0.1139, pruned_loss=0.02959, audio_tagging_loss=0.01121, over 3046458.74 frames. ], batch size: 58, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:54:26,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=447780.0, ans=0.125 2023-11-18 22:54:31,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=447780.0, ans=0.125 2023-11-18 22:54:40,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=447846.6666666667, ans=0.2 2023-11-18 22:54:41,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=447846.6666666667, ans=0.2 2023-11-18 22:54:42,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.05 vs. limit=22.5 2023-11-18 22:54:52,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=447913.3333333333, ans=0.125 2023-11-18 22:55:21,692 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7100, loss[loss=0.08307, simple_loss=0.09424, pruned_loss=0.02133, audio_tagging_loss=0.01462, over 14818.00 frames. ], tot_loss[loss=0.09807, simple_loss=0.1143, pruned_loss=0.02966, audio_tagging_loss=0.01126, over 3045857.61 frames. 
], batch size: 57, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:55:23,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=448113.3333333333, ans=0.0 2023-11-18 22:56:02,161 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=448313.3333333333, ans=0.04949747468305833 2023-11-18 22:56:02,997 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 9.011e+01 9.786e+01 1.101e+02 1.464e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-18 22:56:16,826 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7150, loss[loss=0.108, simple_loss=0.1293, pruned_loss=0.03166, audio_tagging_loss=0.01172, over 15410.00 frames. ], tot_loss[loss=0.09781, simple_loss=0.1141, pruned_loss=0.02945, audio_tagging_loss=0.0113, over 3045003.28 frames. ], batch size: 58, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:56:22,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=448446.6666666667, ans=0.125 2023-11-18 22:56:23,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=448446.6666666667, ans=0.1 2023-11-18 22:56:27,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448513.3333333333, ans=0.1 2023-11-18 22:56:31,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=22.5 2023-11-18 22:56:39,005 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0 2023-11-18 22:56:43,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=448580.0, ans=0.1 2023-11-18 22:56:43,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.08 vs. limit=22.5 2023-11-18 22:56:45,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=448580.0, ans=0.0 2023-11-18 22:56:58,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=448646.6666666667, ans=0.0 2023-11-18 22:57:00,531 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0 2023-11-18 22:57:02,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=448713.3333333333, ans=0.125 2023-11-18 22:57:07,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=448713.3333333333, ans=0.125 2023-11-18 22:57:13,669 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7200, loss[loss=0.09331, simple_loss=0.1122, pruned_loss=0.02609, audio_tagging_loss=0.01111, over 15790.00 frames. ], tot_loss[loss=0.09706, simple_loss=0.1131, pruned_loss=0.02914, audio_tagging_loss=0.01139, over 3051414.67 frames. 
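The learning rate has just ticked down from 1.14e-02 to 1.13e-02. In icefall this slow decay comes from the Eden scheduler, which shrinks the base learning rate in both the optimizer step count and the fractional epoch. The sketch below uses the common defaults lr_batches=7500 and lr_epochs=3.5; the step count plugged in is a rough guess, meant only to show the formula landing in the logged range.

def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
    return (base_lr
            * ((step / lr_batches) ** 2 + 1) ** -0.25
            * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

print(round(eden_lr(0.045, step=58_000, epoch=6.0), 4))   # ~0.0114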
], batch size: 61, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:57:26,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=448846.6666666667, ans=0.125 2023-11-18 22:57:54,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.744e+01 9.483e+01 1.054e+02 1.266e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-18 22:58:05,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=449046.6666666667, ans=0.125 2023-11-18 22:58:08,843 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7250, loss[loss=0.1353, simple_loss=0.1597, pruned_loss=0.04538, audio_tagging_loss=0.01003, over 16294.00 frames. ], tot_loss[loss=0.09818, simple_loss=0.1143, pruned_loss=0.02961, audio_tagging_loss=0.01142, over 3052483.94 frames. ], batch size: 59, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:58:14,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.70 vs. limit=10.0 2023-11-18 22:58:15,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=449113.3333333333, ans=0.0 2023-11-18 22:58:23,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=449180.0, ans=0.125 2023-11-18 22:58:30,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=449246.6666666667, ans=0.2 2023-11-18 22:58:32,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449246.6666666667, ans=0.1 2023-11-18 22:58:45,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=449313.3333333333, ans=12.0 2023-11-18 22:58:54,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=449380.0, ans=0.0 2023-11-18 22:59:04,605 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7300, loss[loss=0.07757, simple_loss=0.09331, pruned_loss=0.02304, audio_tagging_loss=0.007874, over 16948.00 frames. ], tot_loss[loss=0.09772, simple_loss=0.1145, pruned_loss=0.02927, audio_tagging_loss=0.0112, over 3047583.11 frames. 
], batch size: 66, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:59:17,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=449513.3333333333, ans=0.0 2023-11-18 22:59:32,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449580.0, ans=0.1 2023-11-18 22:59:32,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449580.0, ans=0.1 2023-11-18 22:59:34,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=449580.0, ans=0.125 2023-11-18 22:59:45,848 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 8.690e+01 9.830e+01 1.104e+02 1.354e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-18 22:59:59,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=449780.0, ans=0.0 2023-11-18 23:00:00,637 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7350, loss[loss=0.09244, simple_loss=0.1031, pruned_loss=0.02839, audio_tagging_loss=0.01248, over 15725.00 frames. ], tot_loss[loss=0.09766, simple_loss=0.1145, pruned_loss=0.02946, audio_tagging_loss=0.01096, over 3043073.00 frames. ], batch size: 60, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:00:03,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=449780.0, ans=0.125 2023-11-18 23:00:08,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=449780.0, ans=0.125 2023-11-18 23:00:14,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=449846.6666666667, ans=0.125 2023-11-18 23:00:23,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=449913.3333333333, ans=0.0 2023-11-18 23:00:25,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=449913.3333333333, ans=0.125 2023-11-18 23:00:45,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=450046.6666666667, ans=0.125 2023-11-18 23:00:55,968 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7400, loss[loss=0.1228, simple_loss=0.1365, pruned_loss=0.04453, audio_tagging_loss=0.01002, over 14791.00 frames. ], tot_loss[loss=0.09718, simple_loss=0.1143, pruned_loss=0.02923, audio_tagging_loss=0.01079, over 3041010.44 frames. ], batch size: 55, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:01:16,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2023-11-18 23:01:22,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=450246.6666666667, ans=0.125 2023-11-18 23:01:27,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.29 vs. 
limit=15.0 2023-11-18 23:01:34,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=450313.3333333333, ans=0.125 2023-11-18 23:01:36,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=450313.3333333333, ans=0.025 2023-11-18 23:01:37,430 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 9.086e+01 1.005e+02 1.143e+02 1.555e+02, threshold=2.009e+02, percent-clipped=0.0 2023-11-18 23:01:38,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=450313.3333333333, ans=0.0 2023-11-18 23:01:42,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2023-11-18 23:01:51,873 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7450, loss[loss=0.1291, simple_loss=0.1503, pruned_loss=0.04353, audio_tagging_loss=0.01038, over 14868.00 frames. ], tot_loss[loss=0.09723, simple_loss=0.1143, pruned_loss=0.02924, audio_tagging_loss=0.01081, over 3035911.91 frames. ], batch size: 55, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:02:02,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=450513.3333333333, ans=0.0 2023-11-18 23:02:05,652 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.93 vs. limit=22.5 2023-11-18 23:02:13,463 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.50 vs. limit=10.0 2023-11-18 23:02:23,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=450580.0, ans=0.0 2023-11-18 23:02:29,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=450646.6666666667, ans=0.125 2023-11-18 23:02:41,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=450713.3333333333, ans=0.125 2023-11-18 23:02:47,572 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7500, loss[loss=0.1016, simple_loss=0.117, pruned_loss=0.03326, audio_tagging_loss=0.009821, over 15666.00 frames. ], tot_loss[loss=0.09774, simple_loss=0.115, pruned_loss=0.02953, audio_tagging_loss=0.01069, over 3040850.45 frames. ], batch size: 55, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:02:53,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=450780.0, ans=0.2 2023-11-18 23:03:09,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=450913.3333333333, ans=15.0 2023-11-18 23:03:30,364 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.742e+01 9.569e+01 1.067e+02 1.631e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-18 23:03:38,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=451046.6666666667, ans=0.0 2023-11-18 23:03:43,629 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7550, loss[loss=0.07307, simple_loss=0.07403, pruned_loss=0.02587, audio_tagging_loss=0.01019, over 14870.00 frames. 
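On the Whitening records ("metric=X vs. limit=Y"): the metric summarizes how far a module's output covariance is from a multiple of the identity, and each record compares the current metric against its scheduled limit; when the metric exceeds the limit, a corrective gradient is applied. One plausible way to compute such a statistic (not necessarily scaling.py's exact formulation) is the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, computed via traces so no eigendecomposition is needed:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (frames, channels); returns 1.0 when cov is a multiple of identity
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    c = cov.shape[0]
    mean_sq_eig = torch.diagonal(cov @ cov).sum() / c    # trace(cov^2)/C
    sq_mean_eig = (torch.diagonal(cov).sum() / c) ** 2   # (trace(cov)/C)^2
    return float(mean_sq_eig / sq_mean_eig)

x = torch.randn(1000, 512)
print(whitening_metric(x))                                  # near 1 (white noise)
print(whitening_metric(x * torch.linspace(0.1, 3.0, 512)))  # larger: eigenvalues spread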
], tot_loss[loss=0.09635, simple_loss=0.1132, pruned_loss=0.02899, audio_tagging_loss=0.01075, over 3044072.74 frames. ], batch size: 58, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:04:17,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=451313.3333333333, ans=0.1 2023-11-18 23:04:24,656 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=451313.3333333333, ans=0.125 2023-11-18 23:04:25,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=451313.3333333333, ans=0.1 2023-11-18 23:04:30,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=451380.0, ans=0.0 2023-11-18 23:04:38,065 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7600, loss[loss=0.1332, simple_loss=0.1535, pruned_loss=0.04555, audio_tagging_loss=0.01088, over 14919.00 frames. ], tot_loss[loss=0.09592, simple_loss=0.1125, pruned_loss=0.02883, audio_tagging_loss=0.01085, over 3050778.51 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:04:39,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=451446.6666666667, ans=0.5 2023-11-18 23:04:54,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=451513.3333333333, ans=0.0 2023-11-18 23:04:56,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=451513.3333333333, ans=0.125 2023-11-18 23:04:57,269 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2023-11-18 23:05:05,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=451580.0, ans=0.125 2023-11-18 23:05:20,680 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 8.942e+01 9.830e+01 1.127e+02 1.912e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-18 23:05:29,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=451713.3333333333, ans=0.125 2023-11-18 23:05:33,287 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7650, loss[loss=0.1197, simple_loss=0.1287, pruned_loss=0.04437, audio_tagging_loss=0.01099, over 15303.00 frames. ], tot_loss[loss=0.09611, simple_loss=0.1127, pruned_loss=0.02884, audio_tagging_loss=0.0109, over 3041779.26 frames. ], batch size: 58, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:06:20,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=452046.6666666667, ans=0.125 2023-11-18 23:06:29,830 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7700, loss[loss=0.1352, simple_loss=0.1602, pruned_loss=0.04592, audio_tagging_loss=0.009176, over 15482.00 frames. ], tot_loss[loss=0.09618, simple_loss=0.1132, pruned_loss=0.02871, audio_tagging_loss=0.01086, over 3045457.61 frames. 
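The grad_scale field in the last few tot_loss records has been flipping between 32.0 and 16.0. That is fp16 dynamic loss scaling: the scale is halved when a scaled gradient overflows and grows back after a stretch of stable steps, and the recipe logs its current value. A minimal sketch with PyTorch's stock GradScaler follows; the model and optimizer are stand-ins, and on a CPU-only machine both knobs disable themselves so the scale stays at 1.0.

import torch

use_cuda = torch.cuda.is_available()
dev = "cuda" if use_cuda else "cpu"
model = torch.nn.Linear(80, 500).to(dev)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, enabled=use_cuda)

def step(features, targets):
    opt.zero_grad()
    with torch.autocast("cuda", enabled=use_cuda):
        loss = torch.nn.functional.mse_loss(model(features), targets)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(opt)                # skipped if the grads overflowed
    scaler.update()                 # halve on overflow, grow when stable
    return scaler.get_scale()       # the value logged as grad_scale

print(step(torch.randn(8, 80, device=dev), torch.randn(8, 500, device=dev)))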
], batch size: 56, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:06:30,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=452113.3333333333, ans=0.0 2023-11-18 23:06:41,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=452180.0, ans=0.2 2023-11-18 23:06:47,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=452180.0, ans=0.5 2023-11-18 23:07:13,150 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.026e+01 8.646e+01 9.561e+01 1.050e+02 1.437e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-18 23:07:24,972 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7750, loss[loss=0.08521, simple_loss=0.08884, pruned_loss=0.02695, audio_tagging_loss=0.01383, over 14435.00 frames. ], tot_loss[loss=0.09658, simple_loss=0.1134, pruned_loss=0.02884, audio_tagging_loss=0.01104, over 3040724.26 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:07:45,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=452513.3333333333, ans=0.015 2023-11-18 23:07:59,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5 2023-11-18 23:08:07,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=452646.6666666667, ans=0.0 2023-11-18 23:08:07,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=452646.6666666667, ans=0.0 2023-11-18 23:08:20,991 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7800, loss[loss=0.09965, simple_loss=0.1071, pruned_loss=0.03337, audio_tagging_loss=0.01272, over 14639.00 frames. ], tot_loss[loss=0.09587, simple_loss=0.1121, pruned_loss=0.02872, audio_tagging_loss=0.01111, over 3039473.99 frames. ], batch size: 55, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:08:27,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=452780.0, ans=0.07 2023-11-18 23:08:32,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=452846.6666666667, ans=0.2 2023-11-18 23:08:45,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452913.3333333333, ans=0.1 2023-11-18 23:08:50,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2023-11-18 23:09:00,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=452980.0, ans=0.125 2023-11-18 23:09:04,316 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.766e+01 9.659e+01 1.074e+02 1.731e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-18 23:09:09,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=453046.6666666667, ans=0.025 2023-11-18 23:09:17,034 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7850, loss[loss=0.07194, simple_loss=0.08949, pruned_loss=0.0161, audio_tagging_loss=0.01109, over 14621.00 frames. 
], tot_loss[loss=0.09624, simple_loss=0.1122, pruned_loss=0.02887, audio_tagging_loss=0.01126, over 3029454.68 frames. ], batch size: 57, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:09:40,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=453246.6666666667, ans=0.2 2023-11-18 23:09:49,511 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-11-18 23:10:01,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.09 vs. limit=6.0 2023-11-18 23:10:14,380 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7900, loss[loss=0.144, simple_loss=0.1687, pruned_loss=0.05279, audio_tagging_loss=0.006855, over 15163.00 frames. ], tot_loss[loss=0.09731, simple_loss=0.1136, pruned_loss=0.02924, audio_tagging_loss=0.01126, over 3044305.84 frames. ], batch size: 54, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:10:27,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453513.3333333333, ans=0.1 2023-11-18 23:10:47,308 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:10:50,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=453646.6666666667, ans=0.5 2023-11-18 23:10:57,621 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 8.743e+01 9.470e+01 1.071e+02 1.653e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-18 23:10:57,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453713.3333333333, ans=0.1 2023-11-18 23:11:03,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453713.3333333333, ans=0.1 2023-11-18 23:11:09,780 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 7950, loss[loss=0.09946, simple_loss=0.1152, pruned_loss=0.02621, audio_tagging_loss=0.01567, over 15228.00 frames. ], tot_loss[loss=0.09666, simple_loss=0.1125, pruned_loss=0.02901, audio_tagging_loss=0.01142, over 3042738.88 frames. ], batch size: 57, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:11:13,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453780.0, ans=0.1 2023-11-18 23:11:16,382 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.85 vs. limit=10.0 2023-11-18 23:11:21,988 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2023-11-18 23:11:22,633 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 23:11:30,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=453846.6666666667, ans=0.125 2023-11-18 23:11:35,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=453913.3333333333, ans=0.125 2023-11-18 23:12:05,805 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8000, loss[loss=0.1197, simple_loss=0.1404, pruned_loss=0.04051, audio_tagging_loss=0.008997, over 14528.00 frames. ], tot_loss[loss=0.09626, simple_loss=0.1121, pruned_loss=0.02876, audio_tagging_loss=0.01148, over 3045960.30 frames. ], batch size: 53, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:12:15,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=454180.0, ans=0.0 2023-11-18 23:12:38,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=454313.3333333333, ans=0.1 2023-11-18 23:12:48,897 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.670e+01 9.475e+01 1.059e+02 1.539e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-18 23:13:00,536 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8050, loss[loss=0.07962, simple_loss=0.08588, pruned_loss=0.02516, audio_tagging_loss=0.01152, over 14761.00 frames. ], tot_loss[loss=0.09458, simple_loss=0.1098, pruned_loss=0.02803, audio_tagging_loss=0.01164, over 3053100.37 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:13:11,119 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2023-11-18 23:13:12,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=454513.3333333333, ans=0.125 2023-11-18 23:13:25,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=454580.0, ans=0.0 2023-11-18 23:13:31,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=454580.0, ans=0.125 2023-11-18 23:13:36,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.73 vs. limit=10.0 2023-11-18 23:13:36,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=454646.6666666667, ans=0.125 2023-11-18 23:13:37,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=454646.6666666667, ans=0.0 2023-11-18 23:13:56,072 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8100, loss[loss=0.1135, simple_loss=0.1282, pruned_loss=0.03978, audio_tagging_loss=0.009581, over 15787.00 frames. ], tot_loss[loss=0.09596, simple_loss=0.1114, pruned_loss=0.02882, audio_tagging_loss=0.01144, over 3059060.41 frames. ], batch size: 55, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:14:05,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.00 vs. 
limit=15.0 2023-11-18 23:14:05,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=454780.0, ans=0.125 2023-11-18 23:14:08,058 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:14:25,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=15.0 2023-11-18 23:14:28,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=454980.0, ans=0.125 2023-11-18 23:14:33,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=454980.0, ans=0.2 2023-11-18 23:14:39,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.949e+01 9.117e+01 9.858e+01 1.091e+02 1.353e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-18 23:14:52,355 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8150, loss[loss=0.151, simple_loss=0.1734, pruned_loss=0.05487, audio_tagging_loss=0.00945, over 16126.00 frames. ], tot_loss[loss=0.09676, simple_loss=0.1124, pruned_loss=0.02928, audio_tagging_loss=0.0113, over 3055060.74 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:15:02,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=455180.0, ans=0.0 2023-11-18 23:15:03,131 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0 2023-11-18 23:15:12,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=22.5 2023-11-18 23:15:23,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=455246.6666666667, ans=0.0 2023-11-18 23:15:24,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=455313.3333333333, ans=0.0 2023-11-18 23:15:34,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=455313.3333333333, ans=0.1 2023-11-18 23:15:47,078 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 23:15:48,096 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8200, loss[loss=0.06499, simple_loss=0.08186, pruned_loss=0.01704, audio_tagging_loss=0.007022, over 14686.00 frames. ], tot_loss[loss=0.09714, simple_loss=0.1133, pruned_loss=0.02951, audio_tagging_loss=0.01098, over 3055217.71 frames. 
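Many ScheduledFloat records in this log name balancer parameters: min_positive/max_positive bound the fraction of positive values per channel, min_abs/max_abs bound the per-channel mean magnitude, and prob is the probability the balancer actually runs on a given batch. During training the balancer turns violations of those bounds into a small gradient penalty; the sketch below only detects violations, as an illustration of what the knobs constrain rather than the recipe's implementation.

import torch

def balancer_violations(x, min_positive=0.025, max_positive=0.95, max_abs=10.0):
    # x: (frames, channels)
    frac_pos = (x > 0).float().mean(dim=0)   # fraction positive per channel
    mean_abs = x.abs().mean(dim=0)           # mean magnitude per channel
    return {
        "too_few_positive": int((frac_pos < min_positive).sum()),
        "too_many_positive": int((frac_pos > max_positive).sum()),
        "too_large": int((mean_abs > max_abs).sum()),
    }

x = torch.randn(200, 384)
x[:, :8] = x[:, :8].abs()          # force 8 channels to be all-positive
print(balancer_violations(x))      # reports 8 channels over max_positive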
], batch size: 55, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:16:14,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=455580.0, ans=0.125 2023-11-18 23:16:24,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=455646.6666666667, ans=0.0 2023-11-18 23:16:29,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.21 vs. limit=22.5 2023-11-18 23:16:30,915 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.905e+01 9.848e+01 1.096e+02 1.904e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-18 23:16:42,988 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8250, loss[loss=0.09222, simple_loss=0.111, pruned_loss=0.0243, audio_tagging_loss=0.01241, over 15246.00 frames. ], tot_loss[loss=0.09724, simple_loss=0.1136, pruned_loss=0.02958, audio_tagging_loss=0.01086, over 3049045.81 frames. ], batch size: 60, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:16:46,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=455780.0, ans=0.125 2023-11-18 23:17:08,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=455913.3333333333, ans=0.125 2023-11-18 23:17:18,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=455980.0, ans=0.1 2023-11-18 23:17:38,209 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8300, loss[loss=0.09742, simple_loss=0.119, pruned_loss=0.02897, audio_tagging_loss=0.00897, over 14575.00 frames. ], tot_loss[loss=0.09789, simple_loss=0.1149, pruned_loss=0.02963, audio_tagging_loss=0.01079, over 3052009.80 frames. ], batch size: 53, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:17:41,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=456113.3333333333, ans=0.0 2023-11-18 23:18:07,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2023-11-18 23:18:13,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=456313.3333333333, ans=0.0 2023-11-18 23:18:17,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=456313.3333333333, ans=0.125 2023-11-18 23:18:21,238 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.810e+01 9.840e+01 1.082e+02 1.589e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-18 23:18:33,352 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8350, loss[loss=0.1136, simple_loss=0.1301, pruned_loss=0.03704, audio_tagging_loss=0.01151, over 15995.00 frames. ], tot_loss[loss=0.09796, simple_loss=0.1153, pruned_loss=0.02955, audio_tagging_loss=0.01074, over 3048570.71 frames. ], batch size: 60, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:18:52,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=456513.3333333333, ans=0.0 2023-11-18 23:19:02,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. 
limit=15.0
2023-11-18 23:19:02,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0
2023-11-18 23:19:06,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=456646.6666666667, ans=0.0
2023-11-18 23:19:28,816 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8400, loss[loss=0.08835, simple_loss=0.1017, pruned_loss=0.02723, audio_tagging_loss=0.01028, over 15817.00 frames. ], tot_loss[loss=0.09773, simple_loss=0.1149, pruned_loss=0.0295, audio_tagging_loss=0.01077, over 3049991.41 frames. ], batch size: 62, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:20:12,050 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.924e+01 9.865e+01 1.104e+02 3.626e+02, threshold=1.973e+02, percent-clipped=1.0
2023-11-18 23:20:19,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.16 vs. limit=15.0
2023-11-18 23:20:24,656 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8450, loss[loss=0.1012, simple_loss=0.1259, pruned_loss=0.02817, audio_tagging_loss=0.01005, over 15880.00 frames. ], tot_loss[loss=0.09766, simple_loss=0.1147, pruned_loss=0.02947, audio_tagging_loss=0.01082, over 3052095.04 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:20:27,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=457113.3333333333, ans=0.125
2023-11-18 23:20:37,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=457180.0, ans=0.07
2023-11-18 23:21:02,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=457313.3333333333, ans=0.2
2023-11-18 23:21:19,984 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8500, loss[loss=0.0877, simple_loss=0.1018, pruned_loss=0.02562, audio_tagging_loss=0.01116, over 14723.00 frames. ], tot_loss[loss=0.09778, simple_loss=0.1147, pruned_loss=0.02959, audio_tagging_loss=0.01085, over 3060432.39 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:21:29,516 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.03 vs. limit=10.0
2023-11-18 23:21:30,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=457513.3333333333, ans=0.1
2023-11-18 23:21:36,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=457513.3333333333, ans=0.0
2023-11-18 23:21:37,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=457513.3333333333, ans=0.0
2023-11-18 23:21:41,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=457580.0, ans=0.0
2023-11-18 23:21:55,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.16 vs. limit=15.0
2023-11-18 23:21:58,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=457646.6666666667, ans=10.0
2023-11-18 23:22:04,503 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 8.613e+01 9.508e+01 1.037e+02 1.516e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-18 23:22:15,588 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8550, loss[loss=0.07885, simple_loss=0.09324, pruned_loss=0.0232, audio_tagging_loss=0.009029, over 15662.00 frames. ], tot_loss[loss=0.09666, simple_loss=0.1133, pruned_loss=0.02918, audio_tagging_loss=0.01082, over 3054569.88 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:22:24,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=457780.0, ans=0.0
2023-11-18 23:22:24,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=457780.0, ans=10.0
2023-11-18 23:22:29,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=457846.6666666667, ans=0.125
2023-11-18 23:22:36,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=457846.6666666667, ans=0.0
2023-11-18 23:23:05,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=458046.6666666667, ans=0.0
2023-11-18 23:23:11,601 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8600, loss[loss=0.07945, simple_loss=0.09183, pruned_loss=0.02245, audio_tagging_loss=0.01108, over 13451.00 frames. ], tot_loss[loss=0.09642, simple_loss=0.1127, pruned_loss=0.02902, audio_tagging_loss=0.01106, over 3056967.50 frames. ], batch size: 52, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:23:14,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=458113.3333333333, ans=0.125
2023-11-18 23:23:21,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=458180.0, ans=0.0
2023-11-18 23:23:38,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=458246.6666666667, ans=0.125
2023-11-18 23:23:44,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.00 vs. limit=22.5
2023-11-18 23:23:53,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=458313.3333333333, ans=0.0
2023-11-18 23:23:53,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=458313.3333333333, ans=0.1
2023-11-18 23:23:56,313 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.729e+01 9.610e+01 1.061e+02 1.458e+02, threshold=1.922e+02, percent-clipped=0.0
2023-11-18 23:24:03,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=458380.0, ans=0.125
2023-11-18 23:24:06,896 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8650, loss[loss=0.1079, simple_loss=0.1274, pruned_loss=0.03522, audio_tagging_loss=0.008948, over 17266.00 frames. ], tot_loss[loss=0.09704, simple_loss=0.1135, pruned_loss=0.02916, audio_tagging_loss=0.01112, over 3058822.14 frames. ], batch size: 62, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:24:10,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=458446.6666666667, ans=0.2
2023-11-18 23:24:12,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=458446.6666666667, ans=0.125
2023-11-18 23:24:24,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=458513.3333333333, ans=0.0
2023-11-18 23:24:44,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=458646.6666666667, ans=0.125
2023-11-18 23:24:47,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=458646.6666666667, ans=0.1
2023-11-18 23:24:54,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=458713.3333333333, ans=0.0
2023-11-18 23:24:59,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=458713.3333333333, ans=0.125
2023-11-18 23:25:02,862 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8700, loss[loss=0.09982, simple_loss=0.1221, pruned_loss=0.02765, audio_tagging_loss=0.0111, over 16055.00 frames. ], tot_loss[loss=0.09776, simple_loss=0.1143, pruned_loss=0.02942, audio_tagging_loss=0.0112, over 3050921.91 frames. ], batch size: 55, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:25:05,487 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.89 vs. limit=15.0
2023-11-18 23:25:21,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=458846.6666666667, ans=0.125
2023-11-18 23:25:27,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=458913.3333333333, ans=0.0
2023-11-18 23:25:47,298 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 9.228e+01 9.960e+01 1.094e+02 1.937e+02, threshold=1.992e+02, percent-clipped=1.0
2023-11-18 23:25:58,427 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8750, loss[loss=0.1217, simple_loss=0.1397, pruned_loss=0.03781, audio_tagging_loss=0.01403, over 15250.00 frames. ], tot_loss[loss=0.09889, simple_loss=0.1158, pruned_loss=0.02984, audio_tagging_loss=0.01113, over 3058326.51 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:26:00,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=459113.3333333333, ans=15.0
2023-11-18 23:26:11,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=459180.0, ans=0.125
2023-11-18 23:26:28,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=459246.6666666667, ans=0.2
2023-11-18 23:26:35,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=459313.3333333333, ans=0.02
2023-11-18 23:26:54,472 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8800, loss[loss=0.09553, simple_loss=0.1156, pruned_loss=0.02508, audio_tagging_loss=0.01266, over 16782.00 frames. ], tot_loss[loss=0.09961, simple_loss=0.1163, pruned_loss=0.03017, audio_tagging_loss=0.01128, over 3053080.37 frames. ], batch size: 62, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:27:32,899 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=15.0
2023-11-18 23:27:38,832 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.027e+01 8.792e+01 9.740e+01 1.071e+02 1.410e+02, threshold=1.948e+02, percent-clipped=0.0
2023-11-18 23:27:49,272 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8850, loss[loss=0.09103, simple_loss=0.1099, pruned_loss=0.02529, audio_tagging_loss=0.01076, over 14174.00 frames. ], tot_loss[loss=0.0995, simple_loss=0.1162, pruned_loss=0.03009, audio_tagging_loss=0.01132, over 3052576.39 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:27:50,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=459780.0, ans=0.2
2023-11-18 23:27:56,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=459780.0, ans=0.125
2023-11-18 23:27:58,829 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 23:27:59,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=459780.0, ans=0.125
2023-11-18 23:28:13,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=459913.3333333333, ans=0.09899494936611666
2023-11-18 23:28:28,254 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0
2023-11-18 23:28:43,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=460046.6666666667, ans=0.025
2023-11-18 23:28:45,807 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8900, loss[loss=0.11, simple_loss=0.1244, pruned_loss=0.03616, audio_tagging_loss=0.01166, over 14494.00 frames. ], tot_loss[loss=0.09911, simple_loss=0.1159, pruned_loss=0.02995, audio_tagging_loss=0.01122, over 3053387.56 frames. ], batch size: 55, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:28:46,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=460113.3333333333, ans=0.2
2023-11-18 23:28:47,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=460113.3333333333, ans=0.125
2023-11-18 23:28:57,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=460180.0, ans=0.125
2023-11-18 23:29:15,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=460246.6666666667, ans=0.0
2023-11-18 23:29:26,508 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.58 vs. limit=10.0
2023-11-18 23:29:30,090 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 9.040e+01 1.017e+02 1.156e+02 1.581e+02, threshold=2.035e+02, percent-clipped=0.0
2023-11-18 23:29:41,813 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 8950, loss[loss=0.09066, simple_loss=0.1092, pruned_loss=0.02649, audio_tagging_loss=0.009592, over 15712.00 frames. ], tot_loss[loss=0.0985, simple_loss=0.1154, pruned_loss=0.02978, audio_tagging_loss=0.01101, over 3047230.13 frames. ], batch size: 59, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:30:11,922 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.72 vs. limit=6.0
2023-11-18 23:30:15,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0
2023-11-18 23:30:36,560 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9000, loss[loss=0.08672, simple_loss=0.1012, pruned_loss=0.02455, audio_tagging_loss=0.01155, over 15600.00 frames. ], tot_loss[loss=0.09815, simple_loss=0.1149, pruned_loss=0.02973, audio_tagging_loss=0.01097, over 3056697.80 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:30:36,561 INFO [train_asr.py:1138] (1/4) Computing validation loss
2023-11-18 23:30:47,463 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.0357, 2.6832, 3.4910, 2.6694, 3.7542, 3.6595, 3.2527, 2.8612], device='cuda:1')
2023-11-18 23:30:47,944 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1370, 2.4331, 4.9836, 2.4743], device='cuda:1')
2023-11-18 23:30:56,266 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.9196, 1.7545, 2.6745, 2.8681, 2.4631, 2.8219, 2.2367, 2.8592], device='cuda:1')
2023-11-18 23:31:07,521 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.6675, 3.9494, 3.5415, 2.2155], device='cuda:1')
2023-11-18 23:31:09,099 INFO [train_asr.py:1147] (1/4) Epoch 6, validation: loss=0.07051, simple_loss=0.05865, pruned_loss=0.008039, audio_tagging_loss=0.03315, over 4681554.00 frames.
2023-11-18 23:31:09,100 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB
2023-11-18 23:31:32,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.86 vs. limit=15.0
2023-11-18 23:31:49,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.38 vs. limit=15.0
2023-11-18 23:31:53,345 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.759e+01 8.670e+01 9.700e+01 1.069e+02 1.408e+02, threshold=1.940e+02, percent-clipped=0.0
2023-11-18 23:31:58,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=461046.6666666667, ans=0.125
2023-11-18 23:32:04,557 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9050, loss[loss=0.09428, simple_loss=0.1148, pruned_loss=0.02665, audio_tagging_loss=0.01021, over 14520.00 frames. ], tot_loss[loss=0.09667, simple_loss=0.1134, pruned_loss=0.02905, audio_tagging_loss=0.01091, over 3052311.28 frames. ], batch size: 55, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:32:17,041 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.05 vs. limit=22.5
2023-11-18 23:32:17,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=461180.0, ans=0.125
2023-11-18 23:32:25,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=461246.6666666667, ans=0.0
2023-11-18 23:32:25,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=461246.6666666667, ans=0.2
2023-11-18 23:32:28,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=461246.6666666667, ans=0.125
2023-11-18 23:32:33,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.68 vs. limit=12.0
2023-11-18 23:32:43,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=461313.3333333333, ans=0.2
2023-11-18 23:32:47,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=461380.0, ans=0.125
2023-11-18 23:32:48,362 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.84 vs. limit=22.5
2023-11-18 23:32:57,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=461380.0, ans=0.0
2023-11-18 23:32:58,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.43 vs. limit=22.5
2023-11-18 23:32:59,355 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9100, loss[loss=0.1063, simple_loss=0.1265, pruned_loss=0.03236, audio_tagging_loss=0.01071, over 15724.00 frames. ], tot_loss[loss=0.09629, simple_loss=0.1134, pruned_loss=0.02875, audio_tagging_loss=0.01084, over 3051726.03 frames. ], batch size: 63, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:33:03,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=461446.6666666667, ans=0.0
2023-11-18 23:33:26,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=461580.0, ans=0.0
2023-11-18 23:33:27,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=461580.0, ans=0.0
2023-11-18 23:33:42,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=461713.3333333333, ans=0.0
2023-11-18 23:33:43,696 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.366e+01 8.817e+01 9.477e+01 1.033e+02 1.344e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-18 23:33:45,032 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:33:50,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=461713.3333333333, ans=0.0
2023-11-18 23:33:54,227 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0
2023-11-18 23:33:54,709 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9150, loss[loss=0.0916, simple_loss=0.1082, pruned_loss=0.02572, audio_tagging_loss=0.01179, over 15922.00 frames. ], tot_loss[loss=0.09704, simple_loss=0.1143, pruned_loss=0.02903, audio_tagging_loss=0.01084, over 3046676.80 frames. ], batch size: 59, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:33:56,760 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.20 vs. limit=12.0
2023-11-18 23:34:12,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=12.0
2023-11-18 23:34:13,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.72 vs. limit=10.0
2023-11-18 23:34:24,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=461913.3333333333, ans=0.1
2023-11-18 23:34:39,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=462046.6666666667, ans=0.2
2023-11-18 23:34:50,578 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9200, loss[loss=0.09496, simple_loss=0.1039, pruned_loss=0.03082, audio_tagging_loss=0.01218, over 15662.00 frames. ], tot_loss[loss=0.09705, simple_loss=0.114, pruned_loss=0.02914, audio_tagging_loss=0.01089, over 3044702.35 frames. ], batch size: 61, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:34:52,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=462113.3333333333, ans=0.2
2023-11-18 23:35:03,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=462180.0, ans=0.0
2023-11-18 23:35:04,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=462180.0, ans=0.125
2023-11-18 23:35:06,976 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=22.5
2023-11-18 23:35:11,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0
2023-11-18 23:35:30,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=462313.3333333333, ans=0.1
2023-11-18 23:35:30,985 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=462313.3333333333, ans=0.125
2023-11-18 23:35:33,914 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.421e+01 8.725e+01 9.517e+01 1.069e+02 1.464e+02, threshold=1.903e+02, percent-clipped=0.0
2023-11-18 23:35:34,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=462380.0, ans=0.125
2023-11-18 23:35:36,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=462380.0, ans=0.125
2023-11-18 23:35:38,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=462380.0, ans=0.125
2023-11-18 23:35:44,339 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9250, loss[loss=0.09606, simple_loss=0.1067, pruned_loss=0.03093, audio_tagging_loss=0.01181, over 15544.00 frames. ], tot_loss[loss=0.09746, simple_loss=0.1145, pruned_loss=0.02945, audio_tagging_loss=0.01074, over 3047189.89 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:36:19,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0
2023-11-18 23:36:22,963 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.13 vs. limit=15.0
2023-11-18 23:36:39,679 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9300, loss[loss=0.1209, simple_loss=0.1336, pruned_loss=0.0411, audio_tagging_loss=0.01304, over 13611.00 frames. ], tot_loss[loss=0.09712, simple_loss=0.1139, pruned_loss=0.02942, audio_tagging_loss=0.01077, over 3046120.17 frames. ], batch size: 52, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:36:40,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=462780.0, ans=0.125
2023-11-18 23:36:54,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.88 vs. limit=15.0
2023-11-18 23:36:58,389 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:37:01,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=462913.3333333333, ans=0.125
2023-11-18 23:37:15,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=462980.0, ans=0.0
2023-11-18 23:37:23,841 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.607e+01 9.427e+01 1.041e+02 1.346e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-18 23:37:28,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=463046.6666666667, ans=0.125
2023-11-18 23:37:34,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=463046.6666666667, ans=0.125
2023-11-18 23:37:35,918 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9350, loss[loss=0.1021, simple_loss=0.1121, pruned_loss=0.03304, audio_tagging_loss=0.01295, over 14574.00 frames. ], tot_loss[loss=0.09651, simple_loss=0.1131, pruned_loss=0.02905, audio_tagging_loss=0.01088, over 3040948.91 frames. ], batch size: 55, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:37:36,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=463113.3333333333, ans=0.125
2023-11-18 23:37:49,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=463180.0, ans=0.0
2023-11-18 23:38:04,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=463246.6666666667, ans=0.0
2023-11-18 23:38:14,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.63 vs. limit=15.0
2023-11-18 23:38:30,374 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9400, loss[loss=0.09849, simple_loss=0.1172, pruned_loss=0.03063, audio_tagging_loss=0.009243, over 15154.00 frames. ], tot_loss[loss=0.09707, simple_loss=0.1137, pruned_loss=0.02932, audio_tagging_loss=0.01087, over 3048465.39 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:38:34,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=463446.6666666667, ans=10.0
2023-11-18 23:38:36,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=463446.6666666667, ans=0.2
2023-11-18 23:38:56,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463580.0, ans=0.1
2023-11-18 23:39:06,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=463646.6666666667, ans=0.0
2023-11-18 23:39:07,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=463646.6666666667, ans=0.0
2023-11-18 23:39:15,430 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.967e+01 9.961e+01 1.049e+02 1.500e+02, threshold=1.992e+02, percent-clipped=0.0
2023-11-18 23:39:19,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=463713.3333333333, ans=0.2
2023-11-18 23:39:21,675 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 23:39:25,410 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9450, loss[loss=0.1075, simple_loss=0.1281, pruned_loss=0.03339, audio_tagging_loss=0.01009, over 15689.00 frames. ], tot_loss[loss=0.09683, simple_loss=0.1131, pruned_loss=0.02925, audio_tagging_loss=0.01104, over 3046521.34 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:39:46,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463846.6666666667, ans=0.1
2023-11-18 23:39:54,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=463913.3333333333, ans=0.2
2023-11-18 23:40:00,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463980.0, ans=0.1
2023-11-18 23:40:06,514 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0
2023-11-18 23:40:12,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=464046.6666666667, ans=0.125
2023-11-18 23:40:13,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=464046.6666666667, ans=0.125
2023-11-18 23:40:13,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=464046.6666666667, ans=0.07
2023-11-18 23:40:21,506 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9500, loss[loss=0.0843, simple_loss=0.09835, pruned_loss=0.02612, audio_tagging_loss=0.009003, over 14910.00 frames. ], tot_loss[loss=0.09561, simple_loss=0.1113, pruned_loss=0.02881, audio_tagging_loss=0.01115, over 3040978.42 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:40:34,275 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.24 vs. limit=6.0
2023-11-18 23:40:38,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.61 vs. limit=10.0
2023-11-18 23:41:07,216 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 9.036e+01 9.803e+01 1.077e+02 1.985e+02, threshold=1.961e+02, percent-clipped=0.0
2023-11-18 23:41:10,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=464380.0, ans=0.025
2023-11-18 23:41:13,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=464380.0, ans=0.0
2023-11-18 23:41:17,268 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9550, loss[loss=0.07612, simple_loss=0.08118, pruned_loss=0.02035, audio_tagging_loss=0.01518, over 14583.00 frames. ], tot_loss[loss=0.09605, simple_loss=0.1118, pruned_loss=0.02884, audio_tagging_loss=0.0113, over 3053470.04 frames. ], batch size: 58, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:41:38,657 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5
2023-11-18 23:41:50,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=464646.6666666667, ans=0.125
2023-11-18 23:41:57,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=464646.6666666667, ans=0.125
2023-11-18 23:41:59,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=464646.6666666667, ans=0.2
2023-11-18 23:42:00,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=464713.3333333333, ans=0.07
2023-11-18 23:42:12,592 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9600, loss[loss=0.07466, simple_loss=0.07558, pruned_loss=0.02411, audio_tagging_loss=0.01275, over 15330.00 frames. ], tot_loss[loss=0.09595, simple_loss=0.1116, pruned_loss=0.02879, audio_tagging_loss=0.01134, over 3050759.85 frames. ], batch size: 62, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:42:25,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=464846.6666666667, ans=0.0
2023-11-18 23:42:32,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=464846.6666666667, ans=0.0
2023-11-18 23:42:41,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=464913.3333333333, ans=0.04949747468305833
2023-11-18 23:42:44,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=464913.3333333333, ans=0.0
2023-11-18 23:42:48,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=464980.0, ans=0.5
2023-11-18 23:42:58,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 8.742e+01 9.765e+01 1.051e+02 1.389e+02, threshold=1.953e+02, percent-clipped=0.0
2023-11-18 23:42:58,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=465046.6666666667, ans=0.0
2023-11-18 23:43:09,310 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9650, loss[loss=0.09929, simple_loss=0.1096, pruned_loss=0.03064, audio_tagging_loss=0.01385, over 15646.00 frames. ], tot_loss[loss=0.09573, simple_loss=0.1114, pruned_loss=0.02869, audio_tagging_loss=0.01134, over 3048210.61 frames. ], batch size: 60, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:43:12,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=465113.3333333333, ans=0.125
2023-11-18 23:43:14,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=465113.3333333333, ans=0.125
2023-11-18 23:43:14,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=465113.3333333333, ans=0.0
2023-11-18 23:43:33,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.17 vs. limit=10.0
2023-11-18 23:43:35,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=465246.6666666667, ans=0.125
2023-11-18 23:43:43,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=465313.3333333333, ans=0.0
2023-11-18 23:43:54,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=465380.0, ans=0.0
2023-11-18 23:43:55,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465380.0, ans=0.1
2023-11-18 23:43:57,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=465380.0, ans=0.125
2023-11-18 23:44:02,885 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:44:04,665 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9700, loss[loss=0.109, simple_loss=0.1313, pruned_loss=0.03337, audio_tagging_loss=0.00998, over 15239.00 frames. ], tot_loss[loss=0.09584, simple_loss=0.1121, pruned_loss=0.02879, audio_tagging_loss=0.01101, over 3044949.43 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:44:15,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=465513.3333333333, ans=0.09899494936611666
2023-11-18 23:44:19,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465513.3333333333, ans=0.1
2023-11-18 23:44:20,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=465513.3333333333, ans=0.0
2023-11-18 23:44:41,485 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.07 vs. limit=15.0
2023-11-18 23:44:48,537 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.32 vs. limit=15.0
2023-11-18 23:44:50,718 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.590e+01 9.606e+01 1.115e+02 1.456e+02, threshold=1.921e+02, percent-clipped=0.0
2023-11-18 23:45:00,260 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9750, loss[loss=0.1199, simple_loss=0.1309, pruned_loss=0.04178, audio_tagging_loss=0.01265, over 15325.00 frames. ], tot_loss[loss=0.09576, simple_loss=0.112, pruned_loss=0.02879, audio_tagging_loss=0.01097, over 3040486.82 frames. ], batch size: 58, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:45:23,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=465913.3333333333, ans=0.125
2023-11-18 23:45:41,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465980.0, ans=0.1
2023-11-18 23:45:42,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=465980.0, ans=0.1
2023-11-18 23:45:57,117 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9800, loss[loss=0.1103, simple_loss=0.1329, pruned_loss=0.03422, audio_tagging_loss=0.009619, over 15978.00 frames. ], tot_loss[loss=0.09687, simple_loss=0.1135, pruned_loss=0.0292, audio_tagging_loss=0.01091, over 3040685.42 frames. ], batch size: 60, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:46:21,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=466246.6666666667, ans=0.125
2023-11-18 23:46:31,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=466313.3333333333, ans=0.125
2023-11-18 23:46:32,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=466313.3333333333, ans=0.1
2023-11-18 23:46:42,476 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.868e+01 8.589e+01 9.720e+01 1.056e+02 1.437e+02, threshold=1.944e+02, percent-clipped=0.0
2023-11-18 23:46:44,618 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 23:46:48,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0
2023-11-18 23:46:52,052 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9850, loss[loss=0.08947, simple_loss=0.1069, pruned_loss=0.02431, audio_tagging_loss=0.01173, over 14902.00 frames. ], tot_loss[loss=0.0976, simple_loss=0.1146, pruned_loss=0.02945, audio_tagging_loss=0.01084, over 3040963.27 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:47:27,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=466646.6666666667, ans=0.1
2023-11-18 23:47:47,556 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9900, loss[loss=0.0992, simple_loss=0.1245, pruned_loss=0.0299, audio_tagging_loss=0.007026, over 16322.00 frames. ], tot_loss[loss=0.09708, simple_loss=0.1146, pruned_loss=0.02916, audio_tagging_loss=0.01059, over 3036988.88 frames. ], batch size: 59, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:47:50,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=466780.0, ans=0.0
2023-11-18 23:47:56,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=466780.0, ans=0.1
2023-11-18 23:48:07,465 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=15.0
2023-11-18 23:48:14,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=466913.3333333333, ans=0.125
2023-11-18 23:48:27,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=466980.0, ans=0.0
2023-11-18 23:48:32,817 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.839e+01 9.469e+01 1.066e+02 1.468e+02, threshold=1.894e+02, percent-clipped=0.0
2023-11-18 23:48:39,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=467046.6666666667, ans=0.125
2023-11-18 23:48:43,395 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 9950, loss[loss=0.126, simple_loss=0.162, pruned_loss=0.03507, audio_tagging_loss=0.009976, over 15849.00 frames. ], tot_loss[loss=0.09725, simple_loss=0.1148, pruned_loss=0.02922, audio_tagging_loss=0.01065, over 3040858.45 frames. ], batch size: 58, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:48:51,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0
2023-11-18 23:48:52,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=467113.3333333333, ans=0.125
2023-11-18 23:48:53,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=467180.0, ans=0.2
2023-11-18 23:48:58,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=467180.0, ans=0.05
2023-11-18 23:49:06,480 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=22.5
2023-11-18 23:49:38,709 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10000, loss[loss=0.08922, simple_loss=0.1168, pruned_loss=0.02368, audio_tagging_loss=0.007164, over 14319.00 frames. ], tot_loss[loss=0.09713, simple_loss=0.1142, pruned_loss=0.02926, audio_tagging_loss=0.01079, over 3043820.48 frames. ], batch size: 54, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:49:45,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=467446.6666666667, ans=0.0
2023-11-18 23:49:48,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=467513.3333333333, ans=0.0
2023-11-18 23:49:54,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=467513.3333333333, ans=0.0
2023-11-18 23:50:09,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0
2023-11-18 23:50:16,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=467646.6666666667, ans=0.1
2023-11-18 23:50:23,828 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.767e+01 8.943e+01 9.834e+01 1.077e+02 1.357e+02, threshold=1.967e+02, percent-clipped=0.0
2023-11-18 23:50:33,363 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10050, loss[loss=0.1086, simple_loss=0.1308, pruned_loss=0.03246, audio_tagging_loss=0.01074, over 15162.00 frames. ], tot_loss[loss=0.09728, simple_loss=0.1142, pruned_loss=0.02936, audio_tagging_loss=0.01082, over 3047053.03 frames. ], batch size: 59, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:50:45,272 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.41 vs. limit=15.0
2023-11-18 23:50:47,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=467846.6666666667, ans=0.125
2023-11-18 23:51:19,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=468046.6666666667, ans=0.0
2023-11-18 23:51:28,608 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10100, loss[loss=0.1195, simple_loss=0.1476, pruned_loss=0.03859, audio_tagging_loss=0.007113, over 15656.00 frames. ], tot_loss[loss=0.09712, simple_loss=0.1138, pruned_loss=0.02918, audio_tagging_loss=0.01106, over 3050056.31 frames. ], batch size: 59, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:51:41,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=468180.0, ans=0.125
2023-11-18 23:51:54,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=468246.6666666667, ans=0.2
2023-11-18 23:51:59,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=468313.3333333333, ans=0.0
2023-11-18 23:52:00,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=468313.3333333333, ans=0.07
2023-11-18 23:52:03,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=468313.3333333333, ans=0.125
2023-11-18 23:52:08,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468313.3333333333, ans=0.1
2023-11-18 23:52:08,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.34 vs. limit=22.5
2023-11-18 23:52:10,338 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 23:52:13,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=468380.0, ans=0.0
2023-11-18 23:52:15,592 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.783e+01 8.875e+01 9.595e+01 1.083e+02 1.455e+02, threshold=1.919e+02, percent-clipped=0.0
2023-11-18 23:52:18,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=468380.0, ans=0.125
2023-11-18 23:52:23,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=468446.6666666667, ans=0.09899494936611666
2023-11-18 23:52:24,029 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10150, loss[loss=0.06503, simple_loss=0.06086, pruned_loss=0.02031, audio_tagging_loss=0.01429, over 14502.00 frames. ], tot_loss[loss=0.09718, simple_loss=0.1137, pruned_loss=0.0292, audio_tagging_loss=0.01113, over 3041881.61 frames. ], batch size: 55, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:52:46,781 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 23:52:49,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=468580.0, ans=0.1
2023-11-18 23:52:54,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=468580.0, ans=0.125
2023-11-18 23:52:56,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=468646.6666666667, ans=0.125
2023-11-18 23:53:00,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=468646.6666666667, ans=0.125
2023-11-18 23:53:08,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=468713.3333333333, ans=0.07
2023-11-18 23:53:18,611 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10200, loss[loss=0.1213, simple_loss=0.1403, pruned_loss=0.04089, audio_tagging_loss=0.01021, over 15098.00 frames. ], tot_loss[loss=0.09756, simple_loss=0.114, pruned_loss=0.02929, audio_tagging_loss=0.01127, over 3049371.96 frames. ], batch size: 55, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:53:37,336 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 23:53:49,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=468913.3333333333, ans=0.0
2023-11-18 23:53:53,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=468980.0, ans=0.0
2023-11-18 23:53:58,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=468980.0, ans=0.125
2023-11-18 23:54:03,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=469046.6666666667, ans=0.125
2023-11-18 23:54:04,805 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.998e+01 1.003e+02 1.100e+02 1.354e+02, threshold=2.006e+02, percent-clipped=0.0
2023-11-18 23:54:09,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=469046.6666666667, ans=0.125
2023-11-18 23:54:13,816 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10250, loss[loss=0.0989, simple_loss=0.1157, pruned_loss=0.03029, audio_tagging_loss=0.01078, over 15409.00 frames. ], tot_loss[loss=0.09779, simple_loss=0.1147, pruned_loss=0.02934, audio_tagging_loss=0.0111, over 3051596.87 frames. ], batch size: 58, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:54:18,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=469113.3333333333, ans=0.2
2023-11-18 23:54:22,135 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.32 vs. limit=15.0
2023-11-18 23:54:32,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=469180.0, ans=0.125
2023-11-18 23:54:37,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=469246.6666666667, ans=0.0
2023-11-18 23:54:47,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=469313.3333333333, ans=0.2
2023-11-18 23:54:53,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=469313.3333333333, ans=0.1
2023-11-18 23:54:57,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=469380.0, ans=0.0
2023-11-18 23:55:00,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=469380.0, ans=0.0
2023-11-18 23:55:10,096 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10300, loss[loss=0.1096, simple_loss=0.1295, pruned_loss=0.03551, audio_tagging_loss=0.009361, over 15332.00 frames. ], tot_loss[loss=0.09838, simple_loss=0.115, pruned_loss=0.02959, audio_tagging_loss=0.01127, over 3059027.17 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:55:12,890 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.50 vs. limit=10.0
2023-11-18 23:55:27,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=469513.3333333333, ans=0.2
2023-11-18 23:55:38,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=469580.0, ans=0.0
2023-11-18 23:55:48,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=469646.6666666667, ans=0.125
2023-11-18 23:55:56,572 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 9.104e+01 1.005e+02 1.145e+02 1.607e+02, threshold=2.009e+02, percent-clipped=0.0
2023-11-18 23:56:05,066 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10350, loss[loss=0.1124, simple_loss=0.1243, pruned_loss=0.03711, audio_tagging_loss=0.01312, over 14374.00 frames. ], tot_loss[loss=0.09829, simple_loss=0.1149, pruned_loss=0.02957, audio_tagging_loss=0.01126, over 3066831.36 frames. ], batch size: 55, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:56:20,386 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=469846.6666666667, ans=0.2
2023-11-18 23:56:28,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=469913.3333333333, ans=0.125
2023-11-18 23:56:29,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=469913.3333333333, ans=0.125
2023-11-18 23:56:42,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.75 vs. limit=22.5
2023-11-18 23:56:45,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=469980.0, ans=0.125
2023-11-18 23:56:47,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=12.0
2023-11-18 23:57:00,327 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10400, loss[loss=0.1078, simple_loss=0.1243, pruned_loss=0.03452, audio_tagging_loss=0.01113, over 15510.00 frames. ], tot_loss[loss=0.09798, simple_loss=0.1146, pruned_loss=0.02933, audio_tagging_loss=0.01136, over 3056374.99 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:57:25,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=470246.6666666667, ans=0.125
2023-11-18 23:57:27,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=470246.6666666667, ans=0.125
2023-11-18 23:57:29,801 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.54 vs. limit=22.5
2023-11-18 23:57:34,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=470313.3333333333, ans=0.0
2023-11-18 23:57:39,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=470313.3333333333, ans=0.125
2023-11-18 23:57:41,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.28 vs. limit=15.0
2023-11-18 23:57:42,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=470313.3333333333, ans=0.125
2023-11-18 23:57:42,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=470313.3333333333, ans=0.1
2023-11-18 23:57:42,799 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:57:47,236 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.610e+01 9.344e+01 1.013e+02 2.407e+02, threshold=1.869e+02, percent-clipped=1.0
2023-11-18 23:57:56,723 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10450, loss[loss=0.09268, simple_loss=0.1036, pruned_loss=0.02952, audio_tagging_loss=0.01134, over 15723.00 frames. ], tot_loss[loss=0.09744, simple_loss=0.1138, pruned_loss=0.02916, audio_tagging_loss=0.01137, over 3048014.56 frames. ], batch size: 58, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:57:58,143 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=12.0
2023-11-18 23:58:16,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=470580.0, ans=0.09899494936611666
2023-11-18 23:58:41,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=470713.3333333333, ans=0.125
2023-11-18 23:58:51,741 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10500, loss[loss=0.07104, simple_loss=0.07924, pruned_loss=0.01981, audio_tagging_loss=0.01161, over 14997.00 frames. ], tot_loss[loss=0.0967, simple_loss=0.1131, pruned_loss=0.02893, audio_tagging_loss=0.01122, over 3049232.42 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:59:03,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.91 vs. limit=12.0
2023-11-18 23:59:09,631 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0
2023-11-18 23:59:21,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=470913.3333333333, ans=0.0
2023-11-18 23:59:38,867 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.568e+01 9.489e+01 1.065e+02 1.523e+02, threshold=1.898e+02, percent-clipped=0.0
2023-11-18 23:59:40,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0
2023-11-18 23:59:45,407 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0
2023-11-18 23:59:46,868 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10550, loss[loss=0.1131, simple_loss=0.1408, pruned_loss=0.03103, audio_tagging_loss=0.01163, over 16163.00 frames. ], tot_loss[loss=0.09685, simple_loss=0.1136, pruned_loss=0.02898, audio_tagging_loss=0.0111, over 3049856.69 frames. ], batch size: 59, lr: 1.11e-02, grad_scale: 16.0
2023-11-19 00:00:02,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0
2023-11-19 00:00:02,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471180.0, ans=0.1
2023-11-19 00:00:05,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=471180.0, ans=0.2
2023-11-19 00:00:07,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=471180.0, ans=0.125
2023-11-19 00:00:23,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=471313.3333333333, ans=0.125
2023-11-19 00:00:31,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471380.0, ans=0.1
2023-11-19 00:00:43,093 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10600, loss[loss=0.1, simple_loss=0.1249, pruned_loss=0.02956, audio_tagging_loss=0.008017, over 13994.00 frames. ], tot_loss[loss=0.09698, simple_loss=0.1143, pruned_loss=0.02891, audio_tagging_loss=0.01094, over 3042086.02 frames. ], batch size: 54, lr: 1.11e-02, grad_scale: 16.0
2023-11-19 00:01:02,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=471513.3333333333, ans=0.1
2023-11-19 00:01:18,489 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=17.25 vs. limit=15.0
2023-11-19 00:01:24,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=471646.6666666667, ans=0.125
2023-11-19 00:01:29,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=471713.3333333333, ans=0.125
2023-11-19 00:01:31,012 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 9.038e+01 9.665e+01 1.088e+02 1.655e+02, threshold=1.933e+02, percent-clipped=0.0
2023-11-19 00:01:33,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=471713.3333333333, ans=0.125
2023-11-19 00:01:33,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=471713.3333333333, ans=0.1
2023-11-19 00:01:38,982 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10650, loss[loss=0.07136, simple_loss=0.08232, pruned_loss=0.01755, audio_tagging_loss=0.01265, over 14722.00 frames. ], tot_loss[loss=0.09722, simple_loss=0.1148, pruned_loss=0.02905, audio_tagging_loss=0.01078, over 3036441.98 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 16.0
2023-11-19 00:01:53,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.33 vs. limit=15.0
2023-11-19 00:02:14,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=471980.0, ans=0.0
2023-11-19 00:02:21,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=471980.0, ans=0.125
2023-11-19 00:02:26,186 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 00:02:34,796 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10700, loss[loss=0.1135, simple_loss=0.1354, pruned_loss=0.0369, audio_tagging_loss=0.008898, over 15943.00 frames. ], tot_loss[loss=0.09658, simple_loss=0.1141, pruned_loss=0.02877, audio_tagging_loss=0.01074, over 3045338.77 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 16.0
2023-11-19 00:02:41,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=472113.3333333333, ans=0.0
2023-11-19 00:02:41,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=472113.3333333333, ans=0.125
2023-11-19 00:02:48,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=472180.0, ans=0.0
2023-11-19 00:03:10,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=472313.3333333333, ans=0.2
2023-11-19 00:03:11,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=472313.3333333333, ans=0.5
2023-11-19 00:03:17,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=472313.3333333333, ans=0.0
2023-11-19 00:03:22,300 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.951e+01 9.625e+01 1.080e+02 1.426e+02, threshold=1.925e+02, percent-clipped=0.0
2023-11-19 00:03:27,689 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.89 vs. limit=15.0
2023-11-19 00:03:30,923 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10750, loss[loss=0.08291, simple_loss=0.09441, pruned_loss=0.02441, audio_tagging_loss=0.01129, over 14476.00 frames. ], tot_loss[loss=0.09657, simple_loss=0.1138, pruned_loss=0.02896, audio_tagging_loss=0.01068, over 3043889.01 frames. ], batch size: 55, lr: 1.11e-02, grad_scale: 16.0
2023-11-19 00:03:31,166 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.014e-01
2023-11-19 00:03:34,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=472446.6666666667, ans=0.2
2023-11-19 00:03:46,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=472513.3333333333, ans=0.0
2023-11-19 00:03:54,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=472580.0, ans=0.0
2023-11-19 00:04:02,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=472646.6666666667, ans=0.2
2023-11-19 00:04:04,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=472646.6666666667, ans=0.0
2023-11-19 00:04:12,185 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 00:04:13,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=472646.6666666667, ans=0.1
2023-11-19 00:04:25,558 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10800, loss[loss=0.09297, simple_loss=0.1072, pruned_loss=0.02852, audio_tagging_loss=0.01084, over 14631.00 frames. ], tot_loss[loss=0.09585, simple_loss=0.1128, pruned_loss=0.02868, audio_tagging_loss=0.01078, over 3047790.88 frames.
], batch size: 57, lr: 1.11e-02, grad_scale: 32.0 2023-11-19 00:04:48,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=472913.3333333333, ans=0.0 2023-11-19 00:04:53,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=472913.3333333333, ans=0.0 2023-11-19 00:05:09,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=473046.6666666667, ans=0.125 2023-11-19 00:05:13,539 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.608e+01 9.354e+01 1.065e+02 1.440e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-19 00:05:20,958 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10850, loss[loss=0.09243, simple_loss=0.112, pruned_loss=0.02563, audio_tagging_loss=0.01081, over 15273.00 frames. ], tot_loss[loss=0.09584, simple_loss=0.1127, pruned_loss=0.02876, audio_tagging_loss=0.01071, over 3044420.66 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:05:49,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=473246.6666666667, ans=0.125 2023-11-19 00:05:59,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.69 vs. limit=15.0 2023-11-19 00:06:11,158 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:06:11,446 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:06:15,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=473380.0, ans=0.0 2023-11-19 00:06:17,538 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10900, loss[loss=0.09335, simple_loss=0.1135, pruned_loss=0.02823, audio_tagging_loss=0.00836, over 14797.00 frames. ], tot_loss[loss=0.09637, simple_loss=0.1133, pruned_loss=0.02897, audio_tagging_loss=0.01074, over 3041285.71 frames. ], batch size: 54, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:06:27,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=473513.3333333333, ans=0.0 2023-11-19 00:06:27,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=473513.3333333333, ans=0.125 2023-11-19 00:06:36,021 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. 
limit=15.0 2023-11-19 00:06:36,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=473513.3333333333, ans=0.125 2023-11-19 00:06:37,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=473580.0, ans=0.125 2023-11-19 00:06:43,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=473580.0, ans=0.125 2023-11-19 00:06:43,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=473580.0, ans=0.2 2023-11-19 00:06:58,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=473646.6666666667, ans=0.125 2023-11-19 00:07:05,073 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.580e+01 9.550e+01 1.089e+02 1.595e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-19 00:07:12,537 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 10950, loss[loss=0.09463, simple_loss=0.1063, pruned_loss=0.02714, audio_tagging_loss=0.01435, over 15141.00 frames. ], tot_loss[loss=0.09526, simple_loss=0.1117, pruned_loss=0.02849, audio_tagging_loss=0.0109, over 3041764.29 frames. ], batch size: 56, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:07:15,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=473780.0, ans=0.1 2023-11-19 00:07:18,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=473780.0, ans=0.025 2023-11-19 00:07:32,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=473846.6666666667, ans=0.0 2023-11-19 00:07:33,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=473913.3333333333, ans=0.5 2023-11-19 00:07:41,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=473913.3333333333, ans=0.125 2023-11-19 00:07:52,764 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2023-11-19 00:08:00,358 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2023-11-19 00:08:07,624 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11000, loss[loss=0.1016, simple_loss=0.1325, pruned_loss=0.02865, audio_tagging_loss=0.00669, over 14805.00 frames. ], tot_loss[loss=0.0962, simple_loss=0.1127, pruned_loss=0.02891, audio_tagging_loss=0.01094, over 3041740.18 frames. ], batch size: 54, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:08:11,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=474113.3333333333, ans=0.125 2023-11-19 00:08:14,388 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:08:25,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=474180.0, ans=0.025 2023-11-19 00:08:28,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=474180.0, ans=0.0 2023-11-19 00:08:38,780 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:08:56,476 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 9.044e+01 9.862e+01 1.100e+02 1.802e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-19 00:09:03,357 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11050, loss[loss=0.07388, simple_loss=0.0843, pruned_loss=0.01871, audio_tagging_loss=0.01301, over 15346.00 frames. ], tot_loss[loss=0.09624, simple_loss=0.1125, pruned_loss=0.02884, audio_tagging_loss=0.01114, over 3043292.89 frames. ], batch size: 62, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:09:23,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=474513.3333333333, ans=0.0 2023-11-19 00:09:24,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=474580.0, ans=0.125 2023-11-19 00:09:59,042 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11100, loss[loss=0.08733, simple_loss=0.09422, pruned_loss=0.02905, audio_tagging_loss=0.01117, over 14858.00 frames. ], tot_loss[loss=0.09595, simple_loss=0.112, pruned_loss=0.02867, audio_tagging_loss=0.01128, over 3039594.55 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:10:00,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=474780.0, ans=0.0 2023-11-19 00:10:01,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=474780.0, ans=0.035 2023-11-19 00:10:08,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=474846.6666666667, ans=0.125 2023-11-19 00:10:14,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=474846.6666666667, ans=0.125 2023-11-19 00:10:14,242 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:10:14,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=474846.6666666667, ans=0.125 2023-11-19 00:10:22,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=474913.3333333333, ans=0.125 2023-11-19 00:10:26,249 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.16 vs. 
limit=10.0 2023-11-19 00:10:41,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=474980.0, ans=0.2 2023-11-19 00:10:41,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=474980.0, ans=0.125 2023-11-19 00:10:47,792 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.819e+01 8.698e+01 9.641e+01 1.040e+02 1.445e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-19 00:10:54,132 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11150, loss[loss=0.08066, simple_loss=0.0855, pruned_loss=0.02296, audio_tagging_loss=0.01495, over 15141.00 frames. ], tot_loss[loss=0.09645, simple_loss=0.1125, pruned_loss=0.02873, audio_tagging_loss=0.01148, over 3042451.62 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:11:17,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=475246.6666666667, ans=0.125 2023-11-19 00:11:25,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=475246.6666666667, ans=0.125 2023-11-19 00:11:28,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=475313.3333333333, ans=0.0 2023-11-19 00:11:30,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=475313.3333333333, ans=0.125 2023-11-19 00:11:32,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=12.0 2023-11-19 00:11:44,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=475380.0, ans=0.0 2023-11-19 00:11:49,597 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11200, loss[loss=0.1081, simple_loss=0.1362, pruned_loss=0.03285, audio_tagging_loss=0.007127, over 15359.00 frames. ], tot_loss[loss=0.09664, simple_loss=0.113, pruned_loss=0.02871, audio_tagging_loss=0.01145, over 3041422.81 frames. ], batch size: 55, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:11:49,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=475446.6666666667, ans=0.125 2023-11-19 00:11:54,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=475446.6666666667, ans=0.125 2023-11-19 00:12:05,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=475513.3333333333, ans=0.04949747468305833 2023-11-19 00:12:08,996 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=475513.3333333333, ans=0.2 2023-11-19 00:12:25,799 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.07 vs. limit=22.5 2023-11-19 00:12:29,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.06 vs. 
limit=12.0 2023-11-19 00:12:32,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=475646.6666666667, ans=0.0 2023-11-19 00:12:39,482 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.628e+01 9.761e+01 1.045e+02 1.473e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-19 00:12:45,836 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11250, loss[loss=0.09349, simple_loss=0.1135, pruned_loss=0.02777, audio_tagging_loss=0.008945, over 14703.00 frames. ], tot_loss[loss=0.09618, simple_loss=0.1123, pruned_loss=0.02862, audio_tagging_loss=0.01139, over 3043121.90 frames. ], batch size: 54, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:12:49,392 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=22.5 2023-11-19 00:12:55,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=475846.6666666667, ans=0.0 2023-11-19 00:13:18,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=475980.0, ans=0.125 2023-11-19 00:13:41,020 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11300, loss[loss=0.06748, simple_loss=0.07861, pruned_loss=0.01559, audio_tagging_loss=0.01259, over 15903.00 frames. ], tot_loss[loss=0.09616, simple_loss=0.1128, pruned_loss=0.02857, audio_tagging_loss=0.01119, over 3051240.18 frames. ], batch size: 61, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:14:25,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=476380.0, ans=0.125 2023-11-19 00:14:29,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.711e+01 9.512e+01 1.035e+02 1.315e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 00:14:36,318 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11350, loss[loss=0.1038, simple_loss=0.1165, pruned_loss=0.03239, audio_tagging_loss=0.01315, over 13571.00 frames. ], tot_loss[loss=0.09725, simple_loss=0.1142, pruned_loss=0.02912, audio_tagging_loss=0.01104, over 3044566.48 frames. ], batch size: 54, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:15:29,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=476713.3333333333, ans=0.125 2023-11-19 00:15:30,106 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=15.0 2023-11-19 00:15:32,737 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11400, loss[loss=0.1036, simple_loss=0.1197, pruned_loss=0.03268, audio_tagging_loss=0.01105, over 14549.00 frames. ], tot_loss[loss=0.09719, simple_loss=0.1144, pruned_loss=0.02913, audio_tagging_loss=0.01084, over 3047096.95 frames. 
], batch size: 53, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:15:37,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=476780.0, ans=0.015 2023-11-19 00:15:41,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=476780.0, ans=0.125 2023-11-19 00:16:08,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=476980.0, ans=0.0 2023-11-19 00:16:20,944 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.841e+01 9.746e+01 1.056e+02 1.411e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-19 00:16:27,295 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11450, loss[loss=0.1046, simple_loss=0.1191, pruned_loss=0.03302, audio_tagging_loss=0.01205, over 15260.00 frames. ], tot_loss[loss=0.09686, simple_loss=0.1137, pruned_loss=0.02908, audio_tagging_loss=0.01094, over 3050016.55 frames. ], batch size: 56, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:16:54,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=477246.6666666667, ans=0.2 2023-11-19 00:16:56,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=477246.6666666667, ans=0.125 2023-11-19 00:16:59,532 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.05 vs. limit=10.0 2023-11-19 00:17:13,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=477380.0, ans=0.125 2023-11-19 00:17:22,950 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11500, loss[loss=0.1006, simple_loss=0.1134, pruned_loss=0.02977, audio_tagging_loss=0.01411, over 15698.00 frames. ], tot_loss[loss=0.0967, simple_loss=0.1132, pruned_loss=0.02914, audio_tagging_loss=0.01097, over 3049719.76 frames. ], batch size: 58, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:17:27,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.56 vs. limit=10.0 2023-11-19 00:17:29,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=477446.6666666667, ans=0.0 2023-11-19 00:17:40,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.83 vs. limit=15.0 2023-11-19 00:17:40,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=477513.3333333333, ans=0.125 2023-11-19 00:17:54,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=477580.0, ans=0.0 2023-11-19 00:18:04,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=477646.6666666667, ans=0.05 2023-11-19 00:18:08,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.15 vs. 
limit=15.0 2023-11-19 00:18:11,938 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.969e+01 9.661e+01 1.076e+02 1.537e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-19 00:18:15,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.80 vs. limit=10.0 2023-11-19 00:18:15,944 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.554e-03 2023-11-19 00:18:19,403 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11550, loss[loss=0.05579, simple_loss=0.06434, pruned_loss=0.01008, audio_tagging_loss=0.01354, over 15224.00 frames. ], tot_loss[loss=0.09636, simple_loss=0.1128, pruned_loss=0.02903, audio_tagging_loss=0.01092, over 3048174.67 frames. ], batch size: 59, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:18:28,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=477780.0, ans=0.5 2023-11-19 00:18:34,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-11-19 00:18:49,473 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:19:03,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. limit=6.0 2023-11-19 00:19:09,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0 2023-11-19 00:19:11,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0 2023-11-19 00:19:14,384 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11600, loss[loss=0.1055, simple_loss=0.1223, pruned_loss=0.03369, audio_tagging_loss=0.01069, over 14913.00 frames. ], tot_loss[loss=0.09619, simple_loss=0.1127, pruned_loss=0.02889, audio_tagging_loss=0.01096, over 3044963.89 frames. 
], batch size: 55, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:19:26,844 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=478180.0, ans=0.125 2023-11-19 00:19:28,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=478180.0, ans=0.125 2023-11-19 00:19:41,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=478246.6666666667, ans=0.2 2023-11-19 00:19:45,381 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:20:02,995 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 8.994e+01 9.981e+01 1.100e+02 1.554e+02, threshold=1.996e+02, percent-clipped=0.0 2023-11-19 00:20:09,877 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11650, loss[loss=0.1002, simple_loss=0.1155, pruned_loss=0.02818, audio_tagging_loss=0.01423, over 14502.00 frames. ], tot_loss[loss=0.09576, simple_loss=0.1123, pruned_loss=0.02868, audio_tagging_loss=0.01093, over 3045473.19 frames. ], batch size: 55, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:20:11,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=478446.6666666667, ans=0.1 2023-11-19 00:20:27,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=478513.3333333333, ans=0.125 2023-11-19 00:20:33,230 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2023-11-19 00:21:01,158 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=12.0 2023-11-19 00:21:06,346 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11700, loss[loss=0.113, simple_loss=0.1351, pruned_loss=0.03607, audio_tagging_loss=0.009347, over 15243.00 frames. ], tot_loss[loss=0.096, simple_loss=0.1123, pruned_loss=0.02878, audio_tagging_loss=0.01104, over 3047400.30 frames. ], batch size: 56, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:21:13,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=478780.0, ans=0.95 2023-11-19 00:21:22,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=478846.6666666667, ans=0.125 2023-11-19 00:21:28,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=478913.3333333333, ans=0.0 2023-11-19 00:21:40,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=478980.0, ans=0.0 2023-11-19 00:21:55,332 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.970e+01 9.668e+01 1.084e+02 1.454e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-19 00:22:00,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=479113.3333333333, ans=0.125 2023-11-19 00:22:01,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.24 vs. 
limit=15.0 2023-11-19 00:22:01,684 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11750, loss[loss=0.07908, simple_loss=0.08952, pruned_loss=0.0223, audio_tagging_loss=0.01203, over 15792.00 frames. ], tot_loss[loss=0.09523, simple_loss=0.1115, pruned_loss=0.02851, audio_tagging_loss=0.01098, over 3045021.66 frames. ], batch size: 60, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:22:02,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=479113.3333333333, ans=0.125 2023-11-19 00:22:17,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2023-11-19 00:22:23,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2023-11-19 00:22:35,231 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:22:45,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=479380.0, ans=10.0 2023-11-19 00:22:47,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.63 vs. limit=15.0 2023-11-19 00:22:49,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=479380.0, ans=0.125 2023-11-19 00:22:50,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=479380.0, ans=0.125 2023-11-19 00:22:51,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=479380.0, ans=0.1 2023-11-19 00:22:56,863 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11800, loss[loss=0.09277, simple_loss=0.1106, pruned_loss=0.02514, audio_tagging_loss=0.01235, over 15630.00 frames. ], tot_loss[loss=0.09653, simple_loss=0.1127, pruned_loss=0.02917, audio_tagging_loss=0.01104, over 3038381.50 frames. ], batch size: 58, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:23:16,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=479513.3333333333, ans=0.07 2023-11-19 00:23:35,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.47 vs. limit=15.0 2023-11-19 00:23:48,147 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 9.014e+01 9.704e+01 1.070e+02 1.627e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-19 00:23:53,377 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11850, loss[loss=0.09933, simple_loss=0.1164, pruned_loss=0.02914, audio_tagging_loss=0.01197, over 15943.00 frames. ], tot_loss[loss=0.09647, simple_loss=0.1126, pruned_loss=0.02904, audio_tagging_loss=0.01114, over 3035336.62 frames. ], batch size: 59, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:23:53,523 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:24:04,491 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.10 vs. 
limit=15.0 2023-11-19 00:24:05,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=479846.6666666667, ans=0.125 2023-11-19 00:24:06,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. limit=6.0 2023-11-19 00:24:19,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=479913.3333333333, ans=0.125 2023-11-19 00:24:31,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=479980.0, ans=0.1 2023-11-19 00:24:35,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=479980.0, ans=0.0 2023-11-19 00:24:50,955 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11900, loss[loss=0.08821, simple_loss=0.09778, pruned_loss=0.02641, audio_tagging_loss=0.01291, over 14362.00 frames. ], tot_loss[loss=0.0968, simple_loss=0.1129, pruned_loss=0.02907, audio_tagging_loss=0.0113, over 3039482.45 frames. ], batch size: 56, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:25:07,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=480180.0, ans=0.1 2023-11-19 00:25:07,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=12.0 2023-11-19 00:25:14,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=480246.6666666667, ans=0.05 2023-11-19 00:25:17,789 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=15.0 2023-11-19 00:25:35,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=480380.0, ans=0.2 2023-11-19 00:25:41,558 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.633e+01 9.352e+01 1.050e+02 1.397e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-19 00:25:41,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=480380.0, ans=0.5 2023-11-19 00:25:45,904 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 11950, loss[loss=0.1002, simple_loss=0.1241, pruned_loss=0.02677, audio_tagging_loss=0.01136, over 14542.00 frames. ], tot_loss[loss=0.09648, simple_loss=0.1124, pruned_loss=0.02893, audio_tagging_loss=0.01132, over 3041580.55 frames. 
], batch size: 54, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:26:06,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=480513.3333333333, ans=0.0 2023-11-19 00:26:08,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=480580.0, ans=0.125 2023-11-19 00:26:08,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=480580.0, ans=0.125 2023-11-19 00:26:17,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=480580.0, ans=0.125 2023-11-19 00:26:21,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=480646.6666666667, ans=0.2 2023-11-19 00:26:23,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=480646.6666666667, ans=0.0 2023-11-19 00:26:38,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=480780.0, ans=0.125 2023-11-19 00:26:39,765 INFO [train_asr.py:1115] (1/4) Epoch 6, batch 12000, loss[loss=0.141, simple_loss=0.1653, pruned_loss=0.04711, audio_tagging_loss=0.01127, over 15430.00 frames. ], tot_loss[loss=0.09615, simple_loss=0.1119, pruned_loss=0.02875, audio_tagging_loss=0.01145, over 3048326.54 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:26:39,766 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 00:27:12,305 INFO [train_asr.py:1147] (1/4) Epoch 6, validation: loss=0.07011, simple_loss=0.05856, pruned_loss=0.008079, audio_tagging_loss=0.03275, over 4681554.00 frames. 2023-11-19 00:27:12,306 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 00:27:12,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=480780.0, ans=0.125 2023-11-19 00:27:19,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2023-11-19 00:28:10,680 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 0, loss[loss=0.09999, simple_loss=0.09777, pruned_loss=0.02125, audio_tagging_loss=0.02986, over 13769.00 frames. ], tot_loss[loss=0.09999, simple_loss=0.09777, pruned_loss=0.02125, audio_tagging_loss=0.02986, over 13769.00 frames. ], batch size: 54, lr: 1.03e-02, grad_scale: 32.0 2023-11-19 00:28:10,681 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 00:28:36,470 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2862, 4.9700, 4.8109, 5.1075], device='cuda:1') 2023-11-19 00:28:42,237 INFO [train_asr.py:1147] (1/4) Epoch 7, validation: loss=0.06897, simple_loss=0.05854, pruned_loss=0.008004, audio_tagging_loss=0.03169, over 4681554.00 frames. 
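[Note on reading the loss records above] Each "tot_loss[...]" record is a running average over recent batches (note the "over N frames" count it carries), while each "validation: loss=..." record (train_asr.py:1147) covers the full dev set (4681554.00 frames). The totals logged here are consistent with loss being roughly 0.5 * simple_loss + pruned_loss + audio_tagging_loss; for example, the epoch-7 validation record above gives 0.5 * 0.05854 + 0.008004 + 0.03169 which is about 0.06897, matching the logged total. The exact weighting (including any warmup-dependent scaling) lives in train_asr.py, so treat this as a reading aid rather than the definitive formula.

The sketch below scrapes both kinds of record into loss curves. It is a minimal, hypothetical helper, not part of icefall: the script name, regexes, and command-line usage are assumptions based only on the record format visible in this log, and it assumes one log record per line, as in the raw log file on disk.

# scrape_loss.py -- minimal sketch (not part of icefall): pull training and
# validation loss curves out of a log with the record format shown above.
import re
import sys

# e.g. "Epoch 6, batch 11950, loss[...], tot_loss[loss=0.09648, ...]"
TRAIN_RE = re.compile(r"Epoch (\d+), batch (\d+), .*?tot_loss\[loss=([0-9.]+)")
# e.g. "Epoch 7, validation: loss=0.06897, ..."
VALID_RE = re.compile(r"Epoch (\d+), validation: loss=([0-9.]+)")

def scrape(path):
    """Return ([(epoch, batch, tot_loss), ...], [(epoch, valid_loss), ...])."""
    train, valid = [], []
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:  # assumes one log record per line
            m = TRAIN_RE.search(line)
            if m:
                train.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))
                continue
            m = VALID_RE.search(line)
            if m:
                valid.append((int(m.group(1)), float(m.group(2))))
    return train, valid

if __name__ == "__main__":
    train, valid = scrape(sys.argv[1])
    # On this log the validation entries would include (6, 0.07011) and (7, 0.06897).
    for epoch, loss in valid:
        print(f"epoch {epoch} validation loss: {loss:.5f}")
    if train:
        e, b, l = train[-1]
        print(f"last training record: epoch {e}, batch {b}, tot_loss {l:.5f}")

Run it as "python scrape_loss.py <log-file>"; the resulting (batch, tot_loss) and (epoch, valid_loss) series can then be fed to matplotlib or TensorBoard for plotting.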
2023-11-19 00:28:42,238 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 00:28:59,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=480993.3333333333, ans=0.0 2023-11-19 00:29:06,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=481060.0, ans=0.125 2023-11-19 00:29:08,502 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.969e+01 9.678e+01 1.084e+02 1.742e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-19 00:29:36,958 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 50, loss[loss=0.1086, simple_loss=0.1236, pruned_loss=0.02768, audio_tagging_loss=0.01915, over 15501.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1158, pruned_loss=0.02881, audio_tagging_loss=0.02071, over 688385.13 frames. ], batch size: 56, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:30:33,428 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 100, loss[loss=0.1181, simple_loss=0.1361, pruned_loss=0.03492, audio_tagging_loss=0.01516, over 15985.00 frames. ], tot_loss[loss=0.106, simple_loss=0.1139, pruned_loss=0.02884, audio_tagging_loss=0.02017, over 1206022.94 frames. ], batch size: 56, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:31:01,053 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 8.882e+01 9.750e+01 1.051e+02 1.477e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-19 00:31:12,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=481793.3333333333, ans=0.2 2023-11-19 00:31:28,789 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 150, loss[loss=0.1254, simple_loss=0.1418, pruned_loss=0.04473, audio_tagging_loss=0.00979, over 15861.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.1149, pruned_loss=0.02915, audio_tagging_loss=0.01782, over 1615206.05 frames. ], batch size: 62, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:31:34,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=481926.6666666667, ans=0.0 2023-11-19 00:31:49,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=481993.3333333333, ans=0.125 2023-11-19 00:32:16,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=482193.3333333333, ans=0.05 2023-11-19 00:32:25,132 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 200, loss[loss=0.1029, simple_loss=0.1251, pruned_loss=0.03035, audio_tagging_loss=0.01002, over 15981.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1138, pruned_loss=0.02922, audio_tagging_loss=0.01598, over 1931812.31 frames. ], batch size: 60, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:32:25,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=482260.0, ans=0.2 2023-11-19 00:32:28,696 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.24 vs. 
limit=15.0 2023-11-19 00:32:44,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=482326.6666666667, ans=22.5 2023-11-19 00:32:49,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=482393.3333333333, ans=0.1 2023-11-19 00:32:51,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=482393.3333333333, ans=0.125 2023-11-19 00:32:52,541 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 9.072e+01 1.001e+02 1.087e+02 1.831e+02, threshold=2.002e+02, percent-clipped=0.0 2023-11-19 00:33:17,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=482526.6666666667, ans=0.04949747468305833 2023-11-19 00:33:21,287 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 250, loss[loss=0.1157, simple_loss=0.1332, pruned_loss=0.03937, audio_tagging_loss=0.009735, over 15710.00 frames. ], tot_loss[loss=0.1004, simple_loss=0.1137, pruned_loss=0.02917, audio_tagging_loss=0.0144, over 2178441.69 frames. ], batch size: 58, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:33:50,244 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2023-11-19 00:34:13,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=482860.0, ans=0.0 2023-11-19 00:34:15,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=482926.6666666667, ans=0.125 2023-11-19 00:34:16,420 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 300, loss[loss=0.09219, simple_loss=0.1118, pruned_loss=0.02729, audio_tagging_loss=0.009002, over 14859.00 frames. ], tot_loss[loss=0.0991, simple_loss=0.1131, pruned_loss=0.02902, audio_tagging_loss=0.01354, over 2376538.37 frames. ], batch size: 57, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:34:34,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=482993.3333333333, ans=0.125 2023-11-19 00:34:44,694 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 8.903e+01 9.554e+01 1.061e+02 1.704e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-19 00:34:52,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=483126.6666666667, ans=0.125 2023-11-19 00:35:10,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=483260.0, ans=0.125 2023-11-19 00:35:12,339 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 350, loss[loss=0.08853, simple_loss=0.1123, pruned_loss=0.01922, audio_tagging_loss=0.01314, over 15678.00 frames. ], tot_loss[loss=0.09924, simple_loss=0.1147, pruned_loss=0.02908, audio_tagging_loss=0.01282, over 2529102.21 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:35:28,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.12 vs. 
limit=15.0 2023-11-19 00:35:46,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=483460.0, ans=0.125 2023-11-19 00:35:53,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.24 vs. limit=15.0 2023-11-19 00:36:04,459 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.29 vs. limit=15.0 2023-11-19 00:36:07,561 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 400, loss[loss=0.1227, simple_loss=0.157, pruned_loss=0.03994, audio_tagging_loss=0.004243, over 15633.00 frames. ], tot_loss[loss=0.09821, simple_loss=0.1143, pruned_loss=0.02874, audio_tagging_loss=0.01231, over 2647782.67 frames. ], batch size: 55, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:36:08,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483593.3333333333, ans=0.1 2023-11-19 00:36:13,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=483593.3333333333, ans=0.125 2023-11-19 00:36:22,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=483660.0, ans=0.2 2023-11-19 00:36:23,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=483660.0, ans=0.125 2023-11-19 00:36:32,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=483726.6666666667, ans=0.125 2023-11-19 00:36:33,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483726.6666666667, ans=0.1 2023-11-19 00:36:34,358 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.424e+01 8.614e+01 9.359e+01 1.038e+02 1.564e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 00:36:39,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=483793.3333333333, ans=0.125 2023-11-19 00:36:41,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=483793.3333333333, ans=0.125 2023-11-19 00:36:53,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.49 vs. limit=15.0 2023-11-19 00:36:54,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=483860.0, ans=0.0 2023-11-19 00:37:01,750 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 450, loss[loss=0.07816, simple_loss=0.08883, pruned_loss=0.02056, audio_tagging_loss=0.01318, over 15083.00 frames. ], tot_loss[loss=0.09736, simple_loss=0.114, pruned_loss=0.02852, audio_tagging_loss=0.01183, over 2735908.90 frames. 
], batch size: 58, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:37:01,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=483926.6666666667, ans=0.125
2023-11-19 00:37:04,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=483926.6666666667, ans=0.125
2023-11-19 00:37:09,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=483926.6666666667, ans=0.2
2023-11-19 00:37:16,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0
2023-11-19 00:37:17,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=483993.3333333333, ans=0.04949747468305833
2023-11-19 00:37:19,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=483993.3333333333, ans=0.2
2023-11-19 00:37:32,784 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=484060.0, ans=0.125
2023-11-19 00:37:43,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=484126.6666666667, ans=0.2
2023-11-19 00:37:45,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=484193.3333333333, ans=0.125
2023-11-19 00:37:50,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=484193.3333333333, ans=0.125
2023-11-19 00:37:57,258 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 500, loss[loss=0.1191, simple_loss=0.1387, pruned_loss=0.03524, audio_tagging_loss=0.01446, over 14233.00 frames. ], tot_loss[loss=0.09667, simple_loss=0.1132, pruned_loss=0.02837, audio_tagging_loss=0.01171, over 2802941.84 frames. ], batch size: 53, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:38:06,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=484260.0, ans=0.0
2023-11-19 00:38:17,915 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.44 vs. limit=10.0
2023-11-19 00:38:24,793 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.485e+01 9.298e+01 1.059e+02 1.299e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-19 00:38:32,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0
2023-11-19 00:38:33,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=484460.0, ans=0.0
2023-11-19 00:38:52,398 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 550, loss[loss=0.1086, simple_loss=0.1385, pruned_loss=0.03186, audio_tagging_loss=0.007498, over 16116.00 frames. ], tot_loss[loss=0.09635, simple_loss=0.1131, pruned_loss=0.02834, audio_tagging_loss=0.01146, over 2864161.43 frames. ], batch size: 59, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:38:54,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=484593.3333333333, ans=0.1
2023-11-19 00:38:59,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=484593.3333333333, ans=0.2
2023-11-19 00:39:10,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=484660.0, ans=0.125
2023-11-19 00:39:11,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.28 vs. limit=15.0
2023-11-19 00:39:14,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=484726.6666666667, ans=0.125
2023-11-19 00:39:32,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=484793.3333333333, ans=0.0
2023-11-19 00:39:34,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=484793.3333333333, ans=0.125
2023-11-19 00:39:40,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=484860.0, ans=0.5
2023-11-19 00:39:45,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=484860.0, ans=0.035
2023-11-19 00:39:48,097 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 600, loss[loss=0.07474, simple_loss=0.08453, pruned_loss=0.02111, audio_tagging_loss=0.01136, over 13689.00 frames. ], tot_loss[loss=0.09673, simple_loss=0.1137, pruned_loss=0.02855, audio_tagging_loss=0.01134, over 2898790.04 frames. ], batch size: 54, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:40:09,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=485060.0, ans=0.0
2023-11-19 00:40:15,507 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.943e+01 9.833e+01 1.134e+02 1.508e+02, threshold=1.967e+02, percent-clipped=0.0
2023-11-19 00:40:19,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=485060.0, ans=0.125
2023-11-19 00:40:42,651 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 650, loss[loss=0.0938, simple_loss=0.105, pruned_loss=0.02857, audio_tagging_loss=0.01273, over 14322.00 frames. ], tot_loss[loss=0.09587, simple_loss=0.1128, pruned_loss=0.02816, audio_tagging_loss=0.01133, over 2929337.08 frames. ], batch size: 55, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:40:44,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=485260.0, ans=0.1
2023-11-19 00:40:47,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=485260.0, ans=0.125
2023-11-19 00:41:38,262 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 700, loss[loss=0.08611, simple_loss=0.1002, pruned_loss=0.02411, audio_tagging_loss=0.01189, over 15568.00 frames. ], tot_loss[loss=0.09643, simple_loss=0.1139, pruned_loss=0.02833, audio_tagging_loss=0.01113, over 2952249.55 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 16.0
2023-11-19 00:41:44,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=485593.3333333333, ans=0.1
2023-11-19 00:42:06,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.220e+01 8.517e+01 9.340e+01 1.042e+02 1.556e+02, threshold=1.868e+02, percent-clipped=0.0
2023-11-19 00:42:14,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=485793.3333333333, ans=0.125
2023-11-19 00:42:15,747 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 00:42:18,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5
2023-11-19 00:42:18,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0
2023-11-19 00:42:27,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=485860.0, ans=0.0
2023-11-19 00:42:33,684 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 750, loss[loss=0.1049, simple_loss=0.1281, pruned_loss=0.02961, audio_tagging_loss=0.01122, over 15801.00 frames. ], tot_loss[loss=0.09634, simple_loss=0.114, pruned_loss=0.02818, audio_tagging_loss=0.01116, over 2980941.93 frames. ], batch size: 61, lr: 1.02e-02, grad_scale: 16.0
2023-11-19 00:42:39,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=485926.6666666667, ans=0.125
2023-11-19 00:43:20,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=486193.3333333333, ans=0.125
2023-11-19 00:43:28,903 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 800, loss[loss=0.08641, simple_loss=0.08984, pruned_loss=0.02963, audio_tagging_loss=0.01186, over 15646.00 frames. ], tot_loss[loss=0.0958, simple_loss=0.1131, pruned_loss=0.02807, audio_tagging_loss=0.01119, over 2994707.28 frames. ], batch size: 59, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:43:33,045 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0
2023-11-19 00:43:45,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=486326.6666666667, ans=0.1
2023-11-19 00:43:52,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0
2023-11-19 00:43:58,560 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 8.961e+01 9.604e+01 1.088e+02 1.734e+02, threshold=1.921e+02, percent-clipped=0.0
2023-11-19 00:44:01,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.63 vs. limit=6.0
2023-11-19 00:44:04,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=486460.0, ans=0.0
2023-11-19 00:44:05,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.00 vs. limit=6.0
2023-11-19 00:44:07,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=486460.0, ans=0.125
2023-11-19 00:44:07,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=486460.0, ans=0.125
2023-11-19 00:44:09,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=486460.0, ans=0.1
2023-11-19 00:44:11,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=486460.0, ans=0.125
2023-11-19 00:44:20,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=486526.6666666667, ans=0.125
2023-11-19 00:44:20,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=486526.6666666667, ans=0.0
2023-11-19 00:44:24,844 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 850, loss[loss=0.09967, simple_loss=0.1096, pruned_loss=0.02781, audio_tagging_loss=0.01707, over 14177.00 frames. ], tot_loss[loss=0.09642, simple_loss=0.1137, pruned_loss=0.02833, audio_tagging_loss=0.01122, over 3005892.21 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:44:31,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=486593.3333333333, ans=0.125
2023-11-19 00:44:48,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0
2023-11-19 00:44:57,278 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=486726.6666666667, ans=0.125
2023-11-19 00:44:58,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0
2023-11-19 00:45:06,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=486793.3333333333, ans=0.125
2023-11-19 00:45:14,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=486860.0, ans=0.0
2023-11-19 00:45:21,333 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 900, loss[loss=0.07851, simple_loss=0.09338, pruned_loss=0.02012, audio_tagging_loss=0.0117, over 15769.00 frames. ], tot_loss[loss=0.09596, simple_loss=0.1131, pruned_loss=0.02811, audio_tagging_loss=0.01131, over 3018481.09 frames. ], batch size: 63, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:45:49,161 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.581e+01 9.444e+01 1.025e+02 1.382e+02, threshold=1.889e+02, percent-clipped=0.0
2023-11-19 00:45:51,541 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=487060.0, ans=0.125
2023-11-19 00:46:16,167 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 950, loss[loss=0.07005, simple_loss=0.08813, pruned_loss=0.01356, audio_tagging_loss=0.01242, over 15214.00 frames. ], tot_loss[loss=0.09599, simple_loss=0.1134, pruned_loss=0.02818, audio_tagging_loss=0.01109, over 3024108.75 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:46:45,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0
2023-11-19 00:46:50,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=487460.0, ans=0.125
2023-11-19 00:46:55,467 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=487460.0, ans=0.2
2023-11-19 00:47:11,643 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1000, loss[loss=0.08497, simple_loss=0.09059, pruned_loss=0.02701, audio_tagging_loss=0.01266, over 15115.00 frames. ], tot_loss[loss=0.09453, simple_loss=0.1119, pruned_loss=0.02764, audio_tagging_loss=0.01096, over 3028938.41 frames. ], batch size: 59, lr: 1.02e-02, grad_scale: 16.0
2023-11-19 00:47:13,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0
2023-11-19 00:47:28,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=487660.0, ans=0.2
2023-11-19 00:47:33,781 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0
2023-11-19 00:47:35,332 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 00:47:37,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=487726.6666666667, ans=0.0
2023-11-19 00:47:41,630 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.644e+01 9.195e+01 1.009e+02 1.438e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-19 00:47:48,699 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.403e-02
2023-11-19 00:48:07,471 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1050, loss[loss=0.06633, simple_loss=0.07918, pruned_loss=0.01536, audio_tagging_loss=0.01138, over 14457.00 frames. ], tot_loss[loss=0.09419, simple_loss=0.1112, pruned_loss=0.02768, audio_tagging_loss=0.01091, over 3030384.82 frames. ], batch size: 55, lr: 1.02e-02, grad_scale: 16.0
2023-11-19 00:48:36,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=488060.0, ans=0.2
2023-11-19 00:48:37,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=488060.0, ans=0.09899494936611666
2023-11-19 00:48:39,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=488126.6666666667, ans=0.125
2023-11-19 00:48:40,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.23 vs. limit=15.0
2023-11-19 00:48:40,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=488126.6666666667, ans=0.0
2023-11-19 00:48:43,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=488126.6666666667, ans=0.0
2023-11-19 00:48:52,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=488193.3333333333, ans=0.0
2023-11-19 00:49:03,262 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1100, loss[loss=0.08769, simple_loss=0.1022, pruned_loss=0.02651, audio_tagging_loss=0.01006, over 15307.00 frames. ], tot_loss[loss=0.0935, simple_loss=0.1103, pruned_loss=0.02738, audio_tagging_loss=0.01095, over 3030977.35 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 16.0
2023-11-19 00:49:06,408 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 00:49:07,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=488260.0, ans=0.125
2023-11-19 00:49:14,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.88 vs. limit=15.0
2023-11-19 00:49:15,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=488326.6666666667, ans=0.125
2023-11-19 00:49:16,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=488326.6666666667, ans=0.125
2023-11-19 00:49:18,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=488326.6666666667, ans=0.2
2023-11-19 00:49:20,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=488326.6666666667, ans=0.125
2023-11-19 00:49:33,461 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.821e+01 9.518e+01 1.052e+02 1.526e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-19 00:49:58,982 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1150, loss[loss=0.08368, simple_loss=0.1034, pruned_loss=0.02355, audio_tagging_loss=0.00845, over 16356.00 frames. ], tot_loss[loss=0.09292, simple_loss=0.1093, pruned_loss=0.02728, audio_tagging_loss=0.011, over 3038276.49 frames. ], batch size: 61, lr: 1.02e-02, grad_scale: 16.0
2023-11-19 00:50:11,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=488660.0, ans=0.1
2023-11-19 00:50:12,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=488660.0, ans=0.125
2023-11-19 00:50:18,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=488660.0, ans=0.0
2023-11-19 00:50:29,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=488726.6666666667, ans=0.2
2023-11-19 00:50:31,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0
2023-11-19 00:50:32,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=488793.3333333333, ans=0.125
2023-11-19 00:50:36,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=488793.3333333333, ans=0.125
2023-11-19 00:50:40,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=12.0
2023-11-19 00:50:40,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=488793.3333333333, ans=0.0
2023-11-19 00:50:55,537 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1200, loss[loss=0.1012, simple_loss=0.122, pruned_loss=0.03012, audio_tagging_loss=0.01008, over 16200.00 frames. ], tot_loss[loss=0.09333, simple_loss=0.1096, pruned_loss=0.02757, audio_tagging_loss=0.01098, over 3039009.30 frames. ], batch size: 59, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:51:00,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=488926.6666666667, ans=0.0
2023-11-19 00:51:03,144 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=488926.6666666667, ans=0.0
2023-11-19 00:51:25,293 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.775e+01 9.458e+01 1.050e+02 1.338e+02, threshold=1.892e+02, percent-clipped=0.0
2023-11-19 00:51:26,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=489060.0, ans=0.125
2023-11-19 00:51:34,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=489126.6666666667, ans=0.0
2023-11-19 00:51:50,845 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1250, loss[loss=0.1346, simple_loss=0.15, pruned_loss=0.04474, audio_tagging_loss=0.01489, over 14683.00 frames. ], tot_loss[loss=0.09415, simple_loss=0.1102, pruned_loss=0.02803, audio_tagging_loss=0.011, over 3034804.33 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:51:53,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=489260.0, ans=0.125
2023-11-19 00:52:07,710 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.12 vs. limit=15.0
2023-11-19 00:52:08,299 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=489326.6666666667, ans=0.2
2023-11-19 00:52:22,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=489393.3333333333, ans=0.125
2023-11-19 00:52:46,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=489593.3333333333, ans=0.125
2023-11-19 00:52:47,252 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1300, loss[loss=0.09773, simple_loss=0.1165, pruned_loss=0.02646, audio_tagging_loss=0.01302, over 15173.00 frames. ], tot_loss[loss=0.09431, simple_loss=0.1107, pruned_loss=0.02801, audio_tagging_loss=0.01097, over 3040832.80 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:52:59,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=489660.0, ans=0.1
2023-11-19 00:53:02,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=489660.0, ans=0.0
2023-11-19 00:53:17,089 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.399e+01 8.502e+01 9.491e+01 1.040e+02 1.421e+02, threshold=1.898e+02, percent-clipped=0.0
2023-11-19 00:53:18,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=489726.6666666667, ans=0.125
2023-11-19 00:53:43,589 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1350, loss[loss=0.09034, simple_loss=0.1131, pruned_loss=0.02286, audio_tagging_loss=0.01093, over 16823.00 frames. ], tot_loss[loss=0.09446, simple_loss=0.1111, pruned_loss=0.02793, audio_tagging_loss=0.01098, over 3043841.33 frames. ], batch size: 62, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:53:51,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=489926.6666666667, ans=0.125
2023-11-19 00:53:54,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0
2023-11-19 00:54:06,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=490060.0, ans=0.0
2023-11-19 00:54:16,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=490126.6666666667, ans=0.0
2023-11-19 00:54:20,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0
2023-11-19 00:54:24,931 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 00:54:28,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=490193.3333333333, ans=0.07
2023-11-19 00:54:33,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=490193.3333333333, ans=0.0
2023-11-19 00:54:38,695 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1400, loss[loss=0.05545, simple_loss=0.05476, pruned_loss=0.01526, audio_tagging_loss=0.01281, over 15573.00 frames. ], tot_loss[loss=0.0934, simple_loss=0.1095, pruned_loss=0.02755, audio_tagging_loss=0.01108, over 3044160.22 frames. ], batch size: 60, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:54:45,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=490260.0, ans=0.2
2023-11-19 00:55:09,549 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.650e+01 9.368e+01 1.053e+02 1.666e+02, threshold=1.874e+02, percent-clipped=0.0
2023-11-19 00:55:26,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=490526.6666666667, ans=0.2
2023-11-19 00:55:27,953 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=22.5
2023-11-19 00:55:31,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=490526.6666666667, ans=0.2
2023-11-19 00:55:35,057 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1450, loss[loss=0.06503, simple_loss=0.07036, pruned_loss=0.01719, audio_tagging_loss=0.01267, over 15275.00 frames. ], tot_loss[loss=0.09381, simple_loss=0.1101, pruned_loss=0.02768, audio_tagging_loss=0.01109, over 3044852.08 frames. ], batch size: 59, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:55:41,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=490593.3333333333, ans=0.125
2023-11-19 00:55:52,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=490660.0, ans=0.125
2023-11-19 00:56:12,514 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-19 00:56:30,786 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1500, loss[loss=0.1145, simple_loss=0.147, pruned_loss=0.03237, audio_tagging_loss=0.008615, over 15223.00 frames. ], tot_loss[loss=0.09464, simple_loss=0.1112, pruned_loss=0.0278, audio_tagging_loss=0.01122, over 3050082.44 frames. ], batch size: 55, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:56:36,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0
2023-11-19 00:56:38,708 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.41 vs. limit=10.0
2023-11-19 00:57:00,324 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.708e+01 9.682e+01 1.052e+02 1.356e+02, threshold=1.936e+02, percent-clipped=0.0
2023-11-19 00:57:00,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=491060.0, ans=0.125
2023-11-19 00:57:19,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=491193.3333333333, ans=0.125
2023-11-19 00:57:21,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=491193.3333333333, ans=0.0
2023-11-19 00:57:25,668 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1550, loss[loss=0.1175, simple_loss=0.1425, pruned_loss=0.03667, audio_tagging_loss=0.009591, over 16130.00 frames. ], tot_loss[loss=0.09451, simple_loss=0.111, pruned_loss=0.0278, audio_tagging_loss=0.01119, over 3050379.05 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:57:31,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0
2023-11-19 00:57:44,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=491326.6666666667, ans=0.125
2023-11-19 00:57:55,380 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.664e-02
2023-11-19 00:57:55,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=491393.3333333333, ans=0.0
2023-11-19 00:58:19,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=491593.3333333333, ans=0.125
2023-11-19 00:58:20,576 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1600, loss[loss=0.09462, simple_loss=0.1132, pruned_loss=0.02516, audio_tagging_loss=0.01285, over 15219.00 frames. ], tot_loss[loss=0.09587, simple_loss=0.1129, pruned_loss=0.02824, audio_tagging_loss=0.0112, over 3055632.53 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:58:43,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0
2023-11-19 00:58:47,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=491726.6666666667, ans=0.09899494936611666
2023-11-19 00:58:50,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=491726.6666666667, ans=0.125
2023-11-19 00:58:50,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=491726.6666666667, ans=0.0
2023-11-19 00:58:50,951 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.782e+01 9.645e+01 1.086e+02 1.733e+02, threshold=1.929e+02, percent-clipped=0.0
2023-11-19 00:58:56,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=491793.3333333333, ans=0.125
2023-11-19 00:58:59,891 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0
2023-11-19 00:59:17,059 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1650, loss[loss=0.07445, simple_loss=0.08384, pruned_loss=0.01846, audio_tagging_loss=0.01408, over 15328.00 frames. ], tot_loss[loss=0.09498, simple_loss=0.1115, pruned_loss=0.02791, audio_tagging_loss=0.01132, over 3046300.60 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 00:59:23,411 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0
2023-11-19 00:59:30,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.95 vs. limit=22.5
2023-11-19 00:59:35,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=491993.3333333333, ans=0.2
2023-11-19 00:59:42,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=492060.0, ans=0.2
2023-11-19 00:59:42,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=492060.0, ans=0.2
2023-11-19 00:59:58,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=492126.6666666667, ans=0.1
2023-11-19 01:00:12,743 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1700, loss[loss=0.1106, simple_loss=0.1295, pruned_loss=0.03492, audio_tagging_loss=0.01088, over 14648.00 frames. ], tot_loss[loss=0.09481, simple_loss=0.1116, pruned_loss=0.02777, audio_tagging_loss=0.01126, over 3040015.11 frames. ], batch size: 55, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 01:00:25,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=492326.6666666667, ans=0.125
2023-11-19 01:00:27,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.35 vs. limit=22.5
2023-11-19 01:00:30,181 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=8.0
2023-11-19 01:00:34,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=492393.3333333333, ans=0.2
2023-11-19 01:00:39,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=492393.3333333333, ans=0.2
2023-11-19 01:00:43,065 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.546e+01 9.504e+01 1.048e+02 1.501e+02, threshold=1.901e+02, percent-clipped=0.0
2023-11-19 01:00:43,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.09 vs. limit=15.0
2023-11-19 01:00:49,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=492460.0, ans=0.5
2023-11-19 01:00:54,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=492460.0, ans=0.015
2023-11-19 01:01:04,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=492526.6666666667, ans=0.125
2023-11-19 01:01:08,027 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1750, loss[loss=0.08233, simple_loss=0.09118, pruned_loss=0.02247, audio_tagging_loss=0.01427, over 16200.00 frames. ], tot_loss[loss=0.0954, simple_loss=0.1127, pruned_loss=0.02797, audio_tagging_loss=0.01106, over 3048279.94 frames. ], batch size: 60, lr: 1.02e-02, grad_scale: 32.0
2023-11-19 01:01:50,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=492793.3333333333, ans=0.125
2023-11-19 01:02:04,414 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1800, loss[loss=0.07565, simple_loss=0.08471, pruned_loss=0.02067, audio_tagging_loss=0.01263, over 13637.00 frames. ], tot_loss[loss=0.09514, simple_loss=0.1122, pruned_loss=0.02796, audio_tagging_loss=0.01105, over 3052426.93 frames. ], batch size: 52, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:02:13,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=492926.6666666667, ans=10.0
2023-11-19 01:02:18,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=15.0
2023-11-19 01:02:22,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=492993.3333333333, ans=0.1
2023-11-19 01:02:24,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=492993.3333333333, ans=0.125
2023-11-19 01:02:31,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=493060.0, ans=0.125
2023-11-19 01:02:31,274 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-19 01:02:33,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=493060.0, ans=0.1
2023-11-19 01:02:34,106 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.111e+01 8.598e+01 9.390e+01 1.040e+02 1.619e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-19 01:03:00,489 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1850, loss[loss=0.08597, simple_loss=0.1018, pruned_loss=0.0258, audio_tagging_loss=0.009254, over 14687.00 frames. ], tot_loss[loss=0.09516, simple_loss=0.1125, pruned_loss=0.02804, audio_tagging_loss=0.01088, over 3042030.45 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:03:05,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=493260.0, ans=0.125
2023-11-19 01:03:08,435 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.00 vs. limit=22.5
2023-11-19 01:03:13,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=493326.6666666667, ans=0.0
2023-11-19 01:03:20,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=493326.6666666667, ans=0.125
2023-11-19 01:03:28,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=493393.3333333333, ans=0.125
2023-11-19 01:03:46,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=493526.6666666667, ans=0.125
2023-11-19 01:03:48,975 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0
2023-11-19 01:03:55,762 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1900, loss[loss=0.1042, simple_loss=0.1189, pruned_loss=0.03307, audio_tagging_loss=0.01167, over 15903.00 frames. ], tot_loss[loss=0.09394, simple_loss=0.1115, pruned_loss=0.02745, audio_tagging_loss=0.01076, over 3036926.56 frames. ], batch size: 61, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:04:25,810 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.583e+01 8.519e+01 9.193e+01 1.005e+02 1.310e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-19 01:04:28,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0
2023-11-19 01:04:40,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=493860.0, ans=0.1
2023-11-19 01:04:50,888 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 1950, loss[loss=0.09576, simple_loss=0.1126, pruned_loss=0.02947, audio_tagging_loss=0.01002, over 15152.00 frames. ], tot_loss[loss=0.09403, simple_loss=0.1116, pruned_loss=0.0274, audio_tagging_loss=0.01083, over 3037442.71 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:05:05,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=493993.3333333333, ans=0.125
2023-11-19 01:05:27,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=494126.6666666667, ans=0.125
2023-11-19 01:05:28,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=494126.6666666667, ans=0.125
2023-11-19 01:05:47,464 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2000, loss[loss=0.09107, simple_loss=0.1071, pruned_loss=0.02484, audio_tagging_loss=0.01269, over 15931.00 frames. ], tot_loss[loss=0.09461, simple_loss=0.1119, pruned_loss=0.02777, audio_tagging_loss=0.01091, over 3044777.19 frames. ], batch size: 59, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:05:54,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0
2023-11-19 01:05:59,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=494326.6666666667, ans=0.125
2023-11-19 01:06:01,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=494326.6666666667, ans=0.0
2023-11-19 01:06:09,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=494393.3333333333, ans=0.125
2023-11-19 01:06:10,584 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0
2023-11-19 01:06:11,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=494393.3333333333, ans=0.125
2023-11-19 01:06:16,506 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.706e+01 9.238e+01 1.036e+02 1.404e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-19 01:06:25,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=494460.0, ans=0.125
2023-11-19 01:06:39,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=494526.6666666667, ans=0.125
2023-11-19 01:06:42,727 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2050, loss[loss=0.09947, simple_loss=0.1126, pruned_loss=0.03119, audio_tagging_loss=0.012, over 14598.00 frames. ], tot_loss[loss=0.09491, simple_loss=0.1121, pruned_loss=0.02795, audio_tagging_loss=0.01092, over 3047261.63 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:06:46,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=494593.3333333333, ans=0.2
2023-11-19 01:07:15,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.99 vs. limit=22.5
2023-11-19 01:07:24,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=494793.3333333333, ans=0.05
2023-11-19 01:07:30,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=494860.0, ans=0.125
2023-11-19 01:07:37,157 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 01:07:39,054 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2100, loss[loss=0.08387, simple_loss=0.0975, pruned_loss=0.02409, audio_tagging_loss=0.01103, over 15206.00 frames. ], tot_loss[loss=0.0947, simple_loss=0.112, pruned_loss=0.02787, audio_tagging_loss=0.01082, over 3039782.03 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:07:39,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.74 vs. limit=15.0
2023-11-19 01:07:59,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=494993.3333333333, ans=0.0
2023-11-19 01:08:09,432 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.837e+01 9.503e+01 1.029e+02 1.417e+02, threshold=1.901e+02, percent-clipped=0.0
2023-11-19 01:08:10,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=495060.0, ans=0.0
2023-11-19 01:08:13,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=495126.6666666667, ans=0.2
2023-11-19 01:08:35,530 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2150, loss[loss=0.1225, simple_loss=0.1344, pruned_loss=0.04295, audio_tagging_loss=0.01235, over 15607.00 frames. ], tot_loss[loss=0.09522, simple_loss=0.1127, pruned_loss=0.02802, audio_tagging_loss=0.01087, over 3047322.49 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 16.0
2023-11-19 01:08:35,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=495260.0, ans=0.2
2023-11-19 01:08:39,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=495260.0, ans=0.125
2023-11-19 01:08:56,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=495393.3333333333, ans=0.125
2023-11-19 01:08:59,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=495393.3333333333, ans=0.0
2023-11-19 01:09:02,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=495393.3333333333, ans=0.125
2023-11-19 01:09:03,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=495393.3333333333, ans=0.125
2023-11-19 01:09:10,121 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 01:09:22,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.94 vs. limit=15.0
2023-11-19 01:09:23,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=495526.6666666667, ans=0.2
2023-11-19 01:09:30,869 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0
2023-11-19 01:09:31,317 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2200, loss[loss=0.09808, simple_loss=0.1125, pruned_loss=0.02508, audio_tagging_loss=0.01673, over 15003.00 frames. ], tot_loss[loss=0.09465, simple_loss=0.1121, pruned_loss=0.02771, audio_tagging_loss=0.01088, over 3045009.66 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 16.0
2023-11-19 01:09:37,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495593.3333333333, ans=0.1
2023-11-19 01:09:39,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=495593.3333333333, ans=0.125
2023-11-19 01:09:48,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0
2023-11-19 01:09:50,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=495660.0, ans=0.2
2023-11-19 01:09:57,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=495726.6666666667, ans=0.125
2023-11-19 01:09:59,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=495726.6666666667, ans=0.2
2023-11-19 01:10:02,325 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.544e+01 9.673e+01 1.053e+02 1.527e+02, threshold=1.935e+02, percent-clipped=0.0
2023-11-19 01:10:12,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=495793.3333333333, ans=0.95
2023-11-19 01:10:16,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=495860.0, ans=0.09899494936611666
2023-11-19 01:10:17,079 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0
2023-11-19 01:10:17,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=495860.0, ans=15.0
2023-11-19 01:10:21,274 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.24 vs. limit=22.5
2023-11-19 01:10:26,515 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2250, loss[loss=0.08578, simple_loss=0.09131, pruned_loss=0.02872, audio_tagging_loss=0.0114, over 14316.00 frames. ], tot_loss[loss=0.09502, simple_loss=0.1124, pruned_loss=0.02789, audio_tagging_loss=0.01091, over 3039229.18 frames. ], batch size: 59, lr: 1.01e-02, grad_scale: 16.0
2023-11-19 01:10:39,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=495993.3333333333, ans=0.2
2023-11-19 01:10:39,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.19 vs. limit=22.5
2023-11-19 01:10:41,680 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 01:11:02,835 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=15.0
2023-11-19 01:11:09,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=496126.6666666667, ans=0.0
2023-11-19 01:11:19,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=496193.3333333333, ans=0.125
2023-11-19 01:11:23,059 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2300, loss[loss=0.05658, simple_loss=0.05929, pruned_loss=0.01403, audio_tagging_loss=0.01291, over 14932.00 frames. ], tot_loss[loss=0.09478, simple_loss=0.1122, pruned_loss=0.02771, audio_tagging_loss=0.01095, over 3041103.50 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 16.0
2023-11-19 01:11:23,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=496260.0, ans=0.07
2023-11-19 01:11:49,026 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0
2023-11-19 01:11:53,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=496393.3333333333, ans=0.0
2023-11-19 01:11:53,800 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 9.064e+01 9.790e+01 1.107e+02 1.454e+02, threshold=1.958e+02, percent-clipped=0.0
2023-11-19 01:11:55,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496460.0, ans=0.1
2023-11-19 01:12:09,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=496526.6666666667, ans=0.5
2023-11-19 01:12:12,943 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 01:12:13,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=496526.6666666667, ans=0.2
2023-11-19 01:12:14,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=496526.6666666667, ans=0.035
2023-11-19 01:12:18,213 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2350, loss[loss=0.08464, simple_loss=0.09821, pruned_loss=0.02555, audio_tagging_loss=0.009985, over 14397.00 frames. ], tot_loss[loss=0.09581, simple_loss=0.1131, pruned_loss=0.02835, audio_tagging_loss=0.01093, over 3043113.70 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 16.0
2023-11-19 01:12:19,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=496593.3333333333, ans=0.2
2023-11-19 01:12:24,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=496593.3333333333, ans=0.0
2023-11-19 01:12:28,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=496660.0, ans=0.125
2023-11-19 01:12:55,763 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 01:13:00,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=496793.3333333333, ans=0.0
2023-11-19 01:13:14,597 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2400, loss[loss=0.1161, simple_loss=0.1521, pruned_loss=0.0327, audio_tagging_loss=0.007412, over 15923.00 frames. ], tot_loss[loss=0.09584, simple_loss=0.1128, pruned_loss=0.02841, audio_tagging_loss=0.01103, over 3032952.69 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:13:18,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=496926.6666666667, ans=0.125
2023-11-19 01:13:36,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=497060.0, ans=0.1
2023-11-19 01:13:37,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=497060.0, ans=0.125
2023-11-19 01:13:40,379 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0
2023-11-19 01:13:45,238 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.739e+01 9.521e+01 1.008e+02 1.350e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-19 01:14:10,592 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2450, loss[loss=0.08977, simple_loss=0.1065, pruned_loss=0.02377, audio_tagging_loss=0.01274, over 14646.00 frames. ], tot_loss[loss=0.09568, simple_loss=0.1127, pruned_loss=0.02824, audio_tagging_loss=0.01109, over 3032870.19 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:14:15,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=497260.0, ans=0.1
2023-11-19 01:15:02,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=497526.6666666667, ans=0.025
2023-11-19 01:15:02,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=497526.6666666667, ans=0.125
2023-11-19 01:15:02,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=497526.6666666667, ans=0.0
2023-11-19 01:15:02,473 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.52 vs. limit=22.5
2023-11-19 01:15:06,178 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2500, loss[loss=0.09768, simple_loss=0.1125, pruned_loss=0.02834, audio_tagging_loss=0.01311, over 15830.00 frames. ], tot_loss[loss=0.09533, simple_loss=0.1124, pruned_loss=0.02819, audio_tagging_loss=0.01095, over 3030065.09 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:15:07,911 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.85 vs. limit=22.5
2023-11-19 01:15:18,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=497660.0, ans=0.1
2023-11-19 01:15:27,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=497726.6666666667, ans=0.0
2023-11-19 01:15:28,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=497726.6666666667, ans=0.125
2023-11-19 01:15:34,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0
2023-11-19 01:15:37,855 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.172e+01 8.612e+01 9.372e+01 1.003e+02 1.252e+02, threshold=1.874e+02, percent-clipped=0.0
2023-11-19 01:16:01,818 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2550, loss[loss=0.07408, simple_loss=0.08817, pruned_loss=0.01914, audio_tagging_loss=0.01085, over 14606.00 frames. ], tot_loss[loss=0.09517, simple_loss=0.1121, pruned_loss=0.02815, audio_tagging_loss=0.01098, over 3031344.15 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:16:27,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=498060.0, ans=0.0
2023-11-19 01:16:31,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=498060.0, ans=0.0
2023-11-19 01:16:43,134 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 01:16:46,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=498193.3333333333, ans=0.125
2023-11-19 01:16:57,738 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2600, loss[loss=0.1043, simple_loss=0.1341, pruned_loss=0.02973, audio_tagging_loss=0.007506, over 14742.00 frames. ], tot_loss[loss=0.09505, simple_loss=0.1124, pruned_loss=0.02809, audio_tagging_loss=0.01076, over 3038162.22 frames. ], batch size: 54, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:17:07,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=498326.6666666667, ans=0.125
2023-11-19 01:17:07,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=498326.6666666667, ans=0.0
2023-11-19 01:17:09,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=498326.6666666667, ans=0.1
2023-11-19 01:17:14,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=498326.6666666667, ans=0.035
2023-11-19 01:17:15,834 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.51 vs. limit=15.0
2023-11-19 01:17:28,699 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.310e+01 9.022e+01 9.947e+01 2.048e+02, threshold=1.804e+02, percent-clipped=1.0
2023-11-19 01:17:33,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=498460.0, ans=0.2
2023-11-19 01:17:43,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=498526.6666666667, ans=0.025
2023-11-19 01:17:53,112 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2650, loss[loss=0.1297, simple_loss=0.1537, pruned_loss=0.04397, audio_tagging_loss=0.008877, over 14778.00 frames. ], tot_loss[loss=0.09595, simple_loss=0.1136, pruned_loss=0.02851, audio_tagging_loss=0.01062, over 3036938.30 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:18:14,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=498726.6666666667, ans=15.0
2023-11-19 01:18:14,852 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0
2023-11-19 01:18:28,759 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=12.0
2023-11-19 01:18:48,516 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2700, loss[loss=0.08656, simple_loss=0.1114, pruned_loss=0.02129, audio_tagging_loss=0.009563, over 15530.00 frames. ], tot_loss[loss=0.095, simple_loss=0.1124, pruned_loss=0.02813, audio_tagging_loss=0.01066, over 3038686.40 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:18:55,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=498926.6666666667, ans=0.0
2023-11-19 01:18:56,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0
2023-11-19 01:19:20,351 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.993e+01 8.454e+01 8.998e+01 9.749e+01 1.436e+02, threshold=1.800e+02, percent-clipped=0.0
2023-11-19 01:19:22,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=499126.6666666667, ans=0.07
2023-11-19 01:19:24,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=499126.6666666667, ans=0.015
2023-11-19 01:19:33,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=499193.3333333333, ans=0.125
2023-11-19 01:19:44,788 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2750, loss[loss=0.1121, simple_loss=0.1281, pruned_loss=0.03579, audio_tagging_loss=0.01224, over 14853.00 frames. ], tot_loss[loss=0.09493, simple_loss=0.1124, pruned_loss=0.02799, audio_tagging_loss=0.01072, over 3051224.48 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:19:58,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=499326.6666666667, ans=0.0
2023-11-19 01:20:19,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499460.0, ans=0.1
2023-11-19 01:20:25,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0
2023-11-19 01:20:30,588 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0
2023-11-19 01:20:33,922 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 01:20:40,205 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2800, loss[loss=0.1087, simple_loss=0.1345, pruned_loss=0.03345, audio_tagging_loss=0.008049, over 14796.00 frames. ], tot_loss[loss=0.09413, simple_loss=0.1114, pruned_loss=0.02771, audio_tagging_loss=0.01074, over 3048766.33 frames. ], batch size: 54, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:20:53,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=499660.0, ans=0.0
2023-11-19 01:20:56,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=499660.0, ans=0.125
2023-11-19 01:20:56,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=499660.0, ans=0.1
2023-11-19 01:21:11,309 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.889e+01 9.395e+01 1.013e+02 1.273e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-19 01:21:31,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=499860.0, ans=0.0
2023-11-19 01:21:35,080 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2850, loss[loss=0.08297, simple_loss=0.08207, pruned_loss=0.02791, audio_tagging_loss=0.01402, over 14062.00 frames. ], tot_loss[loss=0.0937, simple_loss=0.1108, pruned_loss=0.02749, audio_tagging_loss=0.01083, over 3043230.96 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:21:52,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=499993.3333333333, ans=0.125
2023-11-19 01:22:13,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=500126.6666666667, ans=0.2
2023-11-19 01:22:30,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0
2023-11-19 01:22:32,054 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2900, loss[loss=0.07648, simple_loss=0.09178, pruned_loss=0.01903, audio_tagging_loss=0.01156, over 14476.00 frames. ], tot_loss[loss=0.0939, simple_loss=0.111, pruned_loss=0.02757, audio_tagging_loss=0.01083, over 3041359.71 frames. ], batch size: 59, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:22:34,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0
2023-11-19 01:22:52,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.18 vs. limit=22.5
2023-11-19 01:22:53,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=500393.3333333333, ans=0.125
2023-11-19 01:22:58,686 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0
2023-11-19 01:23:02,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.772e+01 8.483e+01 9.561e+01 1.049e+02 1.503e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-19 01:23:08,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=500460.0, ans=0.125
2023-11-19 01:23:14,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=500460.0, ans=0.0
2023-11-19 01:23:16,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=500526.6666666667, ans=0.1
2023-11-19 01:23:23,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=500526.6666666667, ans=0.1
2023-11-19 01:23:27,926 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 2950, loss[loss=0.09969, simple_loss=0.1174, pruned_loss=0.02971, audio_tagging_loss=0.01129, over 14927.00 frames. ], tot_loss[loss=0.09469, simple_loss=0.1116, pruned_loss=0.028, audio_tagging_loss=0.01089, over 3051254.82 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:23:32,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=500593.3333333333, ans=0.0
2023-11-19 01:23:55,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=500726.6666666667, ans=0.125
2023-11-19 01:23:57,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=500726.6666666667, ans=0.04949747468305833
2023-11-19 01:23:58,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500726.6666666667, ans=0.1
2023-11-19 01:24:02,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=500793.3333333333, ans=0.025
2023-11-19 01:24:21,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=500926.6666666667, ans=0.125
2023-11-19 01:24:22,440 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3000, loss[loss=0.09676, simple_loss=0.1165, pruned_loss=0.02516, audio_tagging_loss=0.01336, over 15739.00 frames. ], tot_loss[loss=0.09462, simple_loss=0.1116, pruned_loss=0.02784, audio_tagging_loss=0.011, over 3057234.25 frames.
], batch size: 56, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:24:22,441 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 01:24:36,775 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9603, 3.4478, 3.9841, 3.6170], device='cuda:1') 2023-11-19 01:24:54,901 INFO [train_asr.py:1147] (1/4) Epoch 7, validation: loss=0.06857, simple_loss=0.05795, pruned_loss=0.007692, audio_tagging_loss=0.0319, over 4681554.00 frames. 2023-11-19 01:24:54,902 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 01:25:24,917 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.894e+01 8.798e+01 9.751e+01 1.102e+02 1.409e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-19 01:25:26,172 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=501126.6666666667, ans=0.125 2023-11-19 01:25:28,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=28.74 vs. limit=15.0 2023-11-19 01:25:41,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.62 vs. limit=22.5 2023-11-19 01:25:46,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.25 vs. limit=22.5 2023-11-19 01:25:47,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501193.3333333333, ans=0.1 2023-11-19 01:25:49,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.04 vs. limit=10.0 2023-11-19 01:25:49,871 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:25:50,036 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2023-11-19 01:25:50,599 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3050, loss[loss=0.1059, simple_loss=0.1224, pruned_loss=0.03287, audio_tagging_loss=0.01184, over 14680.00 frames. ], tot_loss[loss=0.09466, simple_loss=0.1115, pruned_loss=0.0279, audio_tagging_loss=0.01102, over 3047002.49 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:25:55,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501260.0, ans=0.1 2023-11-19 01:25:58,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=501260.0, ans=0.2 2023-11-19 01:26:02,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=501326.6666666667, ans=0.125 2023-11-19 01:26:15,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=501393.3333333333, ans=0.2 2023-11-19 01:26:25,156 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:26:34,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=501526.6666666667, ans=0.2 2023-11-19 01:26:46,167 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3100, loss[loss=0.0971, simple_loss=0.1131, pruned_loss=0.03049, audio_tagging_loss=0.01007, over 15991.00 frames. ], tot_loss[loss=0.09455, simple_loss=0.1111, pruned_loss=0.02788, audio_tagging_loss=0.01114, over 3041106.13 frames. ], batch size: 60, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:27:17,742 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.054e+01 8.685e+01 9.204e+01 1.020e+02 1.427e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 01:27:36,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=501860.0, ans=0.125 2023-11-19 01:27:42,158 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3150, loss[loss=0.1041, simple_loss=0.1264, pruned_loss=0.03174, audio_tagging_loss=0.009219, over 14462.00 frames. ], tot_loss[loss=0.09478, simple_loss=0.1115, pruned_loss=0.02785, audio_tagging_loss=0.01117, over 3052180.92 frames. ], batch size: 54, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:27:52,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=501993.3333333333, ans=0.1 2023-11-19 01:28:04,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=502060.0, ans=0.0 2023-11-19 01:28:14,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=502126.6666666667, ans=0.0 2023-11-19 01:28:33,010 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=12.0 2023-11-19 01:28:37,935 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3200, loss[loss=0.1042, simple_loss=0.1211, pruned_loss=0.03354, audio_tagging_loss=0.01008, over 16261.00 frames. ], tot_loss[loss=0.09502, simple_loss=0.1117, pruned_loss=0.02789, audio_tagging_loss=0.01129, over 3055887.15 frames. ], batch size: 60, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:28:54,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=502326.6666666667, ans=0.125 2023-11-19 01:28:54,950 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:29:08,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.725e+01 8.572e+01 9.353e+01 1.015e+02 1.372e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-19 01:29:16,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=502460.0, ans=0.2 2023-11-19 01:29:26,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=502526.6666666667, ans=0.2 2023-11-19 01:29:33,136 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3250, loss[loss=0.08836, simple_loss=0.1116, pruned_loss=0.02161, audio_tagging_loss=0.01095, over 15445.00 frames. 
], tot_loss[loss=0.09496, simple_loss=0.1115, pruned_loss=0.02781, audio_tagging_loss=0.01139, over 3059701.61 frames. ], batch size: 60, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:29:48,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=502660.0, ans=0.125 2023-11-19 01:29:51,950 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.71 vs. limit=22.5 2023-11-19 01:29:58,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=502726.6666666667, ans=0.125 2023-11-19 01:30:04,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=502726.6666666667, ans=0.1 2023-11-19 01:30:15,477 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2023-11-19 01:30:19,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=502860.0, ans=0.0 2023-11-19 01:30:29,458 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3300, loss[loss=0.105, simple_loss=0.1131, pruned_loss=0.0376, audio_tagging_loss=0.01081, over 15647.00 frames. ], tot_loss[loss=0.09506, simple_loss=0.1118, pruned_loss=0.0278, audio_tagging_loss=0.01136, over 3058229.72 frames. ], batch size: 59, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:30:38,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=502926.6666666667, ans=0.125 2023-11-19 01:31:00,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.78 vs. limit=12.0 2023-11-19 01:31:00,698 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 8.481e+01 9.247e+01 1.047e+02 1.658e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-19 01:31:03,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=503126.6666666667, ans=0.07 2023-11-19 01:31:26,462 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3350, loss[loss=0.09075, simple_loss=0.09905, pruned_loss=0.0289, audio_tagging_loss=0.01233, over 15200.00 frames. ], tot_loss[loss=0.09516, simple_loss=0.1119, pruned_loss=0.028, audio_tagging_loss=0.0112, over 3059290.49 frames. ], batch size: 57, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:31:39,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=503326.6666666667, ans=0.125 2023-11-19 01:32:13,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=503526.6666666667, ans=0.0 2023-11-19 01:32:21,456 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3400, loss[loss=0.1015, simple_loss=0.1211, pruned_loss=0.02882, audio_tagging_loss=0.01216, over 14587.00 frames. ], tot_loss[loss=0.09519, simple_loss=0.1122, pruned_loss=0.02802, audio_tagging_loss=0.01107, over 3057871.96 frames. 
], batch size: 55, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:32:21,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=503593.3333333333, ans=0.125 2023-11-19 01:32:27,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503593.3333333333, ans=0.1 2023-11-19 01:32:39,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=503660.0, ans=0.0 2023-11-19 01:32:47,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=503726.6666666667, ans=0.0 2023-11-19 01:32:53,221 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.287e+01 9.053e+01 9.903e+01 1.231e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 01:33:03,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=503793.3333333333, ans=0.125 2023-11-19 01:33:05,708 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.76 vs. limit=15.0 2023-11-19 01:33:09,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.98 vs. limit=15.0 2023-11-19 01:33:17,099 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3450, loss[loss=0.1066, simple_loss=0.1368, pruned_loss=0.02906, audio_tagging_loss=0.009184, over 15463.00 frames. ], tot_loss[loss=0.09612, simple_loss=0.1135, pruned_loss=0.02845, audio_tagging_loss=0.0109, over 3045102.79 frames. ], batch size: 58, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:33:18,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.42 vs. limit=15.0 2023-11-19 01:33:39,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=504060.0, ans=0.125 2023-11-19 01:34:06,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=504193.3333333333, ans=0.025 2023-11-19 01:34:13,640 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3500, loss[loss=0.1065, simple_loss=0.1339, pruned_loss=0.03046, audio_tagging_loss=0.00909, over 15786.00 frames. ], tot_loss[loss=0.09567, simple_loss=0.1132, pruned_loss=0.02826, audio_tagging_loss=0.01082, over 3047662.44 frames. 
], batch size: 61, lr: 1.00e-02, grad_scale: 16.0 2023-11-19 01:34:15,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=504260.0, ans=0.035 2023-11-19 01:34:30,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=504326.6666666667, ans=0.125 2023-11-19 01:34:41,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=504393.3333333333, ans=0.125 2023-11-19 01:34:41,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=504393.3333333333, ans=0.125 2023-11-19 01:34:43,107 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:34:45,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.641e+01 8.567e+01 9.282e+01 1.040e+02 1.334e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-19 01:34:58,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=504526.6666666667, ans=0.125 2023-11-19 01:35:09,361 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3550, loss[loss=0.09978, simple_loss=0.1143, pruned_loss=0.03229, audio_tagging_loss=0.01037, over 15862.00 frames. ], tot_loss[loss=0.09522, simple_loss=0.1122, pruned_loss=0.02822, audio_tagging_loss=0.01089, over 3045340.46 frames. ], batch size: 61, lr: 1.00e-02, grad_scale: 16.0 2023-11-19 01:35:22,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.55 vs. limit=15.0 2023-11-19 01:35:45,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=504793.3333333333, ans=0.125 2023-11-19 01:35:49,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=504793.3333333333, ans=0.0 2023-11-19 01:35:51,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=504793.3333333333, ans=0.1 2023-11-19 01:35:54,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=504860.0, ans=0.1 2023-11-19 01:35:58,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=504860.0, ans=0.125 2023-11-19 01:36:01,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=504860.0, ans=0.0 2023-11-19 01:36:04,814 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3600, loss[loss=0.1243, simple_loss=0.1394, pruned_loss=0.04204, audio_tagging_loss=0.01252, over 14718.00 frames. ], tot_loss[loss=0.09536, simple_loss=0.1123, pruned_loss=0.0284, audio_tagging_loss=0.01082, over 3045302.85 frames. 
], batch size: 56, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:36:23,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504993.3333333333, ans=0.1 2023-11-19 01:36:31,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2023-11-19 01:36:35,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.58 vs. limit=12.0 2023-11-19 01:36:36,843 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.969e+01 8.561e+01 9.359e+01 1.025e+02 1.551e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 01:36:42,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=505126.6666666667, ans=0.125 2023-11-19 01:36:44,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=12.0 2023-11-19 01:36:59,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=505260.0, ans=0.0 2023-11-19 01:37:00,645 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3650, loss[loss=0.05531, simple_loss=0.06327, pruned_loss=0.009792, audio_tagging_loss=0.01389, over 15842.00 frames. ], tot_loss[loss=0.0942, simple_loss=0.1108, pruned_loss=0.02792, audio_tagging_loss=0.01087, over 3045659.41 frames. ], batch size: 61, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:37:27,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=505393.3333333333, ans=0.125 2023-11-19 01:37:27,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.65 vs. limit=22.5 2023-11-19 01:37:33,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=505460.0, ans=0.0 2023-11-19 01:37:55,876 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3700, loss[loss=0.0964, simple_loss=0.1042, pruned_loss=0.03154, audio_tagging_loss=0.01278, over 15198.00 frames. ], tot_loss[loss=0.09374, simple_loss=0.1101, pruned_loss=0.02772, audio_tagging_loss=0.01095, over 3053555.90 frames. ], batch size: 59, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:38:13,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.92 vs. limit=15.0 2023-11-19 01:38:28,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.900e+01 9.822e+01 1.122e+02 1.774e+02, threshold=1.964e+02, percent-clipped=0.0 2023-11-19 01:38:41,426 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:38:49,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=505860.0, ans=0.0 2023-11-19 01:38:51,766 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3750, loss[loss=0.1026, simple_loss=0.1148, pruned_loss=0.03105, audio_tagging_loss=0.01413, over 14313.00 frames. ], tot_loss[loss=0.09389, simple_loss=0.1103, pruned_loss=0.02775, audio_tagging_loss=0.01097, over 3061626.84 frames. 
], batch size: 55, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:39:21,605 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.52 vs. limit=15.0 2023-11-19 01:39:26,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.09 vs. limit=22.5 2023-11-19 01:39:29,282 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.68 vs. limit=22.5 2023-11-19 01:39:30,359 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.29 vs. limit=15.0 2023-11-19 01:39:30,900 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:39:48,458 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3800, loss[loss=0.08073, simple_loss=0.08815, pruned_loss=0.0237, audio_tagging_loss=0.01295, over 14162.00 frames. ], tot_loss[loss=0.09377, simple_loss=0.1103, pruned_loss=0.02763, audio_tagging_loss=0.01101, over 3053851.74 frames. ], batch size: 56, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:40:03,969 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=15.0 2023-11-19 01:40:14,909 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.12 vs. limit=15.0 2023-11-19 01:40:20,051 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.899e+01 9.421e+01 1.052e+02 1.490e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-19 01:40:43,420 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3850, loss[loss=0.08742, simple_loss=0.09339, pruned_loss=0.02605, audio_tagging_loss=0.01467, over 14332.00 frames. ], tot_loss[loss=0.09545, simple_loss=0.1122, pruned_loss=0.0283, audio_tagging_loss=0.01103, over 3062123.92 frames. ], batch size: 55, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:41:10,429 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2023-11-19 01:41:10,575 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.41 vs. 
limit=15.0 2023-11-19 01:41:24,356 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:41:26,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=506793.3333333333, ans=10.0 2023-11-19 01:41:30,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=506860.0, ans=0.125 2023-11-19 01:41:41,589 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3900, loss[loss=0.1019, simple_loss=0.1327, pruned_loss=0.02867, audio_tagging_loss=0.006947, over 15528.00 frames. ], tot_loss[loss=0.09564, simple_loss=0.1126, pruned_loss=0.02827, audio_tagging_loss=0.01109, over 3055375.05 frames. ], batch size: 56, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:42:00,775 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=15.0 2023-11-19 01:42:03,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=507060.0, ans=0.125 2023-11-19 01:42:13,963 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.813e+01 8.583e+01 9.293e+01 1.003e+02 1.876e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 01:42:24,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=507126.6666666667, ans=0.1 2023-11-19 01:42:26,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=507193.3333333333, ans=0.0 2023-11-19 01:42:31,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=507193.3333333333, ans=0.0 2023-11-19 01:42:35,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=507193.3333333333, ans=0.125 2023-11-19 01:42:38,334 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 3950, loss[loss=0.1035, simple_loss=0.1248, pruned_loss=0.03235, audio_tagging_loss=0.008704, over 14260.00 frames. ], tot_loss[loss=0.09489, simple_loss=0.112, pruned_loss=0.02779, audio_tagging_loss=0.0111, over 3058912.28 frames. ], batch size: 54, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:42:47,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=507260.0, ans=0.2 2023-11-19 01:42:51,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=507326.6666666667, ans=0.0 2023-11-19 01:43:04,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=507393.3333333333, ans=0.125 2023-11-19 01:43:13,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=507460.0, ans=0.125 2023-11-19 01:43:31,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507526.6666666667, ans=0.1 2023-11-19 01:43:33,330 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4000, loss[loss=0.09672, simple_loss=0.1195, pruned_loss=0.02721, audio_tagging_loss=0.009777, over 15456.00 frames. ], tot_loss[loss=0.09562, simple_loss=0.1129, pruned_loss=0.02806, audio_tagging_loss=0.01112, over 3055471.77 frames. 
], batch size: 57, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:43:44,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0 2023-11-19 01:43:59,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=507726.6666666667, ans=0.125 2023-11-19 01:44:06,446 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 9.104e+01 9.882e+01 1.124e+02 1.409e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-19 01:44:28,701 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4050, loss[loss=0.1091, simple_loss=0.1302, pruned_loss=0.03446, audio_tagging_loss=0.009569, over 14799.00 frames. ], tot_loss[loss=0.09627, simple_loss=0.1134, pruned_loss=0.02844, audio_tagging_loss=0.01112, over 3044608.65 frames. ], batch size: 55, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:44:32,454 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:44:38,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=507926.6666666667, ans=0.125 2023-11-19 01:44:46,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=507993.3333333333, ans=0.125 2023-11-19 01:44:52,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=508060.0, ans=0.0 2023-11-19 01:44:53,874 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2023-11-19 01:44:56,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.56 vs. limit=22.5 2023-11-19 01:45:24,975 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4100, loss[loss=0.0685, simple_loss=0.08489, pruned_loss=0.0152, audio_tagging_loss=0.01086, over 14094.00 frames. ], tot_loss[loss=0.09625, simple_loss=0.1135, pruned_loss=0.02838, audio_tagging_loss=0.01112, over 3043567.72 frames. 
], batch size: 54, lr: 9.99e-03, grad_scale: 32.0 2023-11-19 01:45:29,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=508260.0, ans=0.0 2023-11-19 01:45:29,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=508260.0, ans=0.1 2023-11-19 01:45:33,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=508260.0, ans=0.2 2023-11-19 01:45:36,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=508326.6666666667, ans=15.0 2023-11-19 01:45:51,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=508393.3333333333, ans=0.05 2023-11-19 01:45:56,110 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.454e+01 9.161e+01 9.715e+01 1.284e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 01:45:57,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=508460.0, ans=0.2 2023-11-19 01:46:12,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0 2023-11-19 01:46:15,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2023-11-19 01:46:20,585 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4150, loss[loss=0.06823, simple_loss=0.08484, pruned_loss=0.01523, audio_tagging_loss=0.01058, over 16257.00 frames. ], tot_loss[loss=0.09537, simple_loss=0.1127, pruned_loss=0.02802, audio_tagging_loss=0.01103, over 3039882.19 frames. ], batch size: 61, lr: 9.99e-03, grad_scale: 32.0 2023-11-19 01:46:34,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=508660.0, ans=0.95 2023-11-19 01:46:46,639 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-19 01:46:55,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-11-19 01:46:56,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=508793.3333333333, ans=0.1 2023-11-19 01:46:57,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=508793.3333333333, ans=0.125 2023-11-19 01:47:01,995 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 01:47:15,820 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4200, loss[loss=0.1046, simple_loss=0.1219, pruned_loss=0.03185, audio_tagging_loss=0.01183, over 14815.00 frames. ], tot_loss[loss=0.09538, simple_loss=0.1129, pruned_loss=0.02813, audio_tagging_loss=0.01081, over 3039819.62 frames. ], batch size: 57, lr: 9.99e-03, grad_scale: 32.0 2023-11-19 01:47:21,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=508926.6666666667, ans=0.2 2023-11-19 01:47:23,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=508926.6666666667, ans=0.125 2023-11-19 01:47:29,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=508993.3333333333, ans=0.0 2023-11-19 01:47:36,260 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0 2023-11-19 01:47:37,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=508993.3333333333, ans=0.2 2023-11-19 01:47:48,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.868e+01 9.396e+01 1.025e+02 1.839e+02, threshold=1.879e+02, percent-clipped=1.0 2023-11-19 01:47:50,182 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.11 vs. limit=10.0 2023-11-19 01:47:59,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=509193.3333333333, ans=0.05 2023-11-19 01:48:11,860 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4250, loss[loss=0.09046, simple_loss=0.1122, pruned_loss=0.02497, audio_tagging_loss=0.009377, over 15392.00 frames. ], tot_loss[loss=0.09559, simple_loss=0.1131, pruned_loss=0.02827, audio_tagging_loss=0.01078, over 3043291.40 frames. ], batch size: 58, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:48:13,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=509260.0, ans=0.2 2023-11-19 01:48:20,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=509260.0, ans=0.0 2023-11-19 01:48:20,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=509260.0, ans=0.05 2023-11-19 01:48:20,910 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5 2023-11-19 01:48:39,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.29 vs. 
limit=15.0 2023-11-19 01:48:40,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=509393.3333333333, ans=0.125 2023-11-19 01:48:41,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=509393.3333333333, ans=0.09899494936611666 2023-11-19 01:48:58,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=509526.6666666667, ans=0.0 2023-11-19 01:49:04,196 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2023-11-19 01:49:06,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=15.0 2023-11-19 01:49:07,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=509593.3333333333, ans=0.0 2023-11-19 01:49:07,986 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4300, loss[loss=0.09034, simple_loss=0.1021, pruned_loss=0.02572, audio_tagging_loss=0.01355, over 15238.00 frames. ], tot_loss[loss=0.09694, simple_loss=0.1148, pruned_loss=0.02886, audio_tagging_loss=0.01065, over 3052886.27 frames. ], batch size: 59, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:49:10,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=509593.3333333333, ans=0.09899494936611666 2023-11-19 01:49:17,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=509660.0, ans=0.1 2023-11-19 01:49:39,661 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.988e+01 8.902e+01 9.994e+01 1.090e+02 2.369e+02, threshold=1.999e+02, percent-clipped=2.0 2023-11-19 01:49:42,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=509793.3333333333, ans=0.0 2023-11-19 01:49:51,500 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=12.0 2023-11-19 01:49:52,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=509860.0, ans=0.125 2023-11-19 01:49:57,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=509860.0, ans=0.0 2023-11-19 01:50:02,568 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4350, loss[loss=0.1094, simple_loss=0.1293, pruned_loss=0.0344, audio_tagging_loss=0.0104, over 14897.00 frames. ], tot_loss[loss=0.0969, simple_loss=0.1152, pruned_loss=0.02875, audio_tagging_loss=0.01055, over 3054805.15 frames. ], batch size: 56, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:50:17,134 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=509993.3333333333, ans=0.125 2023-11-19 01:50:41,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=510126.6666666667, ans=0.015 2023-11-19 01:50:58,220 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4400, loss[loss=0.08219, simple_loss=0.1021, pruned_loss=0.02154, audio_tagging_loss=0.009604, over 14832.00 frames. 
], tot_loss[loss=0.09656, simple_loss=0.1148, pruned_loss=0.02861, audio_tagging_loss=0.01057, over 3051213.06 frames. ], batch size: 57, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:50:58,543 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=510260.0, ans=0.0 2023-11-19 01:51:00,754 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.18 vs. limit=22.5 2023-11-19 01:51:02,043 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=510260.0, ans=0.1 2023-11-19 01:51:06,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=510260.0, ans=0.125 2023-11-19 01:51:14,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=510326.6666666667, ans=0.125 2023-11-19 01:51:16,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=510326.6666666667, ans=0.1 2023-11-19 01:51:30,391 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.237e+01 9.053e+01 9.942e+01 1.233e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 01:51:40,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=510460.0, ans=0.125 2023-11-19 01:51:54,568 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4450, loss[loss=0.1244, simple_loss=0.1587, pruned_loss=0.03751, audio_tagging_loss=0.007544, over 15774.00 frames. ], tot_loss[loss=0.09613, simple_loss=0.1142, pruned_loss=0.0285, audio_tagging_loss=0.01052, over 3052815.11 frames. ], batch size: 55, lr: 9.97e-03, grad_scale: 32.0 2023-11-19 01:52:08,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=510660.0, ans=0.125 2023-11-19 01:52:13,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=510660.0, ans=0.0 2023-11-19 01:52:20,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=510726.6666666667, ans=0.125 2023-11-19 01:52:21,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=510726.6666666667, ans=0.125 2023-11-19 01:52:25,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=510726.6666666667, ans=0.2 2023-11-19 01:52:40,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=510860.0, ans=0.04949747468305833 2023-11-19 01:52:49,750 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4500, loss[loss=0.08951, simple_loss=0.1051, pruned_loss=0.02651, audio_tagging_loss=0.01044, over 14497.00 frames. ], tot_loss[loss=0.09532, simple_loss=0.1134, pruned_loss=0.02809, audio_tagging_loss=0.01053, over 3048494.59 frames. 
], batch size: 56, lr: 9.97e-03, grad_scale: 32.0 2023-11-19 01:52:54,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=510926.6666666667, ans=0.2 2023-11-19 01:52:55,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.37 vs. limit=22.5 2023-11-19 01:53:05,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=510993.3333333333, ans=0.1 2023-11-19 01:53:22,563 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 9.019e+01 9.835e+01 1.060e+02 1.349e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-19 01:53:45,534 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4550, loss[loss=0.07307, simple_loss=0.09227, pruned_loss=0.01786, audio_tagging_loss=0.009073, over 14988.00 frames. ], tot_loss[loss=0.09476, simple_loss=0.1124, pruned_loss=0.02791, audio_tagging_loss=0.01063, over 3048106.03 frames. ], batch size: 56, lr: 9.97e-03, grad_scale: 32.0 2023-11-19 01:53:50,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=511260.0, ans=0.2 2023-11-19 01:53:55,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=511260.0, ans=0.07 2023-11-19 01:53:58,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=511326.6666666667, ans=0.125 2023-11-19 01:53:59,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=511326.6666666667, ans=0.2 2023-11-19 01:54:07,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=511393.3333333333, ans=0.1 2023-11-19 01:54:25,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2023-11-19 01:54:29,215 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:54:42,000 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4600, loss[loss=0.1192, simple_loss=0.148, pruned_loss=0.03758, audio_tagging_loss=0.007628, over 15749.00 frames. ], tot_loss[loss=0.09429, simple_loss=0.1118, pruned_loss=0.02771, audio_tagging_loss=0.01068, over 3045913.25 frames. 
], batch size: 57, lr: 9.96e-03, grad_scale: 32.0 2023-11-19 01:55:03,040 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=511726.6666666667, ans=0.0 2023-11-19 01:55:12,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=511726.6666666667, ans=0.2 2023-11-19 01:55:14,085 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.730e+01 9.456e+01 1.065e+02 1.421e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-19 01:55:14,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=511793.3333333333, ans=0.125 2023-11-19 01:55:18,631 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:55:37,980 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4650, loss[loss=0.1034, simple_loss=0.1269, pruned_loss=0.03166, audio_tagging_loss=0.008306, over 15848.00 frames. ], tot_loss[loss=0.09357, simple_loss=0.1106, pruned_loss=0.02741, audio_tagging_loss=0.01085, over 3049187.58 frames. ], batch size: 56, lr: 9.96e-03, grad_scale: 32.0 2023-11-19 01:55:39,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=511926.6666666667, ans=0.125 2023-11-19 01:55:39,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=511926.6666666667, ans=0.1 2023-11-19 01:55:42,700 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.88 vs. limit=22.5 2023-11-19 01:55:50,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. limit=6.0 2023-11-19 01:56:02,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=512060.0, ans=0.1 2023-11-19 01:56:04,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=512060.0, ans=0.125 2023-11-19 01:56:13,525 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=512126.6666666667, ans=0.125 2023-11-19 01:56:22,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=512193.3333333333, ans=0.0 2023-11-19 01:56:24,589 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=512193.3333333333, ans=0.1 2023-11-19 01:56:31,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=512193.3333333333, ans=0.125 2023-11-19 01:56:33,517 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4700, loss[loss=0.1101, simple_loss=0.1337, pruned_loss=0.03566, audio_tagging_loss=0.007573, over 15762.00 frames. ], tot_loss[loss=0.09442, simple_loss=0.1118, pruned_loss=0.02764, audio_tagging_loss=0.0109, over 3058262.64 frames. 
], batch size: 58, lr: 9.96e-03, grad_scale: 32.0 2023-11-19 01:56:43,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=512260.0, ans=0.1 2023-11-19 01:56:50,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=512326.6666666667, ans=0.125 2023-11-19 01:56:53,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=512326.6666666667, ans=0.1 2023-11-19 01:56:54,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=512393.3333333333, ans=0.0 2023-11-19 01:56:57,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=12.0 2023-11-19 01:57:05,666 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.776e+01 8.634e+01 9.306e+01 1.008e+02 1.470e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-19 01:57:06,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=512460.0, ans=0.0 2023-11-19 01:57:16,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=512526.6666666667, ans=0.2 2023-11-19 01:57:27,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=512526.6666666667, ans=0.125 2023-11-19 01:57:29,458 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4750, loss[loss=0.1092, simple_loss=0.1392, pruned_loss=0.03107, audio_tagging_loss=0.008549, over 14760.00 frames. ], tot_loss[loss=0.09372, simple_loss=0.1107, pruned_loss=0.02731, audio_tagging_loss=0.01106, over 3057052.31 frames. ], batch size: 57, lr: 9.95e-03, grad_scale: 32.0 2023-11-19 01:57:32,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=512593.3333333333, ans=0.125 2023-11-19 01:58:00,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=512726.6666666667, ans=0.0 2023-11-19 01:58:25,651 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4800, loss[loss=0.1136, simple_loss=0.1472, pruned_loss=0.03256, audio_tagging_loss=0.007395, over 15610.00 frames. ], tot_loss[loss=0.09412, simple_loss=0.1112, pruned_loss=0.02736, audio_tagging_loss=0.01115, over 3050932.23 frames. ], batch size: 58, lr: 9.95e-03, grad_scale: 32.0 2023-11-19 01:58:47,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=513060.0, ans=0.2 2023-11-19 01:58:51,610 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=513060.0, ans=0.0 2023-11-19 01:58:57,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 8.526e+01 9.167e+01 1.022e+02 1.486e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-19 01:59:06,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=513126.6666666667, ans=0.2 2023-11-19 01:59:07,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.05 vs. 
limit=22.5 2023-11-19 01:59:12,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=513193.3333333333, ans=0.125 2023-11-19 01:59:20,440 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4850, loss[loss=0.1036, simple_loss=0.1155, pruned_loss=0.0311, audio_tagging_loss=0.0147, over 15268.00 frames. ], tot_loss[loss=0.09401, simple_loss=0.1108, pruned_loss=0.0273, audio_tagging_loss=0.01131, over 3054018.43 frames. ], batch size: 58, lr: 9.95e-03, grad_scale: 32.0 2023-11-19 01:59:24,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=513260.0, ans=0.125 2023-11-19 01:59:28,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=513260.0, ans=0.2 2023-11-19 01:59:45,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=513393.3333333333, ans=0.07 2023-11-19 01:59:46,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2023-11-19 01:59:50,728 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.82 vs. limit=15.0 2023-11-19 01:59:59,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=513460.0, ans=0.2 2023-11-19 02:00:01,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513460.0, ans=0.1 2023-11-19 02:00:05,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=513526.6666666667, ans=0.2 2023-11-19 02:00:17,672 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4900, loss[loss=0.05488, simple_loss=0.06337, pruned_loss=0.01416, audio_tagging_loss=0.009033, over 16411.00 frames. ], tot_loss[loss=0.09314, simple_loss=0.1096, pruned_loss=0.02708, audio_tagging_loss=0.01124, over 3051200.91 frames. ], batch size: 62, lr: 9.94e-03, grad_scale: 32.0 2023-11-19 02:00:37,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.50 vs. limit=15.0 2023-11-19 02:00:49,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.524e+01 9.139e+01 9.781e+01 1.634e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 02:01:08,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=513860.0, ans=0.125 2023-11-19 02:01:12,690 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 4950, loss[loss=0.07144, simple_loss=0.07814, pruned_loss=0.02176, audio_tagging_loss=0.01061, over 13527.00 frames. ], tot_loss[loss=0.09315, simple_loss=0.1097, pruned_loss=0.02725, audio_tagging_loss=0.01107, over 3040838.07 frames. 
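], batch size: 55, lr: 9.94e-03, grad_scale: 32.0

In each optim.py entry above, the reported clipping threshold is roughly Clipping_scale (2.0) times the middle of the five grad-norm statistics, e.g. 2.0 * 9.139e+01 is about 1.828e+02. A hedged sketch of that bookkeeping, assuming the five numbers are a min/quartiles/max summary over a window of recent gradient norms:

    import statistics

    def clipping_report(recent_norms, clipping_scale=2.0):
        # recent_norms: gradient norms collected over the last few hundred
        # steps; the exact window and percentile choice are assumptions.
        s = sorted(recent_norms)
        q1, median, q3 = statistics.quantiles(s, n=4)
        threshold = clipping_scale * median  # matches threshold ~ 2 x median above
        pct_clipped = 100.0 * sum(n > threshold for n in s) / len(s)
        return (s[0], q1, median, q3, s[-1]), threshold, pct_clipped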
2023-11-19 02:01:15,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=513926.6666666667, ans=0.125 2023-11-19 02:02:00,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=514193.3333333333, ans=0.2 2023-11-19 02:02:08,129 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5000, loss[loss=0.1063, simple_loss=0.1273, pruned_loss=0.0344, audio_tagging_loss=0.008268, over 15824.00 frames. ], tot_loss[loss=0.09326, simple_loss=0.1102, pruned_loss=0.02732, audio_tagging_loss=0.01082, over 3039075.36 frames. ], batch size: 57, lr: 9.94e-03, grad_scale: 32.0 2023-11-19 02:02:09,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=514260.0, ans=0.125 2023-11-19 02:02:22,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.38 vs. limit=6.0 2023-11-19 02:02:29,980 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=8.981e-02 2023-11-19 02:02:39,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=514393.3333333333, ans=0.1 2023-11-19 02:02:40,226 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.705e+01 9.523e+01 1.061e+02 1.468e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-19 02:02:44,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=514460.0, ans=0.2 2023-11-19 02:02:45,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=514460.0, ans=0.125 2023-11-19 02:02:58,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=514526.6666666667, ans=0.025 2023-11-19 02:03:04,446 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5050, loss[loss=0.08566, simple_loss=0.1049, pruned_loss=0.02494, audio_tagging_loss=0.008293, over 15035.00 frames. ], tot_loss[loss=0.09406, simple_loss=0.1118, pruned_loss=0.02751, audio_tagging_loss=0.01065, over 3043072.88 frames. ], batch size: 57, lr: 9.93e-03, grad_scale: 32.0 2023-11-19 02:03:07,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=514593.3333333333, ans=0.125 2023-11-19 02:03:13,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=514593.3333333333, ans=0.0 2023-11-19 02:03:25,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=514726.6666666667, ans=0.0 2023-11-19 02:03:55,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=514860.0, ans=0.07 2023-11-19 02:03:59,686 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5100, loss[loss=0.1269, simple_loss=0.1584, pruned_loss=0.04028, audio_tagging_loss=0.007384, over 16592.00 frames. ], tot_loss[loss=0.09419, simple_loss=0.1121, pruned_loss=0.02748, audio_tagging_loss=0.01065, over 3043722.18 frames.
], batch size: 58, lr: 9.93e-03, grad_scale: 32.0 2023-11-19 02:04:02,294 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0 2023-11-19 02:04:12,903 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=15.0 2023-11-19 02:04:24,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.65 vs. limit=22.5 2023-11-19 02:04:27,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.50 vs. limit=12.0 2023-11-19 02:04:28,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=515060.0, ans=0.125 2023-11-19 02:04:30,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=515060.0, ans=0.125 2023-11-19 02:04:32,426 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.394e+01 9.066e+01 1.016e+02 2.426e+02, threshold=1.813e+02, percent-clipped=1.0 2023-11-19 02:04:45,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=515193.3333333333, ans=0.0 2023-11-19 02:04:54,524 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5150, loss[loss=0.1192, simple_loss=0.1423, pruned_loss=0.03726, audio_tagging_loss=0.01074, over 15765.00 frames. ], tot_loss[loss=0.09274, simple_loss=0.1102, pruned_loss=0.02692, audio_tagging_loss=0.01071, over 3046035.64 frames. ], batch size: 58, lr: 9.93e-03, grad_scale: 32.0 2023-11-19 02:05:03,289 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:05:24,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=515393.3333333333, ans=0.125 2023-11-19 02:05:27,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=515460.0, ans=0.125 2023-11-19 02:05:34,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=515460.0, ans=0.0 2023-11-19 02:05:37,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=515460.0, ans=0.125 2023-11-19 02:05:43,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. limit=6.0 2023-11-19 02:05:44,735 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.351e-01 2023-11-19 02:05:50,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=515593.3333333333, ans=0.125 2023-11-19 02:05:51,338 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5200, loss[loss=0.0863, simple_loss=0.09807, pruned_loss=0.02703, audio_tagging_loss=0.01024, over 14968.00 frames. ], tot_loss[loss=0.09324, simple_loss=0.1108, pruned_loss=0.0271, audio_tagging_loss=0.01075, over 3038675.85 frames. 
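], batch size: 57, lr: 9.92e-03, grad_scale: 32.0

The frequent ScheduledFloat entries record hyperparameters (dropout probabilities, skip rates, balancer probabilities) whose current value (ans) is a function of batch_count. A minimal sketch of such a batch-count-keyed schedule, assuming piecewise-linear interpolation; the breakpoints below are illustrative, not taken from scaling.py:

    from bisect import bisect_right

    def scheduled_float(batch_count, points):
        # points: sorted (batch_count, value) breakpoints; the value is held
        # constant before the first and after the last breakpoint.
        xs = [x for x, _ in points]
        i = bisect_right(xs, batch_count)
        if i == 0:
            return points[0][1]
        if i == len(points):
            return points[-1][1]
        (x0, y0), (x1, y1) = points[i - 1], points[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    # e.g. a dropout_p decaying from 0.3 to 0.1 over the first 20k batches,
    # then flat, which would log ans=0.1 at batch_count=512060.0 as above:
    print(scheduled_float(512060.0, [(0.0, 0.3), (20000.0, 0.1)]))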
2023-11-19 02:05:52,598 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:05:53,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2023-11-19 02:05:54,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=15.0 2023-11-19 02:06:04,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.93 vs. limit=15.0 2023-11-19 02:06:16,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=515726.6666666667, ans=0.125 2023-11-19 02:06:22,515 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.952e+01 8.557e+01 9.238e+01 1.015e+02 1.542e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 02:06:26,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=515793.3333333333, ans=0.1 2023-11-19 02:06:28,097 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:06:36,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=515860.0, ans=0.125 2023-11-19 02:06:36,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=515860.0, ans=0.125 2023-11-19 02:06:43,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=515860.0, ans=0.125 2023-11-19 02:06:46,883 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5250, loss[loss=0.1056, simple_loss=0.1261, pruned_loss=0.03196, audio_tagging_loss=0.01062, over 16023.00 frames. ], tot_loss[loss=0.09367, simple_loss=0.1114, pruned_loss=0.02728, audio_tagging_loss=0.0107, over 3045586.88 frames. ], batch size: 63, lr: 9.92e-03, grad_scale: 32.0 2023-11-19 02:06:51,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=515926.6666666667, ans=0.125 2023-11-19 02:07:00,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=515993.3333333333, ans=0.125 2023-11-19 02:07:03,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=515993.3333333333, ans=0.09899494936611666 2023-11-19 02:07:25,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=516126.6666666667, ans=0.125 2023-11-19 02:07:34,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516193.3333333333, ans=0.1 2023-11-19 02:07:41,877 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5300, loss[loss=0.08784, simple_loss=0.09067, pruned_loss=0.0244, audio_tagging_loss=0.0181, over 16533.00 frames. ], tot_loss[loss=0.09404, simple_loss=0.1117, pruned_loss=0.0274, audio_tagging_loss=0.0108, over 3045645.28 frames.
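], batch size: 64, lr: 9.92e-03, grad_scale: 32.0

The Whitening entries compare a per-module statistic against a scheduled limit (e.g. metric=11.74 vs. limit=15.0 above). One plausible form of such a metric, measuring how far a group's activation covariance is from white (it is minimized when all covariance eigenvalues are equal), is sketched below; the actual computation in scaling.py may differ:

    import numpy as np

    def whitening_metric(x, num_groups=1):
        # x: (num_frames, num_channels) activations; channels are split into
        # num_groups equal groups and the metric is averaged over groups.
        vals = []
        for f in np.split(x, num_groups, axis=1):
            f = f - f.mean(axis=0, keepdims=True)
            cov = f.T @ f / len(f)
            eigs = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
            # E[eig^2] / E[eig]^2 equals 1.0 iff all eigenvalues are equal.
            vals.append((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))
        return float(np.mean(vals))

    rng = np.random.default_rng(0)
    # Near its minimum for white input (sampling noise adds roughly d/n):
    print(whitening_metric(rng.standard_normal((1000, 384))))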
2023-11-19 02:07:42,432 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2023-11-19 02:07:49,118 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.99 vs. limit=22.5 2023-11-19 02:07:50,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=516260.0, ans=0.125 2023-11-19 02:08:01,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=516326.6666666667, ans=0.125 2023-11-19 02:08:06,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=516393.3333333333, ans=0.0 2023-11-19 02:08:09,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=516393.3333333333, ans=0.125 2023-11-19 02:08:09,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=516393.3333333333, ans=0.0 2023-11-19 02:08:11,107 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.05 vs. limit=12.0 2023-11-19 02:08:12,608 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=516393.3333333333, ans=0.125 2023-11-19 02:08:14,525 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.731e+01 9.566e+01 1.032e+02 1.487e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-19 02:08:34,423 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.18 vs. limit=10.0 2023-11-19 02:08:37,652 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5350, loss[loss=0.07797, simple_loss=0.09281, pruned_loss=0.02359, audio_tagging_loss=0.007982, over 14900.00 frames. ], tot_loss[loss=0.09401, simple_loss=0.1114, pruned_loss=0.02751, audio_tagging_loss=0.0108, over 3041661.28 frames.
], batch size: 56, lr: 9.91e-03, grad_scale: 32.0 2023-11-19 02:08:37,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=516593.3333333333, ans=0.0 2023-11-19 02:08:52,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=516660.0, ans=0.1 2023-11-19 02:09:07,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=516726.6666666667, ans=12.0 2023-11-19 02:09:15,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=516793.3333333333, ans=0.0 2023-11-19 02:09:26,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=516860.0, ans=0.025 2023-11-19 02:09:31,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=516860.0, ans=0.0 2023-11-19 02:09:33,615 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5400, loss[loss=0.08996, simple_loss=0.1044, pruned_loss=0.02694, audio_tagging_loss=0.01082, over 15650.00 frames. ], tot_loss[loss=0.09362, simple_loss=0.1105, pruned_loss=0.0274, audio_tagging_loss=0.01095, over 3041494.87 frames. ], batch size: 60, lr: 9.91e-03, grad_scale: 32.0 2023-11-19 02:09:33,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=516926.6666666667, ans=0.05 2023-11-19 02:09:55,682 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2023-11-19 02:09:57,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=517060.0, ans=0.0 2023-11-19 02:10:00,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=517060.0, ans=0.5 2023-11-19 02:10:00,514 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=517060.0, ans=0.0 2023-11-19 02:10:05,678 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.660e+01 9.823e+01 1.115e+02 1.582e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-19 02:10:28,359 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5450, loss[loss=0.07344, simple_loss=0.07978, pruned_loss=0.02021, audio_tagging_loss=0.01334, over 14641.00 frames. ], tot_loss[loss=0.09366, simple_loss=0.1106, pruned_loss=0.02745, audio_tagging_loss=0.01091, over 3036196.07 frames. ], batch size: 56, lr: 9.91e-03, grad_scale: 32.0 2023-11-19 02:10:30,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=517260.0, ans=0.0 2023-11-19 02:10:30,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=517260.0, ans=0.1 2023-11-19 02:10:45,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=517326.6666666667, ans=0.125 2023-11-19 02:10:45,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. 
limit=15.0 2023-11-19 02:10:58,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517393.3333333333, ans=0.1 2023-11-19 02:10:59,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=517393.3333333333, ans=0.125 2023-11-19 02:11:01,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=517460.0, ans=0.2 2023-11-19 02:11:04,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2023-11-19 02:11:21,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=517526.6666666667, ans=0.2 2023-11-19 02:11:24,123 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5500, loss[loss=0.08509, simple_loss=0.102, pruned_loss=0.02165, audio_tagging_loss=0.01244, over 15420.00 frames. ], tot_loss[loss=0.09395, simple_loss=0.1112, pruned_loss=0.02752, audio_tagging_loss=0.01085, over 3039823.14 frames. ], batch size: 57, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:11:30,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=517593.3333333333, ans=0.125 2023-11-19 02:11:34,455 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0 2023-11-19 02:11:35,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=517660.0, ans=0.0 2023-11-19 02:11:46,961 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.88 vs. limit=10.0 2023-11-19 02:11:55,915 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.797e+01 8.498e+01 9.493e+01 1.061e+02 1.375e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-19 02:11:56,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.76 vs. limit=22.5 2023-11-19 02:12:02,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=517793.3333333333, ans=0.125 2023-11-19 02:12:04,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=517793.3333333333, ans=0.125 2023-11-19 02:12:04,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=517793.3333333333, ans=0.5 2023-11-19 02:12:06,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=517793.3333333333, ans=0.125 2023-11-19 02:12:07,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.42 vs. 
limit=10.0 2023-11-19 02:12:14,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=517860.0, ans=0.1 2023-11-19 02:12:20,007 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5550, loss[loss=0.1348, simple_loss=0.1629, pruned_loss=0.04439, audio_tagging_loss=0.008961, over 16758.00 frames. ], tot_loss[loss=0.0933, simple_loss=0.1104, pruned_loss=0.02718, audio_tagging_loss=0.0109, over 3042618.31 frames. ], batch size: 58, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:12:37,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=517993.3333333333, ans=0.125 2023-11-19 02:13:00,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=518126.6666666667, ans=0.125 2023-11-19 02:13:01,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=518126.6666666667, ans=0.025 2023-11-19 02:13:15,067 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5600, loss[loss=0.08785, simple_loss=0.1009, pruned_loss=0.02645, audio_tagging_loss=0.01093, over 15790.00 frames. ], tot_loss[loss=0.09343, simple_loss=0.1108, pruned_loss=0.02697, audio_tagging_loss=0.01109, over 3047860.24 frames. ], batch size: 59, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:13:18,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=12.0 2023-11-19 02:13:34,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=518326.6666666667, ans=0.0 2023-11-19 02:13:43,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.45 vs. limit=10.0 2023-11-19 02:13:44,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=518393.3333333333, ans=0.0 2023-11-19 02:13:46,977 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.885e+01 8.345e+01 9.240e+01 1.027e+02 1.400e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 02:13:47,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=518460.0, ans=0.125 2023-11-19 02:13:55,456 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 02:14:09,779 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5650, loss[loss=0.1055, simple_loss=0.1192, pruned_loss=0.03204, audio_tagging_loss=0.01386, over 14963.00 frames. ], tot_loss[loss=0.09413, simple_loss=0.1113, pruned_loss=0.0273, audio_tagging_loss=0.0112, over 3044091.27 frames. ], batch size: 57, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:14:28,594 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.33 vs. 
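limit=15.0

The WARNING above excludes a one-second AudioSet cut because its 100 input frames shrink to 23 after subsampling while its dummy transcript has 24 tokens, and a transducer loss needs at least one encoder frame per token. A sketch of that filter; the subsampling formula is an assumption (a common icefall-style convolutional front-end) that happens to reproduce the logged 100 -> 23:

    def frames_after_subsampling(num_frames):
        # Two stride-2 stages; reproduces the 100 -> 23 mapping in the warning.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames, num_tokens):
        # Exclude cuts whose token count exceeds the post-subsampling length.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False, so the cut is excluded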
2023-11-19 02:14:30,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-11-19 02:14:43,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=518793.3333333333, ans=0.1 2023-11-19 02:14:56,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0 2023-11-19 02:15:01,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=518860.0, ans=0.125 2023-11-19 02:15:06,279 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5700, loss[loss=0.08275, simple_loss=0.09015, pruned_loss=0.02259, audio_tagging_loss=0.01508, over 16616.00 frames. ], tot_loss[loss=0.09388, simple_loss=0.111, pruned_loss=0.02707, audio_tagging_loss=0.01129, over 3042867.76 frames. ], batch size: 64, lr: 9.89e-03, grad_scale: 64.0 2023-11-19 02:15:25,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=518993.3333333333, ans=0.125 2023-11-19 02:15:34,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.18 vs. limit=15.0 2023-11-19 02:15:38,526 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.416e+01 9.360e+01 1.085e+02 2.101e+02, threshold=1.872e+02, percent-clipped=1.0 2023-11-19 02:15:55,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=519193.3333333333, ans=0.0 2023-11-19 02:15:55,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2023-11-19 02:16:01,543 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5750, loss[loss=0.1069, simple_loss=0.1338, pruned_loss=0.02862, audio_tagging_loss=0.01138, over 13879.00 frames. ], tot_loss[loss=0.094, simple_loss=0.1112, pruned_loss=0.02728, audio_tagging_loss=0.01111, over 3041967.93 frames. ], batch size: 52, lr: 9.89e-03, grad_scale: 32.0 2023-11-19 02:16:17,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=519326.6666666667, ans=0.0 2023-11-19 02:16:43,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=519460.0, ans=10.0 2023-11-19 02:16:51,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=519526.6666666667, ans=10.0 2023-11-19 02:16:52,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=519526.6666666667, ans=0.1 2023-11-19 02:16:56,607 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5800, loss[loss=0.07275, simple_loss=0.08052, pruned_loss=0.01828, audio_tagging_loss=0.01421, over 14569.00 frames. ], tot_loss[loss=0.09344, simple_loss=0.1105, pruned_loss=0.02719, audio_tagging_loss=0.01099, over 3046760.41 frames.
], batch size: 56, lr: 9.89e-03, grad_scale: 16.0 2023-11-19 02:17:31,533 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.540e+01 9.536e+01 1.116e+02 2.278e+02, threshold=1.907e+02, percent-clipped=1.0 2023-11-19 02:17:41,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2023-11-19 02:17:53,439 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5850, loss[loss=0.1083, simple_loss=0.129, pruned_loss=0.03202, audio_tagging_loss=0.01181, over 14423.00 frames. ], tot_loss[loss=0.09425, simple_loss=0.1119, pruned_loss=0.02748, audio_tagging_loss=0.01083, over 3045975.38 frames. ], batch size: 53, lr: 9.88e-03, grad_scale: 16.0 2023-11-19 02:17:54,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=519926.6666666667, ans=0.125 2023-11-19 02:18:15,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=520060.0, ans=0.125 2023-11-19 02:18:37,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=520193.3333333333, ans=0.125 2023-11-19 02:18:49,470 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5900, loss[loss=0.1269, simple_loss=0.1423, pruned_loss=0.04617, audio_tagging_loss=0.009628, over 15414.00 frames. ], tot_loss[loss=0.09419, simple_loss=0.1118, pruned_loss=0.02751, audio_tagging_loss=0.01079, over 3043018.30 frames. ], batch size: 56, lr: 9.88e-03, grad_scale: 16.0 2023-11-19 02:18:56,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=520260.0, ans=0.1 2023-11-19 02:18:58,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=520260.0, ans=0.1 2023-11-19 02:19:02,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=520326.6666666667, ans=0.07 2023-11-19 02:19:07,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-11-19 02:19:23,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 8.828e+01 9.582e+01 1.059e+02 2.362e+02, threshold=1.916e+02, percent-clipped=1.0 2023-11-19 02:19:25,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=520460.0, ans=0.0 2023-11-19 02:19:44,674 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 5950, loss[loss=0.06211, simple_loss=0.06572, pruned_loss=0.01672, audio_tagging_loss=0.01253, over 16631.00 frames. ], tot_loss[loss=0.09367, simple_loss=0.111, pruned_loss=0.02733, audio_tagging_loss=0.01086, over 3043808.62 frames. 
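], batch size: 63, lr: 9.88e-03, grad_scale: 16.0

Across the surrounding batches, grad_scale moves from 32.0 up to 64.0 (batch 5500) and back down to 16.0 (batch 5800), the signature of dynamic loss scaling for fp16 training: the scale is halved when an overflow is detected and doubled after a run of overflow-free steps. A minimal sketch of that update rule; the constants are illustrative, and in practice torch.cuda.amp.GradScaler implements this logic:

    def update_grad_scale(scale, found_inf, good_steps, growth_interval=1000):
        # Returns (new_scale, new_good_steps) after one optimizer step.
        if found_inf:
            return scale * 0.5, 0   # back off on overflow, e.g. 32.0 -> 16.0
        good_steps += 1
        if good_steps >= growth_interval:
            return scale * 2.0, 0   # grow after a clean run, e.g. 32.0 -> 64.0
        return scale, good_steps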
2023-11-19 02:19:51,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=520593.3333333333, ans=0.125 2023-11-19 02:19:53,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=520593.3333333333, ans=0.125 2023-11-19 02:20:22,236 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.43 vs. limit=10.0 2023-11-19 02:20:40,636 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6000, loss[loss=0.07695, simple_loss=0.09304, pruned_loss=0.0183, audio_tagging_loss=0.01213, over 15576.00 frames. ], tot_loss[loss=0.09317, simple_loss=0.1107, pruned_loss=0.02705, audio_tagging_loss=0.01078, over 3039570.96 frames. ], batch size: 58, lr: 9.87e-03, grad_scale: 32.0 2023-11-19 02:20:40,636 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 02:21:08,606 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2709, 4.9913, 4.8100, 5.1621], device='cuda:1') 2023-11-19 02:21:13,026 INFO [train_asr.py:1147] (1/4) Epoch 7, validation: loss=0.06924, simple_loss=0.05776, pruned_loss=0.007549, audio_tagging_loss=0.0328, over 4681554.00 frames. 2023-11-19 02:21:13,027 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 02:21:30,705 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:21:47,401 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.133e+01 8.768e+01 9.511e+01 1.039e+02 1.786e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 02:21:55,328 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 02:21:55,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=521126.6666666667, ans=0.1 2023-11-19 02:22:07,960 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6050, loss[loss=0.07324, simple_loss=0.08197, pruned_loss=0.0183, audio_tagging_loss=0.01396, over 13504.00 frames. ], tot_loss[loss=0.0935, simple_loss=0.1111, pruned_loss=0.02711, audio_tagging_loss=0.01084, over 3046743.86 frames.
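], batch size: 53, lr: 9.87e-03, grad_scale: 32.0

At batch 6000 the trainer pauses to compute a validation loss over a fixed dev set (the same 4681554.00 frames each time) and reports peak GPU memory. A hedged sketch of such a loop; compute_loss is a hypothetical stand-in for the script's actual loss function:

    import torch

    def validate(model, dev_loader, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        with torch.no_grad():
            for batch in dev_loader:
                # compute_loss (hypothetical) returns (loss, num_frames).
                loss, num_frames = compute_loss(model, batch, device)
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        return tot_loss / max(tot_frames, 1), peak_mb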
2023-11-19 02:22:09,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=521260.0, ans=0.125 2023-11-19 02:22:15,002 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521260.0, ans=0.1 2023-11-19 02:22:19,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=521326.6666666667, ans=0.125 2023-11-19 02:22:23,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=521326.6666666667, ans=0.0 2023-11-19 02:22:26,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.27 vs. limit=10.0 2023-11-19 02:22:38,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=521393.3333333333, ans=0.125 2023-11-19 02:22:49,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=521460.0, ans=0.035 2023-11-19 02:22:55,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=521526.6666666667, ans=0.125 2023-11-19 02:23:01,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=521526.6666666667, ans=0.125 2023-11-19 02:23:05,055 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6100, loss[loss=0.1242, simple_loss=0.1509, pruned_loss=0.04205, audio_tagging_loss=0.00674, over 15032.00 frames. ], tot_loss[loss=0.0935, simple_loss=0.1112, pruned_loss=0.02712, audio_tagging_loss=0.01076, over 3049663.29 frames. ], batch size: 56, lr: 9.87e-03, grad_scale: 32.0 2023-11-19 02:23:12,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=521593.3333333333, ans=0.2 2023-11-19 02:23:17,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=521660.0, ans=0.125 2023-11-19 02:23:25,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=521726.6666666667, ans=0.0 2023-11-19 02:23:25,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=521726.6666666667, ans=0.0 2023-11-19 02:23:27,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.31 vs. limit=22.5 2023-11-19 02:23:29,248 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-19 02:23:29,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.58 vs.
limit=6.0 2023-11-19 02:23:38,647 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.659e+01 8.713e+01 9.332e+01 1.071e+02 1.394e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-19 02:23:44,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=521793.3333333333, ans=0.5 2023-11-19 02:23:59,651 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6150, loss[loss=0.06951, simple_loss=0.08869, pruned_loss=0.01628, audio_tagging_loss=0.008887, over 15437.00 frames. ], tot_loss[loss=0.09325, simple_loss=0.1108, pruned_loss=0.02698, audio_tagging_loss=0.01085, over 3048821.03 frames. ], batch size: 58, lr: 9.86e-03, grad_scale: 32.0 2023-11-19 02:24:21,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.57 vs. limit=12.0 2023-11-19 02:24:22,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=522060.0, ans=0.125 2023-11-19 02:24:28,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=522060.0, ans=0.07 2023-11-19 02:24:30,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=522060.0, ans=0.125 2023-11-19 02:24:30,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=522060.0, ans=0.125 2023-11-19 02:24:43,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.58 vs. limit=15.0 2023-11-19 02:24:49,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=522193.3333333333, ans=0.05 2023-11-19 02:24:54,744 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6200, loss[loss=0.07759, simple_loss=0.09301, pruned_loss=0.02048, audio_tagging_loss=0.01061, over 15264.00 frames. ], tot_loss[loss=0.09351, simple_loss=0.1109, pruned_loss=0.02705, audio_tagging_loss=0.01099, over 3040597.44 frames. 
], batch size: 56, lr: 9.86e-03, grad_scale: 32.0 2023-11-19 02:25:16,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=522393.3333333333, ans=0.125 2023-11-19 02:25:26,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=522460.0, ans=0.125 2023-11-19 02:25:28,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.684e+01 8.853e+01 9.668e+01 1.071e+02 1.653e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-19 02:25:30,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=522460.0, ans=0.0 2023-11-19 02:25:37,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=522526.6666666667, ans=0.1 2023-11-19 02:25:41,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=522526.6666666667, ans=0.0 2023-11-19 02:25:43,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=522526.6666666667, ans=0.125 2023-11-19 02:25:50,571 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6250, loss[loss=0.08759, simple_loss=0.1025, pruned_loss=0.02331, audio_tagging_loss=0.01303, over 14938.00 frames. ], tot_loss[loss=0.09466, simple_loss=0.1123, pruned_loss=0.02752, audio_tagging_loss=0.01098, over 3034305.26 frames. ], batch size: 56, lr: 9.86e-03, grad_scale: 32.0 2023-11-19 02:26:27,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=522793.3333333333, ans=0.125 2023-11-19 02:26:30,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=522793.3333333333, ans=0.1 2023-11-19 02:26:34,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.59 vs. limit=22.5 2023-11-19 02:26:36,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=522860.0, ans=0.05 2023-11-19 02:26:45,718 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6300, loss[loss=0.1096, simple_loss=0.1326, pruned_loss=0.03386, audio_tagging_loss=0.009453, over 15337.00 frames. ], tot_loss[loss=0.09486, simple_loss=0.1122, pruned_loss=0.02764, audio_tagging_loss=0.01111, over 3034403.64 frames. ], batch size: 59, lr: 9.85e-03, grad_scale: 32.0 2023-11-19 02:26:58,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=522993.3333333333, ans=0.125 2023-11-19 02:26:59,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=522993.3333333333, ans=0.07 2023-11-19 02:27:22,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.961e+01 8.911e+01 9.683e+01 1.073e+02 1.509e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-19 02:27:41,851 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6350, loss[loss=0.08253, simple_loss=0.1015, pruned_loss=0.02241, audio_tagging_loss=0.009397, over 16292.00 frames. 
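], tot_loss[loss=0.09447, simple_loss=0.1114, pruned_loss=0.02757, audio_tagging_loss=0.01119, over 3028872.47 frames. ], batch size: 60, lr: 9.85e-03, grad_scale: 16.0

Note that the tot_loss frame counts are fractional (over 3028872.47 frames. just above), which suggests an exponentially decayed running sum over recent batches rather than a plain cumulative total. A sketch under that assumption; the decay constant is made up:

    def update_tot_loss(sum_loss, sum_frames, batch_loss, batch_frames, decay=0.999):
        # Decay the old statistics, then fold in the current batch; the
        # reported tot_loss is sum_loss / sum_frames.
        sum_loss = decay * sum_loss + batch_loss * batch_frames
        sum_frames = decay * sum_frames + batch_frames
        return sum_loss, sum_frames, sum_loss / sum_frames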
2023-11-19 02:27:46,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=523260.0, ans=0.0 2023-11-19 02:28:09,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.10 vs. limit=10.0 2023-11-19 02:28:32,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=523526.6666666667, ans=0.0 2023-11-19 02:28:38,391 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6400, loss[loss=0.08839, simple_loss=0.1126, pruned_loss=0.02048, audio_tagging_loss=0.0116, over 15055.00 frames. ], tot_loss[loss=0.09404, simple_loss=0.1108, pruned_loss=0.02732, audio_tagging_loss=0.01131, over 3032382.69 frames. ], batch size: 56, lr: 9.85e-03, grad_scale: 32.0 2023-11-19 02:28:42,878 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=523593.3333333333, ans=0.125 2023-11-19 02:28:51,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=523660.0, ans=0.2 2023-11-19 02:29:13,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.426e+01 9.075e+01 1.046e+02 1.342e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-19 02:29:17,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=523793.3333333333, ans=0.2 2023-11-19 02:29:25,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=523860.0, ans=0.05 2023-11-19 02:29:33,090 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6450, loss[loss=0.08129, simple_loss=0.09242, pruned_loss=0.02373, audio_tagging_loss=0.01134, over 13539.00 frames. ], tot_loss[loss=0.09464, simple_loss=0.1113, pruned_loss=0.0276, audio_tagging_loss=0.01138, over 3029031.67 frames. ], batch size: 52, lr: 9.85e-03, grad_scale: 32.0 2023-11-19 02:29:46,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=523993.3333333333, ans=0.125 2023-11-19 02:29:56,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=524060.0, ans=0.0 2023-11-19 02:30:08,966 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.82 vs.
limit=6.0 2023-11-19 02:30:10,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=524126.6666666667, ans=0.0 2023-11-19 02:30:12,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=524126.6666666667, ans=0.125 2023-11-19 02:30:12,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=524126.6666666667, ans=0.125 2023-11-19 02:30:19,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=524193.3333333333, ans=0.0 2023-11-19 02:30:21,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=12.0 2023-11-19 02:30:25,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=524193.3333333333, ans=0.125 2023-11-19 02:30:28,442 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6500, loss[loss=0.05581, simple_loss=0.06357, pruned_loss=0.01202, audio_tagging_loss=0.012, over 14340.00 frames. ], tot_loss[loss=0.09349, simple_loss=0.11, pruned_loss=0.02714, audio_tagging_loss=0.01135, over 3037610.33 frames. ], batch size: 57, lr: 9.84e-03, grad_scale: 32.0 2023-11-19 02:30:44,783 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.54 vs. limit=15.0 2023-11-19 02:30:53,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=524393.3333333334, ans=0.125 2023-11-19 02:31:04,408 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.655e+01 9.206e+01 1.007e+02 1.269e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 02:31:19,101 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.093e-01 2023-11-19 02:31:24,664 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6550, loss[loss=0.07804, simple_loss=0.08624, pruned_loss=0.02405, audio_tagging_loss=0.01087, over 14359.00 frames. ], tot_loss[loss=0.0934, simple_loss=0.1103, pruned_loss=0.02712, audio_tagging_loss=0.01114, over 3040159.16 frames. ], batch size: 54, lr: 9.84e-03, grad_scale: 32.0 2023-11-19 02:31:48,001 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.83 vs. limit=10.0 2023-11-19 02:31:51,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=524726.6666666666, ans=15.0 2023-11-19 02:32:04,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=524793.3333333334, ans=0.125 2023-11-19 02:32:11,352 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0 2023-11-19 02:32:18,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=524860.0, ans=0.05 2023-11-19 02:32:20,067 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6600, loss[loss=0.1142, simple_loss=0.131, pruned_loss=0.03874, audio_tagging_loss=0.01002, over 15384.00 frames. 
], tot_loss[loss=0.09328, simple_loss=0.1102, pruned_loss=0.02718, audio_tagging_loss=0.01098, over 3036085.91 frames. ], batch size: 60, lr: 9.84e-03, grad_scale: 32.0 2023-11-19 02:32:22,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=524926.6666666666, ans=0.0 2023-11-19 02:32:33,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=524993.3333333334, ans=0.0 2023-11-19 02:32:35,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=524993.3333333334, ans=0.125 2023-11-19 02:32:46,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0 2023-11-19 02:32:55,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.622e+01 9.360e+01 1.038e+02 1.373e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 02:32:59,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525126.6666666666, ans=0.1 2023-11-19 02:33:14,897 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6650, loss[loss=0.1056, simple_loss=0.1319, pruned_loss=0.02908, audio_tagging_loss=0.01057, over 15876.00 frames. ], tot_loss[loss=0.09328, simple_loss=0.1101, pruned_loss=0.02724, audio_tagging_loss=0.01098, over 3041758.57 frames. ], batch size: 58, lr: 9.83e-03, grad_scale: 32.0 2023-11-19 02:33:21,542 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=525260.0, ans=0.025 2023-11-19 02:33:22,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=22.5 2023-11-19 02:33:24,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.52 vs. limit=15.0 2023-11-19 02:33:29,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=525326.6666666666, ans=0.2 2023-11-19 02:33:32,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=525326.6666666666, ans=0.07 2023-11-19 02:33:34,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=525326.6666666666, ans=0.0 2023-11-19 02:34:10,677 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6700, loss[loss=0.09951, simple_loss=0.1237, pruned_loss=0.02764, audio_tagging_loss=0.01003, over 15559.00 frames. ], tot_loss[loss=0.09266, simple_loss=0.1097, pruned_loss=0.02693, audio_tagging_loss=0.01089, over 3045537.68 frames. ], batch size: 58, lr: 9.83e-03, grad_scale: 32.0 2023-11-19 02:34:12,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=525593.3333333334, ans=0.125 2023-11-19 02:34:28,066 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.82 vs. 
limit=15.0 2023-11-19 02:34:39,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=525726.6666666666, ans=0.04949747468305833 2023-11-19 02:34:45,466 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.124e+01 8.532e+01 9.161e+01 1.021e+02 1.335e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 02:35:05,523 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6750, loss[loss=0.08492, simple_loss=0.1045, pruned_loss=0.02464, audio_tagging_loss=0.008021, over 14746.00 frames. ], tot_loss[loss=0.09193, simple_loss=0.1088, pruned_loss=0.02669, audio_tagging_loss=0.01084, over 3037280.24 frames. ], batch size: 54, lr: 9.83e-03, grad_scale: 32.0 2023-11-19 02:35:13,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525926.6666666666, ans=0.1 2023-11-19 02:35:44,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=526126.6666666666, ans=0.035 2023-11-19 02:35:44,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=526126.6666666666, ans=0.1 2023-11-19 02:35:53,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0 2023-11-19 02:35:54,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526193.3333333334, ans=0.1 2023-11-19 02:36:00,006 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6800, loss[loss=0.09915, simple_loss=0.1181, pruned_loss=0.03082, audio_tagging_loss=0.009269, over 15464.00 frames. ], tot_loss[loss=0.09221, simple_loss=0.1092, pruned_loss=0.02684, audio_tagging_loss=0.01078, over 3041395.42 frames. ], batch size: 58, lr: 9.82e-03, grad_scale: 32.0 2023-11-19 02:36:03,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=526260.0, ans=0.125 2023-11-19 02:36:05,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=526260.0, ans=0.125 2023-11-19 02:36:07,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=526260.0, ans=0.0 2023-11-19 02:36:11,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.11 vs. 
limit=12.0 2023-11-19 02:36:20,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=526326.6666666666, ans=0.125 2023-11-19 02:36:25,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=526393.3333333334, ans=0.0 2023-11-19 02:36:29,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=526393.3333333334, ans=0.125 2023-11-19 02:36:32,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526460.0, ans=0.1 2023-11-19 02:36:35,400 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.668e+01 8.914e+01 9.908e+01 1.073e+02 1.400e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-19 02:36:54,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=526593.3333333334, ans=0.2 2023-11-19 02:36:54,909 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6850, loss[loss=0.1075, simple_loss=0.1342, pruned_loss=0.03277, audio_tagging_loss=0.007639, over 14904.00 frames. ], tot_loss[loss=0.0929, simple_loss=0.11, pruned_loss=0.02711, audio_tagging_loss=0.01079, over 3037530.08 frames. ], batch size: 54, lr: 9.82e-03, grad_scale: 32.0 2023-11-19 02:37:05,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526660.0, ans=0.1 2023-11-19 02:37:22,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=526726.6666666666, ans=0.1 2023-11-19 02:37:25,010 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=526726.6666666666, ans=0.125 2023-11-19 02:37:30,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=526793.3333333334, ans=0.125 2023-11-19 02:37:37,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=526793.3333333334, ans=0.0 2023-11-19 02:37:40,793 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=526860.0, ans=0.0 2023-11-19 02:37:51,105 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6900, loss[loss=0.09513, simple_loss=0.1126, pruned_loss=0.0286, audio_tagging_loss=0.01026, over 15210.00 frames. ], tot_loss[loss=0.09289, simple_loss=0.1103, pruned_loss=0.02696, audio_tagging_loss=0.01077, over 3041685.78 frames. ], batch size: 57, lr: 9.82e-03, grad_scale: 32.0 2023-11-19 02:38:09,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.78 vs. 
limit=10.0 2023-11-19 02:38:13,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=527060.0, ans=0.0 2023-11-19 02:38:15,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=527060.0, ans=0.125 2023-11-19 02:38:25,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=527126.6666666666, ans=0.125 2023-11-19 02:38:26,438 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.284e+01 8.942e+01 9.819e+01 1.283e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-19 02:38:32,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.55 vs. limit=6.0 2023-11-19 02:38:34,400 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 02:38:45,937 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 6950, loss[loss=0.1243, simple_loss=0.1521, pruned_loss=0.03787, audio_tagging_loss=0.01042, over 15952.00 frames. ], tot_loss[loss=0.093, simple_loss=0.1103, pruned_loss=0.02694, audio_tagging_loss=0.01092, over 3036760.91 frames. ], batch size: 58, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:38:53,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=527260.0, ans=0.125 2023-11-19 02:39:21,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=527460.0, ans=0.0 2023-11-19 02:39:24,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-11-19 02:39:39,264 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=22.5 2023-11-19 02:39:40,942 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7000, loss[loss=0.08169, simple_loss=0.1027, pruned_loss=0.02036, audio_tagging_loss=0.01, over 15688.00 frames. ], tot_loss[loss=0.09347, simple_loss=0.1103, pruned_loss=0.02721, audio_tagging_loss=0.01108, over 3031269.48 frames. ], batch size: 59, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:39:45,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=527593.3333333334, ans=0.125 2023-11-19 02:39:47,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.11 vs. limit=12.0 2023-11-19 02:39:48,861 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.54 vs. 
limit=22.5 2023-11-19 02:39:52,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=527660.0, ans=0.2 2023-11-19 02:40:13,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527726.6666666666, ans=0.1 2023-11-19 02:40:13,300 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=12.0 2023-11-19 02:40:16,972 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.756e+01 9.618e+01 1.077e+02 1.519e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-19 02:40:26,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=527860.0, ans=0.5 2023-11-19 02:40:36,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=527860.0, ans=0.0 2023-11-19 02:40:37,802 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7050, loss[loss=0.08699, simple_loss=0.1018, pruned_loss=0.02523, audio_tagging_loss=0.01086, over 14542.00 frames. ], tot_loss[loss=0.09341, simple_loss=0.1103, pruned_loss=0.02717, audio_tagging_loss=0.01108, over 3033266.80 frames. ], batch size: 54, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:40:43,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.57 vs. limit=22.5 2023-11-19 02:40:58,964 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=528060.0, ans=0.125 2023-11-19 02:41:03,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=528060.0, ans=0.125 2023-11-19 02:41:13,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=528126.6666666666, ans=0.1 2023-11-19 02:41:16,715 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.17 vs. limit=10.0 2023-11-19 02:41:23,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=528193.3333333334, ans=0.1 2023-11-19 02:41:28,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=528193.3333333334, ans=0.1 2023-11-19 02:41:33,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.68 vs. limit=22.5 2023-11-19 02:41:33,641 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7100, loss[loss=0.0881, simple_loss=0.09892, pruned_loss=0.02399, audio_tagging_loss=0.01465, over 15643.00 frames. ], tot_loss[loss=0.09253, simple_loss=0.1092, pruned_loss=0.02667, audio_tagging_loss=0.01127, over 3037918.48 frames. ], batch size: 59, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:41:46,758 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0 2023-11-19 02:42:05,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.46 vs. 
limit=12.0 2023-11-19 02:42:09,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.659e+01 8.375e+01 9.204e+01 1.018e+02 1.304e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 02:42:11,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=528460.0, ans=0.2 2023-11-19 02:42:12,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.42 vs. limit=15.0 2023-11-19 02:42:19,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=528526.6666666666, ans=0.1 2023-11-19 02:42:28,583 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7150, loss[loss=0.09605, simple_loss=0.1123, pruned_loss=0.02896, audio_tagging_loss=0.01094, over 15328.00 frames. ], tot_loss[loss=0.093, simple_loss=0.1098, pruned_loss=0.02689, audio_tagging_loss=0.01122, over 3036964.70 frames. ], batch size: 57, lr: 9.80e-03, grad_scale: 32.0 2023-11-19 02:43:03,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=528793.3333333334, ans=0.125 2023-11-19 02:43:07,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=528793.3333333334, ans=0.125 2023-11-19 02:43:16,805 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:43:25,055 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7200, loss[loss=0.08693, simple_loss=0.09632, pruned_loss=0.02574, audio_tagging_loss=0.01303, over 15652.00 frames. ], tot_loss[loss=0.09271, simple_loss=0.1094, pruned_loss=0.02674, audio_tagging_loss=0.01126, over 3047023.78 frames. ], batch size: 60, lr: 9.80e-03, grad_scale: 32.0 2023-11-19 02:43:56,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=529126.6666666666, ans=0.05 2023-11-19 02:43:59,943 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.223e+01 8.683e+01 9.470e+01 1.039e+02 1.567e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-19 02:44:06,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=529126.6666666666, ans=0.125 2023-11-19 02:44:20,482 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7250, loss[loss=0.09851, simple_loss=0.1206, pruned_loss=0.02984, audio_tagging_loss=0.008346, over 16345.00 frames. ], tot_loss[loss=0.09239, simple_loss=0.1091, pruned_loss=0.02651, audio_tagging_loss=0.01134, over 3050582.77 frames. 
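The loss= and tot_loss= fields in the batch lines above are weighted sums of the logged components: the pruned and audio-tagging losses enter at full weight while the simple loss is halved, which the numbers reproduce exactly. A minimal sketch of that combination (the function name is illustrative, not train_asr.py's code):

def combine_losses(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    # Reproduces the 'loss=' field of the batch lines above.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Worked check against the batch 7250 totals logged above:
# 0.5 * 0.1091 + 0.02651 + 1.0 * 0.01134 = 0.09240  (log: tot_loss=0.09239)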
], batch size: 59, lr: 9.80e-03, grad_scale: 32.0 2023-11-19 02:44:29,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=529260.0, ans=0.125 2023-11-19 02:44:35,705 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:44:39,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=529326.6666666666, ans=0.125 2023-11-19 02:45:09,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=529526.6666666666, ans=0.5 2023-11-19 02:45:12,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=529526.6666666666, ans=0.0 2023-11-19 02:45:15,821 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7300, loss[loss=0.1014, simple_loss=0.1031, pruned_loss=0.03488, audio_tagging_loss=0.01503, over 16169.00 frames. ], tot_loss[loss=0.09317, simple_loss=0.1103, pruned_loss=0.02676, audio_tagging_loss=0.01129, over 3049785.64 frames. ], batch size: 61, lr: 9.79e-03, grad_scale: 32.0 2023-11-19 02:45:23,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=529593.3333333334, ans=0.1 2023-11-19 02:45:36,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=529660.0, ans=0.125 2023-11-19 02:45:36,940 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-11-19 02:45:48,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=529793.3333333334, ans=0.125 2023-11-19 02:45:51,672 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.930e+01 9.656e+01 1.070e+02 1.553e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-19 02:46:12,160 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7350, loss[loss=0.07735, simple_loss=0.1009, pruned_loss=0.01945, audio_tagging_loss=0.007428, over 14667.00 frames. ], tot_loss[loss=0.0925, simple_loss=0.1097, pruned_loss=0.02659, audio_tagging_loss=0.01108, over 3047057.32 frames. ], batch size: 57, lr: 9.79e-03, grad_scale: 32.0 2023-11-19 02:46:22,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.56 vs. 
limit=15.0 2023-11-19 02:46:22,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=529993.3333333334, ans=0.0 2023-11-19 02:46:33,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=530060.0, ans=15.0 2023-11-19 02:46:35,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=530060.0, ans=0.125 2023-11-19 02:46:44,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=530126.6666666666, ans=0.0 2023-11-19 02:46:45,501 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2023-11-19 02:46:47,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=530126.6666666666, ans=0.0 2023-11-19 02:46:51,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.93 vs. limit=15.0 2023-11-19 02:47:07,078 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7400, loss[loss=0.08538, simple_loss=0.09947, pruned_loss=0.02338, audio_tagging_loss=0.01227, over 15594.00 frames. ], tot_loss[loss=0.09279, simple_loss=0.1102, pruned_loss=0.02681, audio_tagging_loss=0.01088, over 3043060.00 frames. ], batch size: 61, lr: 9.79e-03, grad_scale: 32.0 2023-11-19 02:47:25,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=530326.6666666666, ans=0.125 2023-11-19 02:47:26,692 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. limit=10.0 2023-11-19 02:47:43,077 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.685e+01 9.299e+01 1.013e+02 1.325e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 02:47:53,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=530526.6666666666, ans=0.1 2023-11-19 02:47:54,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=530526.6666666666, ans=0.04949747468305833 2023-11-19 02:48:02,730 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7450, loss[loss=0.08363, simple_loss=0.1041, pruned_loss=0.0223, audio_tagging_loss=0.009296, over 13689.00 frames. ], tot_loss[loss=0.0922, simple_loss=0.1095, pruned_loss=0.02658, audio_tagging_loss=0.01088, over 3039807.54 frames. ], batch size: 52, lr: 9.78e-03, grad_scale: 32.0 2023-11-19 02:48:08,982 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. 
limit=6.0 2023-11-19 02:48:10,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=530593.3333333334, ans=6.0 2023-11-19 02:48:11,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=530593.3333333334, ans=0.0 2023-11-19 02:48:19,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=530660.0, ans=0.0 2023-11-19 02:48:24,887 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.93 vs. limit=15.0 2023-11-19 02:48:37,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.04 vs. limit=15.0 2023-11-19 02:48:42,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=530793.3333333334, ans=0.2 2023-11-19 02:48:49,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=530860.0, ans=0.0 2023-11-19 02:48:56,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=530860.0, ans=0.125 2023-11-19 02:48:59,262 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7500, loss[loss=0.1122, simple_loss=0.1332, pruned_loss=0.03328, audio_tagging_loss=0.01238, over 15869.00 frames. ], tot_loss[loss=0.0923, simple_loss=0.1095, pruned_loss=0.02675, audio_tagging_loss=0.01079, over 3040616.04 frames. ], batch size: 60, lr: 9.78e-03, grad_scale: 32.0 2023-11-19 02:49:04,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=530926.6666666666, ans=0.0 2023-11-19 02:49:16,400 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.20 vs. limit=10.0 2023-11-19 02:49:33,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.748e+01 9.301e+01 1.047e+02 1.348e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 02:49:35,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=531126.6666666666, ans=0.0 2023-11-19 02:49:45,622 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=531193.3333333334, ans=0.1 2023-11-19 02:49:53,802 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7550, loss[loss=0.08282, simple_loss=0.09287, pruned_loss=0.02198, audio_tagging_loss=0.01441, over 15339.00 frames. ], tot_loss[loss=0.09262, simple_loss=0.1096, pruned_loss=0.02693, audio_tagging_loss=0.0109, over 3038357.74 frames. 
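In every optim.py entry above, the clipping threshold is Clipping_scale times the logged median grad-norm quartile, e.g. 2.0 * 9.301e+01 = 1.860e+02. A rough sketch of that bookkeeping, assuming the threshold is derived from quartiles of recently observed gradient norms (the helper below is illustrative, not icefall's optim.py):

import torch

def clip_by_scaled_median(parameters, recent_norms, clipping_scale=2.0):
    # Quartiles of recent gradient norms, as printed in the log lines above.
    norms = torch.tensor(recent_norms)
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = float(clipping_scale * q[2])  # 2.0 * median, as logged
    total_norm = torch.nn.utils.clip_grad_norm_(parameters, max_norm=threshold)
    return q, threshold, total_norm

percent-clipped stays at 0.0 as long as no batch's total gradient norm exceeds the threshold.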
], batch size: 58, lr: 9.78e-03, grad_scale: 32.0 2023-11-19 02:50:04,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=531326.6666666666, ans=0.2 2023-11-19 02:50:26,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=531460.0, ans=0.125 2023-11-19 02:50:30,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=531460.0, ans=0.125 2023-11-19 02:50:34,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=15.0 2023-11-19 02:50:37,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=531526.6666666666, ans=0.04949747468305833 2023-11-19 02:50:40,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=531526.6666666666, ans=0.1 2023-11-19 02:50:48,803 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7600, loss[loss=0.1002, simple_loss=0.1118, pruned_loss=0.03371, audio_tagging_loss=0.01062, over 13448.00 frames. ], tot_loss[loss=0.09216, simple_loss=0.1088, pruned_loss=0.02693, audio_tagging_loss=0.01083, over 3041573.18 frames. ], batch size: 53, lr: 9.77e-03, grad_scale: 32.0 2023-11-19 02:51:06,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=531660.0, ans=0.125 2023-11-19 02:51:24,908 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.650e+01 9.572e+01 1.070e+02 1.390e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-19 02:51:27,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=531793.3333333334, ans=0.1 2023-11-19 02:51:42,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=531860.0, ans=0.125 2023-11-19 02:51:45,438 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7650, loss[loss=0.09743, simple_loss=0.115, pruned_loss=0.03088, audio_tagging_loss=0.009059, over 15106.00 frames. ], tot_loss[loss=0.09216, simple_loss=0.1088, pruned_loss=0.02698, audio_tagging_loss=0.01077, over 3044324.19 frames. 
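The ScheduledFloat entries record hyperparameters (dropout_p, skip rates, balancer probabilities, min_abs values) that are functions of batch_count rather than constants; at batch_count around 5.3e5 most have long since settled at their final values (e.g. ans=0.1 for the out_proj dropouts). An illustrative piecewise-linear schedule, not icefall's scaling.ScheduledFloat implementation:

class PiecewiseLinearSchedule:
    # Value interpolated between (batch_count, value) breakpoints,
    # held constant outside the first and last breakpoint.
    def __init__(self, *points):
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        (x0, y0), *rest = self.points
        if batch_count <= x0:
            return y0
        for (x1, y1) in rest:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return y0

dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))  # breakpoints assumed
print(dropout_p.value(531526.67))  # -> 0.1, matching ans=0.1 above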
], batch size: 57, lr: 9.77e-03, grad_scale: 16.0 2023-11-19 02:51:50,448 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=531926.6666666666, ans=0.125 2023-11-19 02:51:50,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=531926.6666666666, ans=0.0 2023-11-19 02:51:55,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=531926.6666666666, ans=15.0 2023-11-19 02:51:55,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=531993.3333333334, ans=0.2 2023-11-19 02:51:58,149 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=531993.3333333334, ans=0.0 2023-11-19 02:52:28,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532126.6666666666, ans=0.1 2023-11-19 02:52:33,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=532193.3333333334, ans=0.125 2023-11-19 02:52:39,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=532193.3333333334, ans=0.2 2023-11-19 02:52:41,030 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7700, loss[loss=0.08261, simple_loss=0.09038, pruned_loss=0.02644, audio_tagging_loss=0.01098, over 14760.00 frames. ], tot_loss[loss=0.09222, simple_loss=0.1089, pruned_loss=0.02687, audio_tagging_loss=0.01091, over 3042419.55 frames. ], batch size: 57, lr: 9.77e-03, grad_scale: 16.0 2023-11-19 02:52:42,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=532260.0, ans=0.0 2023-11-19 02:52:42,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=532260.0, ans=0.2 2023-11-19 02:52:45,404 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=532260.0, ans=0.0 2023-11-19 02:52:58,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=532326.6666666666, ans=0.1 2023-11-19 02:53:08,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0 2023-11-19 02:53:17,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.28 vs. limit=15.0 2023-11-19 02:53:17,948 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.485e+01 9.381e+01 1.068e+02 1.739e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-19 02:53:30,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=532526.6666666666, ans=0.2 2023-11-19 02:53:35,822 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7750, loss[loss=0.08387, simple_loss=0.09313, pruned_loss=0.02407, audio_tagging_loss=0.01323, over 14535.00 frames. ], tot_loss[loss=0.09222, simple_loss=0.1091, pruned_loss=0.02671, audio_tagging_loss=0.01094, over 3043922.76 frames. 
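grad_scale is the fp16 loss-scaling factor: it was halved from 32.0 to 16.0 around batch 7650 (a scaled gradient overflowed) and grows back to 32.0 by batch 8000. This is the standard torch.cuda.amp behaviour; a sketch with illustrative growth settings:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   backoff_factor=0.5,    # halve on overflow
                                   growth_factor=2.0,     # double after enough
                                   growth_interval=2000)  # clean steps (assumed)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped internally if gradients overflowed
    scaler.update()         # adjusts the scale, e.g. 32.0 -> 16.0
    return loss.detach(), scaler.get_scale()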
], batch size: 55, lr: 9.77e-03, grad_scale: 16.0 2023-11-19 02:53:39,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2023-11-19 02:53:40,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=532593.3333333334, ans=0.125 2023-11-19 02:53:43,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=532593.3333333334, ans=0.1 2023-11-19 02:53:53,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=532660.0, ans=0.0 2023-11-19 02:53:58,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=532726.6666666666, ans=0.125 2023-11-19 02:54:04,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=532726.6666666666, ans=0.0 2023-11-19 02:54:22,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=12.0 2023-11-19 02:54:23,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=532860.0, ans=0.125 2023-11-19 02:54:27,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=532860.0, ans=0.125 2023-11-19 02:54:31,620 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7800, loss[loss=0.08107, simple_loss=0.0962, pruned_loss=0.02225, audio_tagging_loss=0.01073, over 14586.00 frames. ], tot_loss[loss=0.0929, simple_loss=0.1102, pruned_loss=0.02691, audio_tagging_loss=0.01091, over 3037436.17 frames. ], batch size: 56, lr: 9.76e-03, grad_scale: 16.0 2023-11-19 02:54:31,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=532926.6666666666, ans=0.0 2023-11-19 02:54:45,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=532993.3333333334, ans=0.125 2023-11-19 02:54:57,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=533060.0, ans=0.0 2023-11-19 02:55:04,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=533126.6666666666, ans=0.2 2023-11-19 02:55:07,938 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.565e+01 9.591e+01 1.072e+02 1.947e+02, threshold=1.918e+02, percent-clipped=1.0 2023-11-19 02:55:27,526 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7850, loss[loss=0.1058, simple_loss=0.1222, pruned_loss=0.02966, audio_tagging_loss=0.01505, over 14466.00 frames. ], tot_loss[loss=0.09239, simple_loss=0.1096, pruned_loss=0.02668, audio_tagging_loss=0.01089, over 3038718.90 frames. 
], batch size: 55, lr: 9.76e-03, grad_scale: 16.0 2023-11-19 02:55:42,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=533326.6666666666, ans=0.125 2023-11-19 02:55:50,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=533393.3333333334, ans=0.025 2023-11-19 02:55:58,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=533393.3333333334, ans=0.125 2023-11-19 02:56:24,747 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7900, loss[loss=0.1009, simple_loss=0.1189, pruned_loss=0.03118, audio_tagging_loss=0.01026, over 15210.00 frames. ], tot_loss[loss=0.09223, simple_loss=0.1093, pruned_loss=0.02665, audio_tagging_loss=0.01095, over 3041560.91 frames. ], batch size: 60, lr: 9.76e-03, grad_scale: 16.0 2023-11-19 02:56:41,117 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.86 vs. limit=10.0 2023-11-19 02:56:46,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=533660.0, ans=0.125 2023-11-19 02:56:51,200 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:56:55,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=533726.6666666666, ans=0.125 2023-11-19 02:56:57,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=533793.3333333334, ans=0.125 2023-11-19 02:57:01,474 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 8.512e+01 9.298e+01 1.008e+02 1.380e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 02:57:03,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=533793.3333333334, ans=0.2 2023-11-19 02:57:08,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533860.0, ans=0.1 2023-11-19 02:57:14,469 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=533860.0, ans=0.2 2023-11-19 02:57:19,988 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 7950, loss[loss=0.09508, simple_loss=0.1177, pruned_loss=0.02748, audio_tagging_loss=0.008751, over 15966.00 frames. ], tot_loss[loss=0.09343, simple_loss=0.1112, pruned_loss=0.02698, audio_tagging_loss=0.01084, over 3043661.56 frames. ], batch size: 58, lr: 9.75e-03, grad_scale: 16.0 2023-11-19 02:57:23,501 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.075e-03 2023-11-19 02:57:34,939 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 02:57:38,218 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=533993.3333333334, ans=0.125 2023-11-19 02:57:45,270 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.59 vs. limit=8.0 2023-11-19 02:57:50,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=534060.0, ans=0.125 2023-11-19 02:57:54,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=12.0 2023-11-19 02:58:16,026 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8000, loss[loss=0.1191, simple_loss=0.1437, pruned_loss=0.03641, audio_tagging_loss=0.01086, over 15826.00 frames. ], tot_loss[loss=0.09265, simple_loss=0.1101, pruned_loss=0.0266, audio_tagging_loss=0.01099, over 3043755.94 frames. ], batch size: 59, lr: 9.75e-03, grad_scale: 32.0 2023-11-19 02:58:20,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2023-11-19 02:58:36,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=12.0 2023-11-19 02:58:51,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2023-11-19 02:58:52,225 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.488e+01 9.029e+01 9.898e+01 1.404e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-19 02:58:52,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=534460.0, ans=0.125 2023-11-19 02:59:10,642 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8050, loss[loss=0.0847, simple_loss=0.09729, pruned_loss=0.02535, audio_tagging_loss=0.01071, over 14857.00 frames. ], tot_loss[loss=0.09205, simple_loss=0.1088, pruned_loss=0.02646, audio_tagging_loss=0.01119, over 3037496.60 frames. ], batch size: 56, lr: 9.75e-03, grad_scale: 32.0 2023-11-19 02:59:29,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=534660.0, ans=0.125 2023-11-19 02:59:44,337 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.57 vs. limit=15.0 2023-11-19 02:59:49,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=534793.3333333334, ans=0.125 2023-11-19 02:59:53,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=534793.3333333334, ans=0.125 2023-11-19 02:59:57,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=534860.0, ans=0.035 2023-11-19 03:00:00,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=534860.0, ans=0.125 2023-11-19 03:00:06,566 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8100, loss[loss=0.1034, simple_loss=0.1178, pruned_loss=0.0362, audio_tagging_loss=0.008329, over 15689.00 frames. 
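The WARNING entries above drop AudioSet cuts whose transcript is only a dummy placeholder: a 1-second cut keeps (100 - 7) // 4 = 23 frames after 4x subsampling, fewer than its 24 BPE tokens, so the transducer loss would be ill-defined. A sketch of such a filter (the subsampling formula and the tokens field are assumptions, not train_asr.py's exact code):

import logging

def keep_cut(cut, subsampling_factor: int = 4) -> bool:
    # Drop cuts that end up with fewer frames than output tokens.
    num_frames_sub = (cut.num_frames - 7) // subsampling_factor
    num_tokens = len(cut.supervisions[0].custom["tokens"])
    if num_frames_sub < num_tokens:
        logging.warning(
            "Exclude cut with ID %s from training. Frames after "
            "subsampling: %d, tokens: %d", cut.id, num_frames_sub, num_tokens)
        return False
    return True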
], tot_loss[loss=0.09255, simple_loss=0.1095, pruned_loss=0.02667, audio_tagging_loss=0.01111, over 3041334.57 frames. ], batch size: 58, lr: 9.74e-03, grad_scale: 32.0 2023-11-19 03:00:11,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=534926.6666666666, ans=0.0 2023-11-19 03:00:12,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=534926.6666666666, ans=0.2 2023-11-19 03:00:22,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=534993.3333333334, ans=0.0 2023-11-19 03:00:32,935 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.65 vs. limit=15.0 2023-11-19 03:00:42,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=535126.6666666666, ans=0.125 2023-11-19 03:00:42,977 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.821e+01 9.637e+01 1.043e+02 1.464e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-19 03:00:51,216 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=535193.3333333334, ans=0.2 2023-11-19 03:01:02,686 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8150, loss[loss=0.09402, simple_loss=0.1165, pruned_loss=0.026, audio_tagging_loss=0.00977, over 15043.00 frames. ], tot_loss[loss=0.09271, simple_loss=0.11, pruned_loss=0.02672, audio_tagging_loss=0.011, over 3044013.99 frames. ], batch size: 54, lr: 9.74e-03, grad_scale: 32.0 2023-11-19 03:01:23,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.74 vs. limit=15.0 2023-11-19 03:01:32,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=535393.3333333334, ans=0.04949747468305833 2023-11-19 03:01:35,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=535460.0, ans=0.025 2023-11-19 03:01:50,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=535526.6666666666, ans=0.125 2023-11-19 03:01:57,926 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8200, loss[loss=0.06893, simple_loss=0.0796, pruned_loss=0.01689, audio_tagging_loss=0.01224, over 15232.00 frames. ], tot_loss[loss=0.09334, simple_loss=0.111, pruned_loss=0.02704, audio_tagging_loss=0.0108, over 3046566.15 frames. ], batch size: 58, lr: 9.74e-03, grad_scale: 32.0 2023-11-19 03:02:00,015 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 03:02:06,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=535593.3333333334, ans=0.035 2023-11-19 03:02:08,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=535660.0, ans=0.025 2023-11-19 03:02:19,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.68 vs. limit=22.5 2023-11-19 03:02:23,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=535726.6666666666, ans=0.0 2023-11-19 03:02:29,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=535726.6666666666, ans=0.0 2023-11-19 03:02:30,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535793.3333333334, ans=0.1 2023-11-19 03:02:34,968 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.718e+01 8.630e+01 9.276e+01 1.032e+02 1.538e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-19 03:02:53,479 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8250, loss[loss=0.1137, simple_loss=0.1324, pruned_loss=0.03651, audio_tagging_loss=0.01095, over 15727.00 frames. ], tot_loss[loss=0.09396, simple_loss=0.1117, pruned_loss=0.02742, audio_tagging_loss=0.01068, over 3051326.51 frames. ], batch size: 57, lr: 9.74e-03, grad_scale: 32.0 2023-11-19 03:03:37,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=536193.3333333334, ans=0.125 2023-11-19 03:03:48,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.79 vs. limit=22.5 2023-11-19 03:03:49,003 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536260.0, ans=0.1 2023-11-19 03:03:49,896 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8300, loss[loss=0.09085, simple_loss=0.1145, pruned_loss=0.02201, audio_tagging_loss=0.0116, over 15037.00 frames. ], tot_loss[loss=0.09245, simple_loss=0.1099, pruned_loss=0.02672, audio_tagging_loss=0.01077, over 3043373.42 frames. ], batch size: 57, lr: 9.73e-03, grad_scale: 32.0 2023-11-19 03:04:11,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536393.3333333334, ans=0.1 2023-11-19 03:04:11,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.49 vs. 
limit=22.5 2023-11-19 03:04:20,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=536393.3333333334, ans=0.0 2023-11-19 03:04:23,896 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:04:27,420 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.802e+01 9.688e+01 1.089e+02 1.659e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-19 03:04:38,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=536526.6666666666, ans=0.2 2023-11-19 03:04:45,392 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8350, loss[loss=0.08197, simple_loss=0.09484, pruned_loss=0.02402, audio_tagging_loss=0.01053, over 14968.00 frames. ], tot_loss[loss=0.0928, simple_loss=0.1106, pruned_loss=0.02681, audio_tagging_loss=0.01071, over 3044285.26 frames. ], batch size: 55, lr: 9.73e-03, grad_scale: 16.0 2023-11-19 03:04:46,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=536593.3333333334, ans=0.0 2023-11-19 03:05:17,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=536793.3333333334, ans=0.125 2023-11-19 03:05:33,277 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=536860.0, ans=0.09899494936611666 2023-11-19 03:05:38,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=536860.0, ans=0.125 2023-11-19 03:05:39,844 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=12.0 2023-11-19 03:05:40,347 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8400, loss[loss=0.06767, simple_loss=0.07349, pruned_loss=0.01754, audio_tagging_loss=0.01338, over 14153.00 frames. ], tot_loss[loss=0.09307, simple_loss=0.1108, pruned_loss=0.02691, audio_tagging_loss=0.01076, over 3041899.25 frames. ], batch size: 53, lr: 9.73e-03, grad_scale: 32.0 2023-11-19 03:05:44,761 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0 2023-11-19 03:05:47,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=536926.6666666666, ans=0.2 2023-11-19 03:05:53,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=536993.3333333334, ans=0.125 2023-11-19 03:05:53,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=536993.3333333334, ans=0.125 2023-11-19 03:05:56,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=536993.3333333334, ans=0.05 2023-11-19 03:06:03,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=537060.0, ans=0.1 2023-11-19 03:06:04,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.50 vs. 
limit=10.0 2023-11-19 03:06:18,488 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.679e+01 9.349e+01 1.017e+02 2.307e+02, threshold=1.870e+02, percent-clipped=1.0 2023-11-19 03:06:18,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=537126.6666666666, ans=0.05 2023-11-19 03:06:22,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=537126.6666666666, ans=0.0 2023-11-19 03:06:36,882 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8450, loss[loss=0.05583, simple_loss=0.06008, pruned_loss=0.01238, audio_tagging_loss=0.01341, over 15299.00 frames. ], tot_loss[loss=0.09263, simple_loss=0.1098, pruned_loss=0.02684, audio_tagging_loss=0.01089, over 3039319.45 frames. ], batch size: 60, lr: 9.72e-03, grad_scale: 32.0 2023-11-19 03:06:39,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=537260.0, ans=0.0 2023-11-19 03:07:04,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=537393.3333333334, ans=0.0 2023-11-19 03:07:09,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=537460.0, ans=0.0 2023-11-19 03:07:15,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=537460.0, ans=0.2 2023-11-19 03:07:21,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=537526.6666666666, ans=0.125 2023-11-19 03:07:28,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=537526.6666666666, ans=0.2 2023-11-19 03:07:29,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=537526.6666666666, ans=0.0 2023-11-19 03:07:31,465 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8500, loss[loss=0.1033, simple_loss=0.1195, pruned_loss=0.03306, audio_tagging_loss=0.01045, over 15588.00 frames. ], tot_loss[loss=0.09276, simple_loss=0.1099, pruned_loss=0.02684, audio_tagging_loss=0.01096, over 3043333.54 frames. ], batch size: 57, lr: 9.72e-03, grad_scale: 32.0 2023-11-19 03:07:38,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=537593.3333333334, ans=0.125 2023-11-19 03:07:50,932 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2023-11-19 03:08:09,036 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.768e+01 9.309e+01 1.039e+02 1.379e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-19 03:08:26,575 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8550, loss[loss=0.07439, simple_loss=0.07383, pruned_loss=0.02117, audio_tagging_loss=0.0163, over 13995.00 frames. ], tot_loss[loss=0.0919, simple_loss=0.109, pruned_loss=0.02639, audio_tagging_loss=0.01099, over 3046435.96 frames. 
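A Whitening line is printed whenever a module's whitening metric exceeds its limit; the metric measures how far the channel covariance of an activation is from isotropic (1.0 means perfectly white), and the associated penalty pushes it back under the limit. One way such a metric can be computed, as an illustration rather than zipformer's exact formula:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (*, num_channels). Returns >= 1.0, with equality iff the channel
    # covariance within each group is a multiple of the identity.
    x = x.reshape(-1, x.shape[-1])
    metrics = []
    for g in x.chunk(num_groups, dim=-1):
        g = g - g.mean(dim=0)
        cov = (g.T @ g) / g.shape[0]
        mean_eig = cov.diagonal().mean()                # trace / C
        mean_eig_sq = (cov * cov).sum() / cov.shape[0]  # trace(cov @ cov) / C
        metrics.append(mean_eig_sq / mean_eig ** 2)     # mean(l^2) / mean(l)^2
    return torch.stack(metrics).mean()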
], batch size: 57, lr: 9.72e-03, grad_scale: 32.0 2023-11-19 03:08:48,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=538060.0, ans=0.2 2023-11-19 03:08:51,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=538060.0, ans=0.125 2023-11-19 03:09:05,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=538126.6666666666, ans=0.0 2023-11-19 03:09:11,323 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2023-11-19 03:09:21,672 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:09:22,935 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8600, loss[loss=0.07605, simple_loss=0.08911, pruned_loss=0.01753, audio_tagging_loss=0.01397, over 15429.00 frames. ], tot_loss[loss=0.09262, simple_loss=0.1095, pruned_loss=0.02692, audio_tagging_loss=0.01095, over 3050576.44 frames. ], batch size: 58, lr: 9.71e-03, grad_scale: 32.0 2023-11-19 03:09:39,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=538326.6666666666, ans=0.125 2023-11-19 03:09:55,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=538460.0, ans=0.125 2023-11-19 03:09:59,804 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.620e+01 8.913e+01 9.582e+01 1.068e+02 1.371e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-19 03:10:10,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=538526.6666666666, ans=0.125 2023-11-19 03:10:17,869 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8650, loss[loss=0.1026, simple_loss=0.1256, pruned_loss=0.03343, audio_tagging_loss=0.006395, over 15227.00 frames. ], tot_loss[loss=0.09428, simple_loss=0.1114, pruned_loss=0.02762, audio_tagging_loss=0.01095, over 3057168.77 frames. ], batch size: 56, lr: 9.71e-03, grad_scale: 32.0 2023-11-19 03:10:51,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0 2023-11-19 03:10:54,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=538793.3333333334, ans=0.125 2023-11-19 03:11:02,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=538860.0, ans=0.125 2023-11-19 03:11:13,648 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8700, loss[loss=0.06439, simple_loss=0.06622, pruned_loss=0.01942, audio_tagging_loss=0.01186, over 14391.00 frames. ], tot_loss[loss=0.09434, simple_loss=0.1112, pruned_loss=0.02773, audio_tagging_loss=0.01101, over 3057446.53 frames. 
], batch size: 56, lr: 9.71e-03, grad_scale: 32.0 2023-11-19 03:11:15,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=538926.6666666666, ans=0.0 2023-11-19 03:11:19,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=538926.6666666666, ans=0.0 2023-11-19 03:11:20,701 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=538926.6666666666, ans=0.125 2023-11-19 03:11:23,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=538993.3333333334, ans=0.125 2023-11-19 03:11:50,664 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.784e+01 9.683e+01 1.064e+02 1.511e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-19 03:11:52,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=539126.6666666666, ans=0.125 2023-11-19 03:12:09,096 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8750, loss[loss=0.09615, simple_loss=0.1147, pruned_loss=0.02582, audio_tagging_loss=0.01297, over 15790.00 frames. ], tot_loss[loss=0.09377, simple_loss=0.1106, pruned_loss=0.02739, audio_tagging_loss=0.01107, over 3050675.10 frames. ], batch size: 58, lr: 9.71e-03, grad_scale: 32.0 2023-11-19 03:12:34,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.04 vs. limit=22.5 2023-11-19 03:12:37,660 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-11-19 03:12:47,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=539460.0, ans=0.125 2023-11-19 03:12:54,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.17 vs. limit=10.0 2023-11-19 03:13:03,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=539593.3333333334, ans=0.125 2023-11-19 03:13:04,390 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8800, loss[loss=0.1062, simple_loss=0.1302, pruned_loss=0.03029, audio_tagging_loss=0.01075, over 14125.00 frames. ], tot_loss[loss=0.09372, simple_loss=0.1107, pruned_loss=0.02722, audio_tagging_loss=0.01113, over 3043648.84 frames. ], batch size: 54, lr: 9.70e-03, grad_scale: 32.0 2023-11-19 03:13:05,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=539593.3333333334, ans=0.2 2023-11-19 03:13:15,251 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:13:24,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=539660.0, ans=0.0 2023-11-19 03:13:28,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=539726.6666666666, ans=0.2 2023-11-19 03:13:34,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.23 vs. 
limit=15.0 2023-11-19 03:13:40,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=539793.3333333334, ans=0.1 2023-11-19 03:13:42,593 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.511e+01 8.574e+01 9.508e+01 1.041e+02 1.765e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 03:13:46,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=539793.3333333334, ans=0.125 2023-11-19 03:13:59,443 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8850, loss[loss=0.08025, simple_loss=0.09342, pruned_loss=0.02155, audio_tagging_loss=0.01199, over 14154.00 frames. ], tot_loss[loss=0.093, simple_loss=0.11, pruned_loss=0.02676, audio_tagging_loss=0.01121, over 3048203.03 frames. ], batch size: 55, lr: 9.70e-03, grad_scale: 32.0 2023-11-19 03:14:12,407 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:14:13,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=539993.3333333334, ans=0.125 2023-11-19 03:14:28,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=540060.0, ans=0.125 2023-11-19 03:14:43,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=540193.3333333334, ans=0.125 2023-11-19 03:14:51,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.10 vs. limit=22.5 2023-11-19 03:14:52,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=540193.3333333334, ans=0.125 2023-11-19 03:14:55,121 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8900, loss[loss=0.08042, simple_loss=0.08934, pruned_loss=0.02576, audio_tagging_loss=0.009987, over 15850.00 frames. ], tot_loss[loss=0.09331, simple_loss=0.1102, pruned_loss=0.0271, audio_tagging_loss=0.01111, over 3044818.06 frames. ], batch size: 61, lr: 9.70e-03, grad_scale: 32.0 2023-11-19 03:15:06,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=540326.6666666666, ans=0.125 2023-11-19 03:15:10,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=540326.6666666666, ans=0.0 2023-11-19 03:15:18,676 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2023-11-19 03:15:32,227 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.117e+01 8.732e+01 9.510e+01 1.041e+02 1.883e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 03:15:43,894 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.36 vs. 
2023-11-19 03:15:50,760 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 8950, loss[loss=0.09708, simple_loss=0.1123, pruned_loss=0.03032, audio_tagging_loss=0.01063, over 14326.00 frames. ], tot_loss[loss=0.09317, simple_loss=0.1105, pruned_loss=0.02707, audio_tagging_loss=0.01086, over 3047239.18 frames. ], batch size: 55, lr: 9.69e-03, grad_scale: 32.0
2023-11-19 03:15:55,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=540593.3333333334, ans=0.125
2023-11-19 03:16:02,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=540660.0, ans=0.0
2023-11-19 03:16:20,755 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:16:24,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=540793.3333333334, ans=0.125
2023-11-19 03:16:38,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=540860.0, ans=0.125
2023-11-19 03:16:41,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=540860.0, ans=0.125
2023-11-19 03:16:45,767 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9000, loss[loss=0.1206, simple_loss=0.1447, pruned_loss=0.03776, audio_tagging_loss=0.01049, over 15401.00 frames. ], tot_loss[loss=0.09378, simple_loss=0.1116, pruned_loss=0.02732, audio_tagging_loss=0.01067, over 3054477.34 frames. ], batch size: 56, lr: 9.69e-03, grad_scale: 32.0
2023-11-19 03:16:45,768 INFO [train_asr.py:1138] (1/4) Computing validation loss
2023-11-19 03:16:56,906 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5570, 3.6396, 4.2954, 3.4357], device='cuda:1')
2023-11-19 03:17:09,121 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([0.9544, 3.3012, 2.5089, 2.7368, 3.6062, 3.7092, 3.1197, 3.7435], device='cuda:1')
2023-11-19 03:17:18,029 INFO [train_asr.py:1147] (1/4) Epoch 7, validation: loss=0.06875, simple_loss=0.05761, pruned_loss=0.007498, audio_tagging_loss=0.03244, over 4681554.00 frames.
2023-11-19 03:17:18,029 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB
2023-11-19 03:17:18,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.41 vs. limit=10.0
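[Editor's note] During each validation pass the code prints attn_weights_entropy tensors such as the ones above; going by the shapes, there is one value per attention head, and low values flag heads whose attention has collapsed onto a few positions. A plausible definition of the statistic (assumed for illustration, not lifted from zipformer.py) is the entropy of each softmaxed attention row, averaged over query positions:

    import torch

    def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
        # attn: (num_heads, num_queries, num_keys), each row a distribution over keys
        ent = -(attn * (attn + eps).log()).sum(dim=-1)   # (num_heads, num_queries)
        return ent.mean(dim=-1)                          # one entropy per head

    attn = torch.softmax(torch.randn(8, 50, 50), dim=-1)
    print(attn_weights_entropy(attn))  # shape (8,), like the second tensor in the log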
2023-11-19 03:17:28,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=540993.3333333334, ans=0.0
2023-11-19 03:17:38,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=541060.0, ans=0.025
2023-11-19 03:17:54,257 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 8.732e+01 9.313e+01 1.034e+02 1.719e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-19 03:17:54,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=541126.6666666666, ans=0.0
2023-11-19 03:17:58,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=541126.6666666666, ans=0.125
2023-11-19 03:18:04,341 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0
2023-11-19 03:18:12,231 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9050, loss[loss=0.1051, simple_loss=0.1265, pruned_loss=0.03014, audio_tagging_loss=0.01172, over 15621.00 frames. ], tot_loss[loss=0.0929, simple_loss=0.1105, pruned_loss=0.02697, audio_tagging_loss=0.01067, over 3046605.18 frames. ], batch size: 58, lr: 9.69e-03, grad_scale: 32.0
2023-11-19 03:18:17,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=541260.0, ans=0.0
2023-11-19 03:18:18,209 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.70 vs. limit=15.0
2023-11-19 03:18:36,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541393.3333333334, ans=0.1
2023-11-19 03:18:47,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=541460.0, ans=0.0
2023-11-19 03:18:57,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541526.6666666666, ans=0.1
2023-11-19 03:19:07,363 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9100, loss[loss=0.08684, simple_loss=0.1061, pruned_loss=0.02492, audio_tagging_loss=0.008848, over 13277.00 frames. ], tot_loss[loss=0.09223, simple_loss=0.11, pruned_loss=0.0267, audio_tagging_loss=0.0105, over 3049296.30 frames. ], batch size: 52, lr: 9.68e-03, grad_scale: 32.0
2023-11-19 03:19:07,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=541593.3333333334, ans=0.0
2023-11-19 03:19:15,336 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0
2023-11-19 03:19:20,814 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=541660.0, ans=0.0
2023-11-19 03:19:36,996 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.23 vs. limit=6.0
2023-11-19 03:19:39,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=541793.3333333334, ans=0.125
2023-11-19 03:19:41,028 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:19:44,939 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.588e+01 9.392e+01 1.039e+02 1.289e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-19 03:20:02,292 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9150, loss[loss=0.09612, simple_loss=0.1146, pruned_loss=0.02704, audio_tagging_loss=0.01179, over 15012.00 frames. ], tot_loss[loss=0.09278, simple_loss=0.1108, pruned_loss=0.02681, audio_tagging_loss=0.01059, over 3044458.75 frames. ], batch size: 55, lr: 9.68e-03, grad_scale: 32.0
2023-11-19 03:20:12,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541926.6666666666, ans=0.1
2023-11-19 03:20:17,314 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:20:21,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=541993.3333333334, ans=0.0
2023-11-19 03:20:42,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=542126.6666666666, ans=10.0
2023-11-19 03:20:56,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=542260.0, ans=0.125
2023-11-19 03:20:57,895 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9200, loss[loss=0.09835, simple_loss=0.1068, pruned_loss=0.0347, audio_tagging_loss=0.01027, over 16883.00 frames. ], tot_loss[loss=0.09317, simple_loss=0.1109, pruned_loss=0.02701, audio_tagging_loss=0.01069, over 3047011.30 frames. ], batch size: 67, lr: 9.68e-03, grad_scale: 32.0
2023-11-19 03:21:00,613 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.23 vs. limit=22.5
2023-11-19 03:21:01,632 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5
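[Editor's note] In every Clipping_scale=2.0 line in this log, the reported threshold equals 2.0 times the middle quartile, i.e. the median gradient norm over a recent window: in the 03:19:44 record above, 2.0 * 9.392e+01 = 1.878e+02. So the clipping threshold adapts to the run rather than being a fixed constant. A sketch of that scheme (the bookkeeping around the window is assumed; the real logic is in icefall's optim.py):

    import torch

    def clip_threshold(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0) -> float:
        # threshold = clipping_scale * median of recently observed grad norms
        return clipping_scale * recent_grad_norms.median().item()

    norms = torch.tensor([73.86, 85.88, 93.92, 103.9, 128.9])  # quartiles from the record above
    print(clip_threshold(norms))  # ~187.8, matching threshold=1.878e+02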
2023-11-19 03:21:03,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=542260.0, ans=0.1
2023-11-19 03:21:20,265 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=542393.3333333334, ans=0.07
2023-11-19 03:21:34,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=542460.0, ans=0.0
2023-11-19 03:21:34,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=542460.0, ans=0.125
2023-11-19 03:21:36,294 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.578e+01 9.333e+01 1.009e+02 1.862e+02, threshold=1.867e+02, percent-clipped=0.0
2023-11-19 03:21:37,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=542460.0, ans=0.1
2023-11-19 03:21:52,038 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9250, loss[loss=0.1104, simple_loss=0.1395, pruned_loss=0.03207, audio_tagging_loss=0.008559, over 15842.00 frames. ], tot_loss[loss=0.09344, simple_loss=0.1113, pruned_loss=0.02717, audio_tagging_loss=0.01062, over 3053762.57 frames. ], batch size: 58, lr: 9.68e-03, grad_scale: 32.0
2023-11-19 03:21:57,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=542593.3333333334, ans=0.1
2023-11-19 03:22:06,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.26 vs. limit=15.0
2023-11-19 03:22:24,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=542793.3333333334, ans=0.09899494936611666
2023-11-19 03:22:26,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5
2023-11-19 03:22:47,237 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9300, loss[loss=0.136, simple_loss=0.1681, pruned_loss=0.04355, audio_tagging_loss=0.008465, over 16264.00 frames. ], tot_loss[loss=0.09278, simple_loss=0.1107, pruned_loss=0.02678, audio_tagging_loss=0.01065, over 3051263.22 frames. ], batch size: 56, lr: 9.67e-03, grad_scale: 32.0
2023-11-19 03:22:50,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=542926.6666666666, ans=15.0
2023-11-19 03:22:56,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=542926.6666666666, ans=0.0
2023-11-19 03:22:57,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.86 vs. limit=15.0
2023-11-19 03:23:01,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=542993.3333333334, ans=0.125
2023-11-19 03:23:09,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=543060.0, ans=0.1
2023-11-19 03:23:26,427 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.462e+01 9.179e+01 9.907e+01 1.156e+02, threshold=1.836e+02, percent-clipped=0.0
2023-11-19 03:23:30,337 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=543193.3333333334, ans=0.0
2023-11-19 03:23:37,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=543193.3333333334, ans=0.0
2023-11-19 03:23:42,862 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9350, loss[loss=0.09334, simple_loss=0.1253, pruned_loss=0.02111, audio_tagging_loss=0.009584, over 15902.00 frames. ], tot_loss[loss=0.09293, simple_loss=0.1109, pruned_loss=0.02679, audio_tagging_loss=0.01068, over 3061942.15 frames. ], batch size: 59, lr: 9.67e-03, grad_scale: 16.0
2023-11-19 03:23:43,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=543260.0, ans=0.1
2023-11-19 03:23:58,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=543326.6666666666, ans=0.125
2023-11-19 03:23:59,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.09 vs. limit=10.0
2023-11-19 03:24:14,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=543460.0, ans=0.125
2023-11-19 03:24:16,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=543460.0, ans=0.0
2023-11-19 03:24:17,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=543460.0, ans=0.0
2023-11-19 03:24:18,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=543460.0, ans=0.2
2023-11-19 03:24:20,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=543460.0, ans=0.0
2023-11-19 03:24:37,155 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9400, loss[loss=0.09604, simple_loss=0.1062, pruned_loss=0.02877, audio_tagging_loss=0.01419, over 16218.00 frames. ], tot_loss[loss=0.09351, simple_loss=0.1115, pruned_loss=0.027, audio_tagging_loss=0.01074, over 3058267.39 frames. ], batch size: 64, lr: 9.67e-03, grad_scale: 16.0
2023-11-19 03:24:37,947 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0
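[Editor's note] The ScheduledFloat lines record hyperparameters (dropout probabilities, skip rates, balancer limits) that are functions of batch_count rather than constants; this deep into training they have all flattened onto their final values, e.g. ans=0.1 for the out_proj dropouts and ans=0.0 for the skip rates. A minimal sketch of a piecewise-linear schedule of this kind (the breakpoints below are illustrative, not the ones used in this run):

    def scheduled_float(batch_count: float,
                        schedule=((0.0, 0.3), (20000.0, 0.1))) -> float:
        # linear interpolation between (batch_count, value) breakpoints,
        # clamped to the end values outside the range
        (x0, y0), (x1, y1) = schedule
        if batch_count <= x0:
            return y0
        if batch_count >= x1:
            return y1
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

    print(scheduled_float(543460.0))  # 0.1: far past the last breakpoint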
2023-11-19 03:24:43,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=543593.3333333334, ans=0.0
2023-11-19 03:25:11,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=543793.3333333334, ans=0.125
2023-11-19 03:25:16,212 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5
2023-11-19 03:25:16,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.406e+01 9.126e+01 1.047e+02 1.267e+02, threshold=1.825e+02, percent-clipped=0.0
2023-11-19 03:25:16,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=543793.3333333334, ans=0.5
2023-11-19 03:25:21,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=15.0
2023-11-19 03:25:25,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=543860.0, ans=0.125
2023-11-19 03:25:25,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=543860.0, ans=0.125
2023-11-19 03:25:31,572 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9450, loss[loss=0.0904, simple_loss=0.1067, pruned_loss=0.02664, audio_tagging_loss=0.0104, over 15581.00 frames. ], tot_loss[loss=0.09288, simple_loss=0.1107, pruned_loss=0.02664, audio_tagging_loss=0.0109, over 3054345.55 frames. ], batch size: 57, lr: 9.66e-03, grad_scale: 16.0
2023-11-19 03:25:31,580 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 03:25:46,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0
2023-11-19 03:25:55,180 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.65 vs. limit=12.0
2023-11-19 03:25:59,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=15.0
2023-11-19 03:26:07,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=544126.6666666666, ans=0.125
2023-11-19 03:26:09,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=544126.6666666666, ans=0.0
2023-11-19 03:26:11,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=544126.6666666666, ans=0.0
2023-11-19 03:26:20,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=544193.3333333334, ans=0.0
2023-11-19 03:26:28,038 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9500, loss[loss=0.09774, simple_loss=0.1135, pruned_loss=0.02849, audio_tagging_loss=0.01251, over 15741.00 frames. ], tot_loss[loss=0.09314, simple_loss=0.1111, pruned_loss=0.02677, audio_tagging_loss=0.01083, over 3051999.82 frames. ], batch size: 58, lr: 9.66e-03, grad_scale: 16.0
2023-11-19 03:26:34,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=22.5
2023-11-19 03:26:38,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0
2023-11-19 03:26:54,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=544393.3333333334, ans=0.125
2023-11-19 03:26:55,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=544393.3333333334, ans=0.1
2023-11-19 03:27:08,271 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.649e+01 9.463e+01 1.058e+02 1.966e+02, threshold=1.893e+02, percent-clipped=1.0
2023-11-19 03:27:23,681 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9550, loss[loss=0.103, simple_loss=0.1173, pruned_loss=0.02913, audio_tagging_loss=0.01516, over 13833.00 frames. ], tot_loss[loss=0.09342, simple_loss=0.1115, pruned_loss=0.02685, audio_tagging_loss=0.01082, over 3044837.48 frames. ], batch size: 53, lr: 9.66e-03, grad_scale: 16.0
2023-11-19 03:27:29,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=544593.3333333334, ans=0.0
2023-11-19 03:27:37,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=544660.0, ans=0.1
2023-11-19 03:27:55,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=544726.6666666666, ans=0.125
2023-11-19 03:27:55,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=544726.6666666666, ans=0.0
2023-11-19 03:28:00,670 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=12.0
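[Editor's note] The Whitening lines compare a measured statistic of the activation covariance against a limit ("metric=X vs. limit=Y"); approaching or exceeding the limit is what makes the event worth logging, and in training it triggers a corrective gradient that pushes the features back toward a whiter covariance. One plausible such metric, assumed here for illustration rather than taken from scaling.py, is the ratio of the mean squared eigenvalue of the covariance to the squared mean eigenvalue: it is 1.0 for perfectly whitened features and grows as the spectrum becomes lopsided:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

    x = torch.randn(10000, 128)
    print(whitening_metric(x))  # close to 1.0 for (near-)white features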
2023-11-19 03:28:06,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=544793.3333333334, ans=0.2
2023-11-19 03:28:06,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=544793.3333333334, ans=0.0
2023-11-19 03:28:06,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.58 vs. limit=22.5
2023-11-19 03:28:07,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=544860.0, ans=0.05
2023-11-19 03:28:10,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=544860.0, ans=0.07
2023-11-19 03:28:18,805 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9600, loss[loss=0.1129, simple_loss=0.1278, pruned_loss=0.0413, audio_tagging_loss=0.007763, over 14143.00 frames. ], tot_loss[loss=0.09326, simple_loss=0.111, pruned_loss=0.02675, audio_tagging_loss=0.011, over 3045454.32 frames. ], batch size: 56, lr: 9.66e-03, grad_scale: 32.0
2023-11-19 03:28:35,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=544993.3333333334, ans=0.0
2023-11-19 03:28:41,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=545060.0, ans=0.0
2023-11-19 03:28:44,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=545060.0, ans=0.2
2023-11-19 03:28:59,136 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.567e+01 8.431e+01 9.173e+01 1.006e+02 1.337e+02, threshold=1.835e+02, percent-clipped=0.0
2023-11-19 03:29:14,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=545260.0, ans=0.1
2023-11-19 03:29:15,057 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9650, loss[loss=0.07608, simple_loss=0.08549, pruned_loss=0.02208, audio_tagging_loss=0.01126, over 14971.00 frames. ], tot_loss[loss=0.09235, simple_loss=0.11, pruned_loss=0.02626, audio_tagging_loss=0.01108, over 3049424.34 frames. ], batch size: 56, lr: 9.65e-03, grad_scale: 32.0
2023-11-19 03:29:29,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=545326.6666666666, ans=0.125
2023-11-19 03:29:42,281 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:29:44,340 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=545393.3333333334, ans=0.0
2023-11-19 03:29:48,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=545460.0, ans=0.125
2023-11-19 03:30:01,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=545526.6666666666, ans=0.0
2023-11-19 03:30:10,038 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9700, loss[loss=0.1359, simple_loss=0.1558, pruned_loss=0.04878, audio_tagging_loss=0.009201, over 15997.00 frames. ], tot_loss[loss=0.09305, simple_loss=0.1112, pruned_loss=0.02662, audio_tagging_loss=0.01083, over 3055294.07 frames. ], batch size: 57, lr: 9.65e-03, grad_scale: 32.0
2023-11-19 03:30:13,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=545593.3333333334, ans=0.125
2023-11-19 03:30:50,520 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.966e+01 8.564e+01 9.508e+01 1.033e+02 1.418e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-19 03:30:52,931 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=545793.3333333334, ans=10.0
2023-11-19 03:31:01,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=545860.0, ans=0.125
2023-11-19 03:31:05,787 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9750, loss[loss=0.09035, simple_loss=0.1079, pruned_loss=0.02586, audio_tagging_loss=0.01054, over 15516.00 frames. ], tot_loss[loss=0.09297, simple_loss=0.111, pruned_loss=0.02674, audio_tagging_loss=0.01074, over 3053260.87 frames. ], batch size: 58, lr: 9.65e-03, grad_scale: 32.0
2023-11-19 03:31:31,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=546060.0, ans=0.125
2023-11-19 03:31:48,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=546126.6666666666, ans=0.125
2023-11-19 03:32:00,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=546193.3333333334, ans=0.125
2023-11-19 03:32:02,954 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9800, loss[loss=0.09798, simple_loss=0.1154, pruned_loss=0.02818, audio_tagging_loss=0.01209, over 15720.00 frames. ], tot_loss[loss=0.09312, simple_loss=0.1111, pruned_loss=0.02687, audio_tagging_loss=0.01069, over 3048176.53 frames. ], batch size: 58, lr: 9.64e-03, grad_scale: 32.0
2023-11-19 03:32:03,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=546260.0, ans=0.0
2023-11-19 03:32:06,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=546260.0, ans=0.125
2023-11-19 03:32:14,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=546326.6666666666, ans=0.0
2023-11-19 03:32:20,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0
2023-11-19 03:32:34,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=546460.0, ans=0.125
2023-11-19 03:32:43,159 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.045e+01 8.602e+01 9.393e+01 1.096e+02 1.685e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-19 03:32:52,699 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 03:32:57,950 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9850, loss[loss=0.08407, simple_loss=0.1038, pruned_loss=0.022, audio_tagging_loss=0.01019, over 14187.00 frames. ], tot_loss[loss=0.09349, simple_loss=0.1117, pruned_loss=0.027, audio_tagging_loss=0.01063, over 3046749.18 frames. ], batch size: 53, lr: 9.64e-03, grad_scale: 32.0
2023-11-19 03:33:49,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=546860.0, ans=0.2
2023-11-19 03:33:53,971 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9900, loss[loss=0.1044, simple_loss=0.1364, pruned_loss=0.02957, audio_tagging_loss=0.006622, over 14925.00 frames. ], tot_loss[loss=0.09421, simple_loss=0.1129, pruned_loss=0.02728, audio_tagging_loss=0.01046, over 3044687.06 frames. ], batch size: 57, lr: 9.64e-03, grad_scale: 32.0
2023-11-19 03:34:05,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=546993.3333333334, ans=0.0
2023-11-19 03:34:13,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=546993.3333333334, ans=0.125
2023-11-19 03:34:13,992 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.24 vs. limit=10.0
2023-11-19 03:34:23,402 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0
2023-11-19 03:34:34,603 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.700e+01 9.311e+01 1.023e+02 1.421e+02, threshold=1.862e+02, percent-clipped=0.0
2023-11-19 03:34:40,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=22.5
2023-11-19 03:34:47,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=547193.3333333334, ans=0.0
2023-11-19 03:34:47,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=547193.3333333334, ans=0.1
2023-11-19 03:34:48,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=547193.3333333334, ans=0.125
2023-11-19 03:34:50,609 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 9950, loss[loss=0.08949, simple_loss=0.09657, pruned_loss=0.02711, audio_tagging_loss=0.01409, over 15321.00 frames. ], tot_loss[loss=0.09358, simple_loss=0.1119, pruned_loss=0.02706, audio_tagging_loss=0.01059, over 3051957.32 frames. ], batch size: 58, lr: 9.64e-03, grad_scale: 16.0
2023-11-19 03:35:01,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=547326.6666666666, ans=0.0
2023-11-19 03:35:07,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.56 vs. limit=15.0
2023-11-19 03:35:11,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=547393.3333333334, ans=0.0
2023-11-19 03:35:29,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.29 vs. limit=15.0
2023-11-19 03:35:37,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=547526.6666666666, ans=0.125
2023-11-19 03:35:45,549 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10000, loss[loss=0.07919, simple_loss=0.09178, pruned_loss=0.02107, audio_tagging_loss=0.01223, over 15206.00 frames. ], tot_loss[loss=0.0928, simple_loss=0.1109, pruned_loss=0.02677, audio_tagging_loss=0.01059, over 3046725.02 frames. ], batch size: 61, lr: 9.63e-03, grad_scale: 32.0
2023-11-19 03:35:47,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=547593.3333333334, ans=0.0
2023-11-19 03:35:50,595 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.84 vs. limit=15.0
2023-11-19 03:35:53,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=547593.3333333334, ans=0.125
2023-11-19 03:35:59,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=547660.0, ans=0.125
2023-11-19 03:36:02,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=547660.0, ans=0.2
2023-11-19 03:36:08,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=547726.6666666666, ans=0.125
2023-11-19 03:36:09,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=547726.6666666666, ans=0.125
2023-11-19 03:36:26,883 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.651e+01 9.520e+01 1.034e+02 1.455e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-19 03:36:32,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=547860.0, ans=0.1
2023-11-19 03:36:40,557 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10050, loss[loss=0.09676, simple_loss=0.1197, pruned_loss=0.02624, audio_tagging_loss=0.01067, over 15077.00 frames. ], tot_loss[loss=0.09274, simple_loss=0.1104, pruned_loss=0.02685, audio_tagging_loss=0.01069, over 3042687.46 frames. ], batch size: 56, lr: 9.63e-03, grad_scale: 32.0
2023-11-19 03:36:46,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.17 vs. limit=15.0
2023-11-19 03:36:50,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=547926.6666666666, ans=0.0
2023-11-19 03:36:51,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=547993.3333333334, ans=0.1
2023-11-19 03:36:58,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.21 vs. limit=12.0
2023-11-19 03:36:59,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=547993.3333333334, ans=0.125
2023-11-19 03:37:01,629 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.45 vs. limit=10.0
2023-11-19 03:37:06,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=548060.0, ans=0.125
2023-11-19 03:37:10,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=548060.0, ans=0.2
2023-11-19 03:37:15,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=548126.6666666666, ans=0.125
2023-11-19 03:37:20,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=548126.6666666666, ans=22.5
2023-11-19 03:37:36,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=548260.0, ans=0.125
2023-11-19 03:37:37,699 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10100, loss[loss=0.08227, simple_loss=0.1074, pruned_loss=0.0177, audio_tagging_loss=0.01088, over 14509.00 frames. ], tot_loss[loss=0.09204, simple_loss=0.1096, pruned_loss=0.02648, audio_tagging_loss=0.01077, over 3041010.41 frames. ], batch size: 53, lr: 9.63e-03, grad_scale: 32.0
2023-11-19 03:37:49,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.12 vs. limit=22.5
2023-11-19 03:38:00,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=548393.3333333334, ans=0.0
2023-11-19 03:38:10,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0
2023-11-19 03:38:14,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=548460.0, ans=0.0
2023-11-19 03:38:18,419 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.585e+01 9.588e+01 1.090e+02 1.708e+02, threshold=1.918e+02, percent-clipped=0.0
2023-11-19 03:38:21,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=548526.6666666666, ans=0.125
2023-11-19 03:38:23,264 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 03:38:32,759 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10150, loss[loss=0.1049, simple_loss=0.123, pruned_loss=0.03355, audio_tagging_loss=0.00985, over 15691.00 frames. ], tot_loss[loss=0.09204, simple_loss=0.1095, pruned_loss=0.02641, audio_tagging_loss=0.01089, over 3043750.18 frames. ], batch size: 59, lr: 9.62e-03, grad_scale: 32.0
2023-11-19 03:38:37,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=548593.3333333334, ans=0.2
2023-11-19 03:38:38,203 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=548593.3333333334, ans=0.0
2023-11-19 03:38:39,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=548593.3333333334, ans=0.0
2023-11-19 03:38:45,705 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:38:59,161 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 03:39:05,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=548793.3333333334, ans=0.125
2023-11-19 03:39:08,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=548793.3333333334, ans=0.125
2023-11-19 03:39:10,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=548793.3333333334, ans=0.0
2023-11-19 03:39:10,993 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:39:27,538 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10200, loss[loss=0.1103, simple_loss=0.1256, pruned_loss=0.03863, audio_tagging_loss=0.008909, over 14756.00 frames. ], tot_loss[loss=0.09212, simple_loss=0.1095, pruned_loss=0.02638, audio_tagging_loss=0.01098, over 3044791.38 frames. ], batch size: 57, lr: 9.62e-03, grad_scale: 32.0
2023-11-19 03:39:28,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=548926.6666666666, ans=0.125
2023-11-19 03:39:44,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=548993.3333333334, ans=0.2
2023-11-19 03:39:45,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=548993.3333333334, ans=0.0
2023-11-19 03:39:48,747 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=46.32 vs. limit=15.0
2023-11-19 03:39:49,366 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 03:39:49,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=549060.0, ans=0.0
2023-11-19 03:39:53,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=549060.0, ans=0.125
2023-11-19 03:40:08,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 8.839e+01 9.897e+01 1.124e+02 1.590e+02, threshold=1.979e+02, percent-clipped=0.0
2023-11-19 03:40:08,536 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.535e-01
2023-11-19 03:40:18,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.93 vs. limit=15.0
2023-11-19 03:40:23,227 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10250, loss[loss=0.1032, simple_loss=0.1242, pruned_loss=0.02997, audio_tagging_loss=0.01111, over 14830.00 frames. ], tot_loss[loss=0.09265, simple_loss=0.11, pruned_loss=0.02667, audio_tagging_loss=0.01097, over 3049663.85 frames. ], batch size: 55, lr: 9.62e-03, grad_scale: 32.0
2023-11-19 03:40:34,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=549326.6666666666, ans=15.0
2023-11-19 03:40:35,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=549326.6666666666, ans=0.07
2023-11-19 03:40:44,923 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.272e-01
2023-11-19 03:40:58,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549460.0, ans=0.1
2023-11-19 03:41:03,242 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=549460.0, ans=0.125
2023-11-19 03:41:19,426 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10300, loss[loss=0.06442, simple_loss=0.07174, pruned_loss=0.01551, audio_tagging_loss=0.01303, over 15111.00 frames. ], tot_loss[loss=0.09304, simple_loss=0.1104, pruned_loss=0.02681, audio_tagging_loss=0.01102, over 3052642.73 frames. ], batch size: 61, lr: 9.61e-03, grad_scale: 32.0
2023-11-19 03:41:20,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=549593.3333333334, ans=0.125
2023-11-19 03:41:41,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549726.6666666666, ans=0.1
2023-11-19 03:41:50,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=549726.6666666666, ans=0.1
2023-11-19 03:41:57,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=549793.3333333334, ans=0.2
2023-11-19 03:42:00,390 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.478e+01 9.203e+01 9.958e+01 1.173e+02, threshold=1.841e+02, percent-clipped=0.0
2023-11-19 03:42:14,016 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10350, loss[loss=0.06718, simple_loss=0.08136, pruned_loss=0.01605, audio_tagging_loss=0.01044, over 14555.00 frames. ], tot_loss[loss=0.09294, simple_loss=0.1102, pruned_loss=0.02676, audio_tagging_loss=0.01107, over 3044850.63 frames. ], batch size: 57, lr: 9.61e-03, grad_scale: 32.0
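[Editor's note] The WithLoss lines report an auxiliary penalty attached to the self-attention weights; loss-sum is its accumulated value, which reads 0.000e+00 almost everywhere but is occasionally nonzero, as in the 1.535e-01 and 2.272e-01 readings above. A generic pattern for attaching such a penalty to an intermediate tensor without changing its forward value (illustrative only; not scaling.py's actual implementation):

    import torch

    def with_aux_loss(x: torch.Tensor, weight: float = 1e-4) -> torch.Tensor:
        penalty = weight * (x ** 2).mean()     # example penalty on the activations
        # forward value is exactly x; the gradient of `penalty` still flows in backward
        return x + (penalty - penalty.detach())

    x = torch.randn(4, 10, requires_grad=True)
    y = with_aux_loss(x)
    assert torch.equal(y, x)  # numerically unchanged in the forward pass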
2023-11-19 03:42:20,595 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=549926.6666666666, ans=0.2
2023-11-19 03:42:26,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=549993.3333333334, ans=0.05
2023-11-19 03:42:27,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=549993.3333333334, ans=0.2
2023-11-19 03:42:29,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=549993.3333333334, ans=0.2
2023-11-19 03:42:48,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=550126.6666666666, ans=0.125
2023-11-19 03:42:53,792 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=550126.6666666666, ans=0.125
2023-11-19 03:43:03,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=550193.3333333334, ans=0.125
2023-11-19 03:43:05,667 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0
2023-11-19 03:43:08,766 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10400, loss[loss=0.09262, simple_loss=0.1087, pruned_loss=0.02458, audio_tagging_loss=0.01367, over 15861.00 frames. ], tot_loss[loss=0.09271, simple_loss=0.1099, pruned_loss=0.02669, audio_tagging_loss=0.01109, over 3046172.64 frames. ], batch size: 62, lr: 9.61e-03, grad_scale: 32.0
2023-11-19 03:43:15,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=550260.0, ans=0.1
2023-11-19 03:43:42,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=550460.0, ans=0.125
2023-11-19 03:43:44,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=550460.0, ans=0.0
2023-11-19 03:43:51,068 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.573e+01 9.410e+01 1.023e+02 1.490e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-19 03:43:51,375 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=550460.0, ans=0.1
2023-11-19 03:43:51,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0
2023-11-19 03:44:04,839 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10450, loss[loss=0.08913, simple_loss=0.1047, pruned_loss=0.02532, audio_tagging_loss=0.01147, over 14633.00 frames. ], tot_loss[loss=0.09297, simple_loss=0.11, pruned_loss=0.02691, audio_tagging_loss=0.01106, over 3042526.17 frames. ], batch size: 55, lr: 9.61e-03, grad_scale: 32.0
2023-11-19 03:44:12,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=550593.3333333334, ans=0.07
2023-11-19 03:44:39,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=550793.3333333334, ans=0.04949747468305833
2023-11-19 03:44:59,702 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10500, loss[loss=0.07908, simple_loss=0.08991, pruned_loss=0.02401, audio_tagging_loss=0.01012, over 15265.00 frames. ], tot_loss[loss=0.09265, simple_loss=0.1096, pruned_loss=0.02689, audio_tagging_loss=0.01095, over 3040017.68 frames. ], batch size: 60, lr: 9.60e-03, grad_scale: 32.0
2023-11-19 03:45:08,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=550926.6666666666, ans=0.125
2023-11-19 03:45:15,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=550993.3333333334, ans=0.125
2023-11-19 03:45:17,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=550993.3333333334, ans=0.0
2023-11-19 03:45:41,939 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.670e+01 8.439e+01 9.051e+01 1.036e+02 1.339e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-19 03:45:49,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=551193.3333333334, ans=0.125
2023-11-19 03:45:55,183 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10550, loss[loss=0.08697, simple_loss=0.1078, pruned_loss=0.02534, audio_tagging_loss=0.00771, over 14597.00 frames. ], tot_loss[loss=0.09269, simple_loss=0.11, pruned_loss=0.02685, audio_tagging_loss=0.01085, over 3045647.98 frames. ], batch size: 56, lr: 9.60e-03, grad_scale: 32.0
2023-11-19 03:46:04,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=551260.0, ans=0.1
2023-11-19 03:46:10,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=551326.6666666666, ans=0.125
2023-11-19 03:46:16,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=551326.6666666666, ans=0.07
2023-11-19 03:46:25,581 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=551393.3333333334, ans=0.1
2023-11-19 03:46:29,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.91 vs. limit=15.0
2023-11-19 03:46:50,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=551593.3333333334, ans=0.125
2023-11-19 03:46:51,180 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10600, loss[loss=0.08802, simple_loss=0.1013, pruned_loss=0.02475, audio_tagging_loss=0.01263, over 15081.00 frames. ], tot_loss[loss=0.0915, simple_loss=0.1085, pruned_loss=0.02638, audio_tagging_loss=0.01088, over 3033707.24 frames. ], batch size: 56, lr: 9.60e-03, grad_scale: 32.0
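[Editor's note] The grad_scale value printed with every loss line (32.0 here, dipping to 16.0 in stretches above) is the dynamic loss-scaling factor for fp16 mixed-precision training: it is halved when inf/nan gradients are detected and grows back while training is stable, which is why it moves between powers of two. The standard PyTorch pattern, shown for orientation (the logged values correspond to what scaler.get_scale() would return at each point):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)
    # Typical training step:
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)   # compute_loss is a placeholder name
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()   # grows or shrinks the scale based on inf/nan checks
    print(scaler.get_scale())  # -> 32.0 when CUDA AMP is enabled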
], batch size: 56, lr: 9.60e-03, grad_scale: 32.0 2023-11-19 03:46:54,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=551593.3333333334, ans=0.125 2023-11-19 03:47:12,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2023-11-19 03:47:33,798 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.037e+01 8.515e+01 9.253e+01 1.023e+02 1.317e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-19 03:47:39,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=22.5 2023-11-19 03:47:44,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=551860.0, ans=0.125 2023-11-19 03:47:47,251 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10650, loss[loss=0.04645, simple_loss=0.04947, pruned_loss=0.009091, audio_tagging_loss=0.01263, over 15968.00 frames. ], tot_loss[loss=0.09217, simple_loss=0.1096, pruned_loss=0.02656, audio_tagging_loss=0.01081, over 3038970.59 frames. ], batch size: 62, lr: 9.59e-03, grad_scale: 32.0 2023-11-19 03:48:10,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=552060.0, ans=0.0 2023-11-19 03:48:12,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=552060.0, ans=0.125 2023-11-19 03:48:12,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=552060.0, ans=0.0 2023-11-19 03:48:15,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=552060.0, ans=0.125 2023-11-19 03:48:22,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=552126.6666666666, ans=10.0 2023-11-19 03:48:22,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=552126.6666666666, ans=0.0 2023-11-19 03:48:41,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=552193.3333333334, ans=0.05 2023-11-19 03:48:43,085 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10700, loss[loss=0.07661, simple_loss=0.07893, pruned_loss=0.02262, audio_tagging_loss=0.01453, over 15473.00 frames. ], tot_loss[loss=0.09229, simple_loss=0.1099, pruned_loss=0.02662, audio_tagging_loss=0.01074, over 3036269.39 frames. ], batch size: 59, lr: 9.59e-03, grad_scale: 32.0 2023-11-19 03:49:25,174 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.579e+01 9.318e+01 1.032e+02 2.166e+02, threshold=1.864e+02, percent-clipped=1.0 2023-11-19 03:49:39,636 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10750, loss[loss=0.0846, simple_loss=0.1044, pruned_loss=0.0225, audio_tagging_loss=0.00992, over 14449.00 frames. ], tot_loss[loss=0.09197, simple_loss=0.1096, pruned_loss=0.02643, audio_tagging_loss=0.01075, over 3040737.52 frames. 
], batch size: 55, lr: 9.59e-03, grad_scale: 32.0 2023-11-19 03:49:43,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=552593.3333333334, ans=0.2 2023-11-19 03:49:50,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=552660.0, ans=0.125 2023-11-19 03:49:51,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=552660.0, ans=0.2 2023-11-19 03:49:54,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=552660.0, ans=0.2 2023-11-19 03:49:55,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552660.0, ans=0.1 2023-11-19 03:49:57,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552660.0, ans=0.1 2023-11-19 03:50:34,595 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10800, loss[loss=0.07827, simple_loss=0.1008, pruned_loss=0.02084, audio_tagging_loss=0.007022, over 15309.00 frames. ], tot_loss[loss=0.09214, simple_loss=0.11, pruned_loss=0.02646, audio_tagging_loss=0.01067, over 3046750.08 frames. ], batch size: 58, lr: 9.59e-03, grad_scale: 32.0 2023-11-19 03:50:41,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=552926.6666666666, ans=0.125 2023-11-19 03:50:43,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.36 vs. limit=12.0 2023-11-19 03:50:49,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=552993.3333333334, ans=0.0 2023-11-19 03:51:06,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=553060.0, ans=0.0 2023-11-19 03:51:11,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.40 vs. limit=22.5 2023-11-19 03:51:16,709 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.479e+01 8.594e+01 9.337e+01 1.055e+02 1.336e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-19 03:51:16,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=553126.6666666666, ans=0.1 2023-11-19 03:51:30,085 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10850, loss[loss=0.07215, simple_loss=0.0845, pruned_loss=0.01922, audio_tagging_loss=0.01068, over 14920.00 frames. ], tot_loss[loss=0.09231, simple_loss=0.1102, pruned_loss=0.02652, audio_tagging_loss=0.01068, over 3048259.64 frames. 
], batch size: 57, lr: 9.58e-03, grad_scale: 32.0 2023-11-19 03:51:41,644 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:51:50,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=553326.6666666666, ans=0.1 2023-11-19 03:52:20,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=553526.6666666666, ans=0.2 2023-11-19 03:52:24,134 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:52:27,319 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10900, loss[loss=0.09849, simple_loss=0.1208, pruned_loss=0.02555, audio_tagging_loss=0.01252, over 16613.00 frames. ], tot_loss[loss=0.09232, simple_loss=0.1099, pruned_loss=0.02658, audio_tagging_loss=0.01081, over 3046629.04 frames. ], batch size: 61, lr: 9.58e-03, grad_scale: 32.0 2023-11-19 03:52:37,047 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=553660.0, ans=0.0 2023-11-19 03:53:08,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=553793.3333333334, ans=0.125 2023-11-19 03:53:09,521 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.290e+01 8.895e+01 9.757e+01 1.197e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-19 03:53:11,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=553860.0, ans=0.125 2023-11-19 03:53:22,210 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 10950, loss[loss=0.07491, simple_loss=0.08549, pruned_loss=0.01806, audio_tagging_loss=0.01411, over 15998.00 frames. ], tot_loss[loss=0.09184, simple_loss=0.1092, pruned_loss=0.02636, audio_tagging_loss=0.01087, over 3050327.22 frames. ], batch size: 61, lr: 9.58e-03, grad_scale: 32.0 2023-11-19 03:53:33,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=553993.3333333334, ans=0.025 2023-11-19 03:53:58,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=554126.6666666666, ans=0.125 2023-11-19 03:54:07,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=554193.3333333334, ans=0.125 2023-11-19 03:54:12,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=554193.3333333334, ans=0.125 2023-11-19 03:54:17,503 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11000, loss[loss=0.09591, simple_loss=0.1176, pruned_loss=0.02693, audio_tagging_loss=0.01018, over 15392.00 frames. ], tot_loss[loss=0.09172, simple_loss=0.1092, pruned_loss=0.02622, audio_tagging_loss=0.01088, over 3047751.28 frames. 
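
The WARNING above drops an AudioSet cut whose transcript is dummy placeholder text: after the encoder frontend's roughly 4x subsampling, the 100 input frames leave only 23 encoder frames, fewer than the 24 BPE tokens, so no valid transducer alignment exists. The exact reduction formula below is an assumption, but it reproduces the logged 100 -> 23:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Hypothesised frontend reduction: T -> ((T - 7) // 2 + 1) // 2.
        frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
        return frames_after_subsampling >= num_tokens

    assert ((100 - 7) // 2 + 1) // 2 == 23  # "after subsampling: 23" above
    assert not keep_cut(100, 24)            # 23 frames < 24 tokens: excluded
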
], batch size: 57, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:54:26,555 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:54:38,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554393.3333333334, ans=0.1 2023-11-19 03:54:43,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=554393.3333333334, ans=0.2 2023-11-19 03:54:52,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=554460.0, ans=0.125 2023-11-19 03:54:58,748 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.653e+01 8.672e+01 9.432e+01 1.068e+02 1.333e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-19 03:54:59,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=554460.0, ans=0.125 2023-11-19 03:55:13,612 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11050, loss[loss=0.09687, simple_loss=0.1177, pruned_loss=0.02829, audio_tagging_loss=0.009756, over 14855.00 frames. ], tot_loss[loss=0.09111, simple_loss=0.1083, pruned_loss=0.02597, audio_tagging_loss=0.01102, over 3044478.95 frames. ], batch size: 56, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:55:13,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=554593.3333333334, ans=0.1 2023-11-19 03:55:23,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=554660.0, ans=0.125 2023-11-19 03:55:29,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=554660.0, ans=0.1 2023-11-19 03:55:33,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=554660.0, ans=0.125 2023-11-19 03:55:41,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=554726.6666666666, ans=0.0 2023-11-19 03:55:44,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=554726.6666666666, ans=0.125 2023-11-19 03:55:50,154 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.11 vs. limit=15.0 2023-11-19 03:56:02,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=554860.0, ans=0.125 2023-11-19 03:56:08,849 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11100, loss[loss=0.08064, simple_loss=0.101, pruned_loss=0.01683, audio_tagging_loss=0.01329, over 14126.00 frames. ], tot_loss[loss=0.09169, simple_loss=0.1089, pruned_loss=0.02614, audio_tagging_loss=0.0111, over 3044449.88 frames. 
], batch size: 53, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:56:11,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=554926.6666666666, ans=0.125 2023-11-19 03:56:26,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=554993.3333333334, ans=0.035 2023-11-19 03:56:31,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=555060.0, ans=0.1 2023-11-19 03:56:50,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=555126.6666666666, ans=10.0 2023-11-19 03:56:51,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.615e+01 9.620e+01 1.023e+02 1.432e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-19 03:56:51,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=555126.6666666666, ans=0.125 2023-11-19 03:56:53,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555193.3333333334, ans=0.1 2023-11-19 03:56:55,613 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=555193.3333333334, ans=0.125 2023-11-19 03:57:03,796 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11150, loss[loss=0.08064, simple_loss=0.0953, pruned_loss=0.02435, audio_tagging_loss=0.008647, over 14780.00 frames. ], tot_loss[loss=0.09224, simple_loss=0.1093, pruned_loss=0.02651, audio_tagging_loss=0.01109, over 3048891.15 frames. ], batch size: 57, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:57:04,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=555260.0, ans=0.1 2023-11-19 03:57:08,350 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=555260.0, ans=0.0 2023-11-19 03:57:10,199 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=555260.0, ans=0.015 2023-11-19 03:57:15,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=555326.6666666666, ans=0.125 2023-11-19 03:57:16,768 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555326.6666666666, ans=0.1 2023-11-19 03:57:38,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=555460.0, ans=0.125 2023-11-19 03:57:41,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=555460.0, ans=0.0 2023-11-19 03:57:59,486 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11200, loss[loss=0.1041, simple_loss=0.1332, pruned_loss=0.02865, audio_tagging_loss=0.008889, over 15843.00 frames. ], tot_loss[loss=0.09204, simple_loss=0.1089, pruned_loss=0.02638, audio_tagging_loss=0.01121, over 3047142.31 frames. ], batch size: 58, lr: 9.56e-03, grad_scale: 32.0 2023-11-19 03:58:03,725 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.04 vs. 
limit=22.5 2023-11-19 03:58:09,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=555660.0, ans=0.07 2023-11-19 03:58:26,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=555726.6666666666, ans=0.125 2023-11-19 03:58:28,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=555726.6666666666, ans=0.2 2023-11-19 03:58:29,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=555726.6666666666, ans=0.125 2023-11-19 03:58:30,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555793.3333333334, ans=0.1 2023-11-19 03:58:36,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=555793.3333333334, ans=0.0 2023-11-19 03:58:37,037 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=555793.3333333334, ans=0.2 2023-11-19 03:58:41,678 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.552e+01 9.021e+01 1.004e+02 1.285e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 03:58:49,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=555860.0, ans=0.125 2023-11-19 03:58:50,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=555860.0, ans=0.2 2023-11-19 03:58:51,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=555860.0, ans=0.125 2023-11-19 03:58:55,028 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11250, loss[loss=0.08222, simple_loss=0.1009, pruned_loss=0.02066, audio_tagging_loss=0.01109, over 15708.00 frames. ], tot_loss[loss=0.09224, simple_loss=0.1096, pruned_loss=0.02638, audio_tagging_loss=0.01109, over 3053178.82 frames. ], batch size: 59, lr: 9.56e-03, grad_scale: 32.0 2023-11-19 03:58:59,876 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2023-11-19 03:59:00,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=555926.6666666666, ans=0.0 2023-11-19 03:59:12,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=555993.3333333334, ans=0.07 2023-11-19 03:59:43,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=556193.3333333334, ans=0.0 2023-11-19 03:59:50,343 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11300, loss[loss=0.09753, simple_loss=0.1272, pruned_loss=0.02507, audio_tagging_loss=0.008868, over 15277.00 frames. ], tot_loss[loss=0.09238, simple_loss=0.1101, pruned_loss=0.02638, audio_tagging_loss=0.01097, over 3054885.53 frames. 
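
The "Whitening: ... metric=M vs. limit=L" entries are decorrelation diagnostics on intermediate activations; a penalty applies only when the metric exceeds its limit, and most values above sit below theirs. One plausible formulation of such a metric (an assumption, not copied from scaling.py) equals 1.0 when the channel covariance has all-equal eigenvalues, i.e. fully whitened features, and grows with eigenvalue spread:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels), assumed zero-mean. With C the
        # covariance and lambda_i its eigenvalues,
        #   metric = num_channels * sum(lambda_i^2) / (sum(lambda_i))^2,
        # using sum(lambda_i^2) = ||C||_F^2 and sum(lambda_i) = trace(C).
        num_channels = x.shape[1]
        cov = x.t() @ x / x.shape[0]
        return num_channels * (cov ** 2).sum() / cov.diagonal().sum() ** 2

    x = torch.randn(1000, 512)
    print(float(whitening_metric(x)))  # modestly above 1.0: near-white input
    y = x[:, :1].expand(-1, 512)       # rank-1, maximally correlated channels
    print(float(whitening_metric(y)))  # ~512.0, i.e. num_channels
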
], batch size: 56, lr: 9.56e-03, grad_scale: 32.0 2023-11-19 03:59:54,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=556260.0, ans=0.05 2023-11-19 03:59:55,176 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0 2023-11-19 03:59:58,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=556260.0, ans=0.1 2023-11-19 03:59:59,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=556260.0, ans=0.0 2023-11-19 04:00:14,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=556393.3333333334, ans=0.125 2023-11-19 04:00:32,262 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 8.772e+01 9.510e+01 1.073e+02 1.316e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 04:00:33,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=556526.6666666666, ans=0.1 2023-11-19 04:00:36,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-11-19 04:00:41,927 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.88 vs. limit=10.0 2023-11-19 04:00:46,037 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11350, loss[loss=0.09594, simple_loss=0.1238, pruned_loss=0.02656, audio_tagging_loss=0.007496, over 15146.00 frames. ], tot_loss[loss=0.09162, simple_loss=0.1093, pruned_loss=0.02615, audio_tagging_loss=0.01084, over 3049202.24 frames. ], batch size: 59, lr: 9.55e-03, grad_scale: 32.0 2023-11-19 04:01:01,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=556660.0, ans=0.2 2023-11-19 04:01:03,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=556660.0, ans=0.0 2023-11-19 04:01:35,848 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.10 vs. limit=22.5 2023-11-19 04:01:41,529 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11400, loss[loss=0.09666, simple_loss=0.1259, pruned_loss=0.02424, audio_tagging_loss=0.009462, over 15473.00 frames. ], tot_loss[loss=0.09193, simple_loss=0.11, pruned_loss=0.02628, audio_tagging_loss=0.01063, over 3046566.88 frames. ], batch size: 56, lr: 9.55e-03, grad_scale: 32.0 2023-11-19 04:01:59,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=556993.3333333334, ans=0.07 2023-11-19 04:02:03,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.66 vs. 
limit=22.5 2023-11-19 04:02:23,531 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.302e+01 8.621e+01 9.378e+01 1.036e+02 2.217e+02, threshold=1.876e+02, percent-clipped=1.0 2023-11-19 04:02:36,334 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11450, loss[loss=0.1072, simple_loss=0.1241, pruned_loss=0.03715, audio_tagging_loss=0.00795, over 14282.00 frames. ], tot_loss[loss=0.09203, simple_loss=0.1102, pruned_loss=0.02636, audio_tagging_loss=0.01056, over 3049560.27 frames. ], batch size: 55, lr: 9.55e-03, grad_scale: 32.0 2023-11-19 04:02:59,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=557393.3333333334, ans=0.125 2023-11-19 04:03:00,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=557393.3333333334, ans=0.125 2023-11-19 04:03:32,398 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11500, loss[loss=0.1141, simple_loss=0.1366, pruned_loss=0.03455, audio_tagging_loss=0.01125, over 15355.00 frames. ], tot_loss[loss=0.09224, simple_loss=0.1104, pruned_loss=0.02651, audio_tagging_loss=0.01054, over 3050233.03 frames. ], batch size: 55, lr: 9.55e-03, grad_scale: 16.0 2023-11-19 04:03:42,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=557660.0, ans=0.0 2023-11-19 04:03:48,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.85 vs. limit=6.0 2023-11-19 04:03:57,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.63 vs. limit=22.5 2023-11-19 04:04:08,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.39 vs. limit=22.5 2023-11-19 04:04:13,454 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=557793.3333333334, ans=0.025 2023-11-19 04:04:13,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=557793.3333333334, ans=0.125 2023-11-19 04:04:15,905 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.885e+01 8.787e+01 9.889e+01 1.125e+02 1.791e+02, threshold=1.978e+02, percent-clipped=0.0 2023-11-19 04:04:18,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0 2023-11-19 04:04:28,844 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11550, loss[loss=0.08899, simple_loss=0.09444, pruned_loss=0.02629, audio_tagging_loss=0.01548, over 14262.00 frames. ], tot_loss[loss=0.09185, simple_loss=0.1098, pruned_loss=0.02633, audio_tagging_loss=0.01063, over 3051061.26 frames. ], batch size: 55, lr: 9.54e-03, grad_scale: 16.0 2023-11-19 04:04:43,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=557993.3333333334, ans=0.2 2023-11-19 04:05:00,680 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:05:02,032 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:05:05,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=558126.6666666666, ans=0.125 2023-11-19 04:05:13,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558193.3333333334, ans=0.1 2023-11-19 04:05:18,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=558193.3333333334, ans=0.125 2023-11-19 04:05:19,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=558193.3333333334, ans=0.125 2023-11-19 04:05:21,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=558193.3333333334, ans=0.1 2023-11-19 04:05:23,631 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11600, loss[loss=0.114, simple_loss=0.1333, pruned_loss=0.03589, audio_tagging_loss=0.01142, over 14074.00 frames. ], tot_loss[loss=0.09235, simple_loss=0.1104, pruned_loss=0.02652, audio_tagging_loss=0.01064, over 3048971.77 frames. ], batch size: 53, lr: 9.54e-03, grad_scale: 32.0 2023-11-19 04:06:06,277 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:06:07,037 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.676e+01 9.320e+01 1.048e+02 1.345e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-19 04:06:11,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=558526.6666666666, ans=0.125 2023-11-19 04:06:13,914 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.94 vs. limit=10.0 2023-11-19 04:06:18,601 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11650, loss[loss=0.0843, simple_loss=0.1019, pruned_loss=0.02351, audio_tagging_loss=0.009836, over 15276.00 frames. ], tot_loss[loss=0.0927, simple_loss=0.1106, pruned_loss=0.02677, audio_tagging_loss=0.01065, over 3043562.13 frames. ], batch size: 58, lr: 9.54e-03, grad_scale: 32.0 2023-11-19 04:06:20,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=558593.3333333334, ans=0.1 2023-11-19 04:06:29,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=558660.0, ans=0.125 2023-11-19 04:06:34,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=558660.0, ans=0.125 2023-11-19 04:06:42,221 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.00 vs. 
limit=15.0 2023-11-19 04:06:51,702 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=558793.3333333334, ans=0.1 2023-11-19 04:06:59,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=558793.3333333334, ans=0.0 2023-11-19 04:07:14,580 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11700, loss[loss=0.1025, simple_loss=0.1251, pruned_loss=0.02946, audio_tagging_loss=0.01049, over 14800.00 frames. ], tot_loss[loss=0.09279, simple_loss=0.1105, pruned_loss=0.02681, audio_tagging_loss=0.01073, over 3041835.02 frames. ], batch size: 56, lr: 9.53e-03, grad_scale: 32.0 2023-11-19 04:07:14,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=558926.6666666666, ans=0.0 2023-11-19 04:07:16,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=558926.6666666666, ans=0.125 2023-11-19 04:07:20,083 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=558926.6666666666, ans=0.125 2023-11-19 04:07:22,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558926.6666666666, ans=0.1 2023-11-19 04:07:45,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559060.0, ans=0.1 2023-11-19 04:07:47,097 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=15.0 2023-11-19 04:07:57,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.729e+01 8.689e+01 9.449e+01 1.084e+02 2.126e+02, threshold=1.890e+02, percent-clipped=1.0 2023-11-19 04:07:58,016 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2023-11-19 04:08:07,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=559193.3333333334, ans=0.125 2023-11-19 04:08:09,616 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11750, loss[loss=0.1143, simple_loss=0.1363, pruned_loss=0.03422, audio_tagging_loss=0.0119, over 15263.00 frames. ], tot_loss[loss=0.09339, simple_loss=0.1113, pruned_loss=0.02703, audio_tagging_loss=0.01071, over 3045418.06 frames. 
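
The grad_scale printed with every batch is the dynamic loss scale of fp16 training; it halves from 32.0 to 16.0 around batch 11500 above after an overflow and is back at 32.0 by batch 11600. A minimal sketch of that backoff-and-regrow policy follows; the factors and growth interval are assumptions, not values read from the log:

    class DynamicLossScale:
        """Toy fp16 loss scaler with hypothetical constants."""

        def __init__(self, scale: float = 32.0, backoff: float = 0.5,
                     growth: float = 2.0, growth_interval: int = 50):
            self.scale = scale
            self.backoff = backoff
            self.growth = growth
            self.growth_interval = growth_interval
            self._good_steps = 0

        def update(self, found_inf: bool) -> None:
            if found_inf:                    # skip the step, shrink the scale
                self.scale *= self.backoff   # e.g. 32.0 -> 16.0 near batch 11500
                self._good_steps = 0
            else:
                self._good_steps += 1
                if self._good_steps >= self.growth_interval:
                    self.scale *= self.growth  # 16.0 -> 32.0 by batch 11600
                    self._good_steps = 0
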
], batch size: 58, lr: 9.53e-03, grad_scale: 32.0 2023-11-19 04:08:30,462 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:08:35,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=559393.3333333334, ans=0.95 2023-11-19 04:08:39,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=559393.3333333334, ans=0.2 2023-11-19 04:08:40,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=559393.3333333334, ans=0.2 2023-11-19 04:08:40,443 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=559393.3333333334, ans=0.125 2023-11-19 04:08:56,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=559526.6666666666, ans=0.1 2023-11-19 04:09:03,944 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11800, loss[loss=0.1455, simple_loss=0.1754, pruned_loss=0.04807, audio_tagging_loss=0.00969, over 16018.00 frames. ], tot_loss[loss=0.09348, simple_loss=0.1112, pruned_loss=0.0271, audio_tagging_loss=0.01078, over 3043878.41 frames. ], batch size: 58, lr: 9.53e-03, grad_scale: 32.0 2023-11-19 04:09:16,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=559660.0, ans=0.125 2023-11-19 04:09:17,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=559660.0, ans=0.2 2023-11-19 04:09:27,332 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:09:44,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559793.3333333334, ans=0.1 2023-11-19 04:09:46,602 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.768e+01 9.397e+01 1.015e+02 1.463e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 04:09:59,775 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11850, loss[loss=0.08815, simple_loss=0.103, pruned_loss=0.02374, audio_tagging_loss=0.01289, over 15882.00 frames. ], tot_loss[loss=0.0934, simple_loss=0.1109, pruned_loss=0.02701, audio_tagging_loss=0.01093, over 3034953.43 frames. ], batch size: 59, lr: 9.53e-03, grad_scale: 32.0 2023-11-19 04:10:14,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=559993.3333333334, ans=0.125 2023-11-19 04:10:51,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.50 vs. limit=22.5 2023-11-19 04:10:56,859 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11900, loss[loss=0.07381, simple_loss=0.09003, pruned_loss=0.01826, audio_tagging_loss=0.01054, over 13977.00 frames. ], tot_loss[loss=0.09335, simple_loss=0.1109, pruned_loss=0.02687, audio_tagging_loss=0.011, over 3040336.51 frames. ], batch size: 54, lr: 9.52e-03, grad_scale: 32.0 2023-11-19 04:11:10,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.85 vs. 
limit=10.0 2023-11-19 04:11:16,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=560326.6666666666, ans=0.0 2023-11-19 04:11:24,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=560393.3333333334, ans=0.125 2023-11-19 04:11:36,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=560460.0, ans=0.0 2023-11-19 04:11:37,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=560460.0, ans=0.125 2023-11-19 04:11:40,153 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.784e+01 9.525e+01 1.032e+02 1.390e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-19 04:11:40,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=560526.6666666666, ans=0.125 2023-11-19 04:11:41,455 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=560526.6666666666, ans=0.09899494936611666 2023-11-19 04:11:43,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=560526.6666666666, ans=0.125 2023-11-19 04:11:52,354 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 11950, loss[loss=0.06614, simple_loss=0.07947, pruned_loss=0.01635, audio_tagging_loss=0.01006, over 14892.00 frames. ], tot_loss[loss=0.09184, simple_loss=0.1089, pruned_loss=0.02628, audio_tagging_loss=0.0111, over 3039069.06 frames. ], batch size: 60, lr: 9.52e-03, grad_scale: 32.0 2023-11-19 04:12:01,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=560593.3333333334, ans=0.035 2023-11-19 04:12:05,821 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=22.5 2023-11-19 04:12:13,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=560660.0, ans=0.1 2023-11-19 04:12:18,453 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. limit=15.0 2023-11-19 04:12:19,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=22.5 2023-11-19 04:12:20,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=560726.6666666666, ans=0.125 2023-11-19 04:12:46,244 INFO [train_asr.py:1115] (1/4) Epoch 7, batch 12000, loss[loss=0.06691, simple_loss=0.07899, pruned_loss=0.01668, audio_tagging_loss=0.01074, over 14880.00 frames. ], tot_loss[loss=0.09315, simple_loss=0.1104, pruned_loss=0.02686, audio_tagging_loss=0.01109, over 3047507.69 frames. 
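
Across these entries the reported loss decomposes, to rounding, as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss: the simple (smoothed) transducer loss is halved while the pruned RNN-T loss and the audio-tagging distillation loss enter at full weight. The weights are inferred by fitting the logged numbers, for example the batch 12000 entry just above:

    def total_loss(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
        # Scales inferred from the logged values, so treat them as assumptions.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # batch 12000: 0.5 * 0.1104 + 0.02686 + 0.01109 = 0.09315, as logged
    assert abs(total_loss(0.1104, 0.02686, 0.01109) - 0.09315) < 1e-4
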
], batch size: 55, lr: 9.52e-03, grad_scale: 32.0 2023-11-19 04:12:46,244 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 04:13:06,927 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([0.9094, 3.5651, 2.7724, 2.8951, 3.7766, 3.6649, 3.0546, 3.6991], device='cuda:1') 2023-11-19 04:13:08,341 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7091, 5.7494, 5.8356, 5.8796], device='cuda:1') 2023-11-19 04:13:19,245 INFO [train_asr.py:1147] (1/4) Epoch 7, validation: loss=0.0682, simple_loss=0.05751, pruned_loss=0.007422, audio_tagging_loss=0.03202, over 4681554.00 frames. 2023-11-19 04:13:19,246 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 04:13:29,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=560993.3333333334, ans=0.125 2023-11-19 04:13:34,681 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=560993.3333333334, ans=0.2 2023-11-19 04:13:35,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560993.3333333334, ans=0.1 2023-11-19 04:13:39,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=561060.0, ans=0.125 2023-11-19 04:14:20,117 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 0, loss[loss=0.1002, simple_loss=0.09024, pruned_loss=0.02337, audio_tagging_loss=0.0317, over 14142.00 frames. ], tot_loss[loss=0.1002, simple_loss=0.09024, pruned_loss=0.02337, audio_tagging_loss=0.0317, over 14142.00 frames. ], batch size: 57, lr: 8.97e-03, grad_scale: 32.0 2023-11-19 04:14:20,118 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 04:14:44,548 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8552, 5.0002, 4.8790, 4.9804], device='cuda:1') 2023-11-19 04:14:45,806 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5711, 2.5637, 3.8297, 2.9711], device='cuda:1') 2023-11-19 04:14:51,782 INFO [train_asr.py:1147] (1/4) Epoch 8, validation: loss=0.06722, simple_loss=0.05736, pruned_loss=0.007334, audio_tagging_loss=0.0312, over 4681554.00 frames. 2023-11-19 04:14:51,783 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 04:15:10,723 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 9.365e+01 1.076e+02 1.160e+02 2.715e+02, threshold=2.151e+02, percent-clipped=1.0 2023-11-19 04:15:23,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=561213.3333333334, ans=0.035 2023-11-19 04:15:32,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=561280.0, ans=0.0 2023-11-19 04:15:47,580 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 50, loss[loss=0.1081, simple_loss=0.1218, pruned_loss=0.02906, audio_tagging_loss=0.01816, over 14741.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.1124, pruned_loss=0.02748, audio_tagging_loss=0.02074, over 691903.62 frames. 
], batch size: 54, lr: 8.96e-03, grad_scale: 32.0 2023-11-19 04:15:47,994 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=15.0 2023-11-19 04:15:59,902 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=15.0 2023-11-19 04:16:30,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=561613.3333333334, ans=0.0 2023-11-19 04:16:43,669 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 100, loss[loss=0.06521, simple_loss=0.07067, pruned_loss=0.01167, audio_tagging_loss=0.01821, over 14921.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1111, pruned_loss=0.02637, audio_tagging_loss=0.01983, over 1212539.34 frames. ], batch size: 58, lr: 8.96e-03, grad_scale: 32.0 2023-11-19 04:17:02,330 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.971e+01 9.025e+01 9.629e+01 1.101e+02 1.552e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-19 04:17:07,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=561880.0, ans=0.125 2023-11-19 04:17:14,418 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=561880.0, ans=0.125 2023-11-19 04:17:15,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=561880.0, ans=0.05 2023-11-19 04:17:33,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=562013.3333333334, ans=0.1 2023-11-19 04:17:39,011 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 150, loss[loss=0.09213, simple_loss=0.109, pruned_loss=0.02518, audio_tagging_loss=0.01243, over 14918.00 frames. ], tot_loss[loss=0.09787, simple_loss=0.1088, pruned_loss=0.02554, audio_tagging_loss=0.01793, over 1628331.70 frames. ], batch size: 54, lr: 8.96e-03, grad_scale: 32.0 2023-11-19 04:17:40,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.53 vs. limit=22.5 2023-11-19 04:17:50,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=562146.6666666666, ans=0.125 2023-11-19 04:18:14,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=562280.0, ans=0.125 2023-11-19 04:18:31,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=562346.6666666666, ans=0.1 2023-11-19 04:18:35,264 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 200, loss[loss=0.08409, simple_loss=0.08784, pruned_loss=0.02307, audio_tagging_loss=0.0171, over 14928.00 frames. ], tot_loss[loss=0.0956, simple_loss=0.1084, pruned_loss=0.02537, audio_tagging_loss=0.01602, over 1939380.21 frames. 
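
The tot_loss figures behave like a frame-weighted average with exponential decay that restarts each epoch: at Epoch 8, batch 0 tot_loss equals the single-batch loss over 14142 frames, the "over N frames" count then climbs through batches 50-200 above, and mid-epoch it settles near 3.0M frames, consistent with an effective window of roughly 200 batches of ~15k frames. The decay constant below is inferred from those counts and is an assumption:

    class TotLossTracker:
        """Frame-weighted, exponentially decayed loss average (sketch)."""

        def __init__(self, decay: float = 1.0 - 1.0 / 200):
            self.decay = decay   # ~200-batch window, inferred, not logged
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, loss: float, num_frames: float):
            self.loss_sum = self.decay * self.loss_sum + loss * num_frames
            self.frames = self.decay * self.frames + num_frames
            return self.loss_sum / self.frames, self.frames

    tracker = TotLossTracker()
    tot, frames = tracker.update(0.1002, 14142)  # Epoch 8, batch 0
    # tot == 0.1002: with nothing accumulated yet, tot_loss equals the
    # batch loss, exactly as the first Epoch 8 entry above reports
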
], batch size: 57, lr: 8.96e-03, grad_scale: 32.0 2023-11-19 04:18:37,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=562413.3333333334, ans=0.2 2023-11-19 04:18:38,077 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=15.0 2023-11-19 04:18:38,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=562413.3333333334, ans=0.125 2023-11-19 04:18:52,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=562480.0, ans=0.0 2023-11-19 04:18:52,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0 2023-11-19 04:18:54,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.820e+01 8.627e+01 9.281e+01 9.933e+01 1.355e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-19 04:19:23,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.61 vs. limit=22.5 2023-11-19 04:19:29,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=562680.0, ans=0.0 2023-11-19 04:19:31,610 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 250, loss[loss=0.08296, simple_loss=0.09699, pruned_loss=0.0231, audio_tagging_loss=0.01137, over 14469.00 frames. ], tot_loss[loss=0.09455, simple_loss=0.1084, pruned_loss=0.02583, audio_tagging_loss=0.01454, over 2179753.26 frames. ], batch size: 55, lr: 8.95e-03, grad_scale: 32.0 2023-11-19 04:19:34,324 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.88 vs. limit=15.0 2023-11-19 04:19:40,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=562746.6666666666, ans=0.0 2023-11-19 04:20:07,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=562946.6666666666, ans=0.1 2023-11-19 04:20:26,688 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 300, loss[loss=0.07652, simple_loss=0.09272, pruned_loss=0.02015, audio_tagging_loss=0.01002, over 14413.00 frames. ], tot_loss[loss=0.09347, simple_loss=0.1084, pruned_loss=0.0258, audio_tagging_loss=0.01348, over 2367667.81 frames. ], batch size: 54, lr: 8.95e-03, grad_scale: 32.0 2023-11-19 04:20:38,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563146.6666666666, ans=0.1 2023-11-19 04:20:44,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=563146.6666666666, ans=0.0 2023-11-19 04:20:45,702 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 8.672e+01 9.179e+01 1.018e+02 1.268e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-19 04:20:54,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.93 vs. 
limit=15.0 2023-11-19 04:20:57,615 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:20:58,865 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:21:02,273 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2023-11-19 04:21:22,026 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 350, loss[loss=0.103, simple_loss=0.1188, pruned_loss=0.03159, audio_tagging_loss=0.01199, over 14469.00 frames. ], tot_loss[loss=0.09343, simple_loss=0.109, pruned_loss=0.02617, audio_tagging_loss=0.01274, over 2525245.49 frames. ], batch size: 53, lr: 8.95e-03, grad_scale: 32.0 2023-11-19 04:21:31,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=563413.3333333334, ans=10.0 2023-11-19 04:21:31,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=563413.3333333334, ans=0.125 2023-11-19 04:22:01,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=563613.3333333334, ans=0.2 2023-11-19 04:22:09,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=563680.0, ans=0.0 2023-11-19 04:22:17,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=563746.6666666666, ans=0.125 2023-11-19 04:22:18,684 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 400, loss[loss=0.09327, simple_loss=0.1145, pruned_loss=0.02553, audio_tagging_loss=0.01049, over 15150.00 frames. ], tot_loss[loss=0.0928, simple_loss=0.1093, pruned_loss=0.02597, audio_tagging_loss=0.01217, over 2646836.18 frames. ], batch size: 56, lr: 8.94e-03, grad_scale: 32.0 2023-11-19 04:22:21,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=563746.6666666666, ans=0.1 2023-11-19 04:22:21,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=563746.6666666666, ans=0.125 2023-11-19 04:22:32,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=563813.3333333334, ans=0.125 2023-11-19 04:22:34,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=563813.3333333334, ans=0.0 2023-11-19 04:22:36,770 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 8.502e+01 9.440e+01 1.057e+02 1.683e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-19 04:23:12,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=564080.0, ans=0.125 2023-11-19 04:23:13,480 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 450, loss[loss=0.1033, simple_loss=0.1168, pruned_loss=0.034, audio_tagging_loss=0.01094, over 14919.00 frames. ], tot_loss[loss=0.09215, simple_loss=0.1086, pruned_loss=0.02599, audio_tagging_loss=0.01184, over 2732499.90 frames. 
], batch size: 57, lr: 8.94e-03, grad_scale: 32.0 2023-11-19 04:23:13,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=564080.0, ans=0.0 2023-11-19 04:23:14,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=564080.0, ans=0.0 2023-11-19 04:23:15,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.62 vs. limit=15.0 2023-11-19 04:23:27,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=564146.6666666666, ans=0.0 2023-11-19 04:23:35,697 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=12.0 2023-11-19 04:23:48,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=564280.0, ans=0.1 2023-11-19 04:23:55,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=564280.0, ans=0.1 2023-11-19 04:24:01,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.00 vs. limit=22.5 2023-11-19 04:24:08,561 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 500, loss[loss=0.07721, simple_loss=0.08397, pruned_loss=0.02141, audio_tagging_loss=0.01381, over 13375.00 frames. ], tot_loss[loss=0.09163, simple_loss=0.1083, pruned_loss=0.02599, audio_tagging_loss=0.0115, over 2801210.86 frames. ], batch size: 52, lr: 8.94e-03, grad_scale: 32.0 2023-11-19 04:24:13,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=564413.3333333334, ans=0.2 2023-11-19 04:24:14,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=564413.3333333334, ans=0.0 2023-11-19 04:24:17,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=564413.3333333334, ans=0.035 2023-11-19 04:24:28,816 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.601e+01 9.237e+01 1.002e+02 1.241e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 04:24:43,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=564613.3333333334, ans=0.1 2023-11-19 04:24:47,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=564613.3333333334, ans=0.125 2023-11-19 04:24:59,889 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.93 vs. 
limit=22.5 2023-11-19 04:25:03,009 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=564680.0, ans=0.125 2023-11-19 04:25:04,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=564746.6666666666, ans=0.125 2023-11-19 04:25:04,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=564746.6666666666, ans=0.1 2023-11-19 04:25:04,836 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 550, loss[loss=0.09994, simple_loss=0.1296, pruned_loss=0.02517, audio_tagging_loss=0.009955, over 16027.00 frames. ], tot_loss[loss=0.09176, simple_loss=0.1086, pruned_loss=0.0261, audio_tagging_loss=0.01135, over 2858227.30 frames. ], batch size: 59, lr: 8.94e-03, grad_scale: 32.0 2023-11-19 04:25:26,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.32 vs. limit=22.5 2023-11-19 04:25:56,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=565013.3333333334, ans=0.0 2023-11-19 04:26:00,651 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 600, loss[loss=0.08584, simple_loss=0.1013, pruned_loss=0.02315, audio_tagging_loss=0.01206, over 14618.00 frames. ], tot_loss[loss=0.09221, simple_loss=0.1094, pruned_loss=0.0262, audio_tagging_loss=0.0113, over 2901338.69 frames. ], batch size: 56, lr: 8.93e-03, grad_scale: 32.0 2023-11-19 04:26:16,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=565146.6666666666, ans=0.0 2023-11-19 04:26:18,642 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.289e+01 8.437e+01 9.383e+01 9.998e+01 1.583e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-19 04:26:22,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=565213.3333333334, ans=0.125 2023-11-19 04:26:38,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.82 vs. limit=5.0 2023-11-19 04:26:43,901 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.02 vs. limit=15.0 2023-11-19 04:26:46,632 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=565346.6666666666, ans=0.09899494936611666 2023-11-19 04:26:56,077 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 650, loss[loss=0.09745, simple_loss=0.1228, pruned_loss=0.0257, audio_tagging_loss=0.01034, over 15151.00 frames. ], tot_loss[loss=0.0921, simple_loss=0.1092, pruned_loss=0.02631, audio_tagging_loss=0.01118, over 2934147.47 frames. 
], batch size: 57, lr: 8.93e-03, grad_scale: 32.0 2023-11-19 04:27:05,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=565413.3333333334, ans=0.0 2023-11-19 04:27:12,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=565480.0, ans=0.1 2023-11-19 04:27:23,895 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=565546.6666666666, ans=0.125 2023-11-19 04:27:32,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.74 vs. limit=22.5 2023-11-19 04:27:36,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=565613.3333333334, ans=0.125 2023-11-19 04:27:52,225 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 700, loss[loss=0.0969, simple_loss=0.1254, pruned_loss=0.02613, audio_tagging_loss=0.008081, over 15033.00 frames. ], tot_loss[loss=0.0923, simple_loss=0.1095, pruned_loss=0.02641, audio_tagging_loss=0.01115, over 2962017.16 frames. ], batch size: 55, lr: 8.93e-03, grad_scale: 32.0 2023-11-19 04:28:05,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0 2023-11-19 04:28:10,741 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.560e+01 9.295e+01 1.024e+02 1.604e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 04:28:26,285 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=15.0 2023-11-19 04:28:40,893 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2023-11-19 04:28:45,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=566013.3333333334, ans=0.2 2023-11-19 04:28:47,739 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 750, loss[loss=0.0726, simple_loss=0.08551, pruned_loss=0.02045, audio_tagging_loss=0.009395, over 15114.00 frames. ], tot_loss[loss=0.09264, simple_loss=0.1101, pruned_loss=0.0265, audio_tagging_loss=0.01107, over 2982175.97 frames. ], batch size: 56, lr: 8.93e-03, grad_scale: 16.0 2023-11-19 04:28:57,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=566146.6666666666, ans=0.125 2023-11-19 04:29:09,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=566213.3333333334, ans=0.04949747468305833 2023-11-19 04:29:42,386 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 800, loss[loss=0.08131, simple_loss=0.09694, pruned_loss=0.02214, audio_tagging_loss=0.0107, over 15454.00 frames. ], tot_loss[loss=0.09271, simple_loss=0.1098, pruned_loss=0.0266, audio_tagging_loss=0.01121, over 2993661.11 frames. ], batch size: 58, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:29:49,070 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.51 vs. 
limit=15.0 2023-11-19 04:29:55,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=566480.0, ans=0.0 2023-11-19 04:30:02,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.557e+01 9.435e+01 1.048e+02 1.522e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-19 04:30:15,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=566613.3333333334, ans=0.2 2023-11-19 04:30:38,428 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 850, loss[loss=0.09975, simple_loss=0.12, pruned_loss=0.02986, audio_tagging_loss=0.009892, over 14827.00 frames. ], tot_loss[loss=0.09238, simple_loss=0.1095, pruned_loss=0.02637, audio_tagging_loss=0.01125, over 3003537.37 frames. ], batch size: 55, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:30:38,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=566746.6666666666, ans=0.1 2023-11-19 04:31:03,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=566880.0, ans=0.125 2023-11-19 04:31:03,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=566880.0, ans=0.125 2023-11-19 04:31:08,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=566880.0, ans=0.025 2023-11-19 04:31:25,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=567013.3333333334, ans=0.125 2023-11-19 04:31:31,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.76 vs. limit=6.0 2023-11-19 04:31:34,306 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 900, loss[loss=0.07494, simple_loss=0.1022, pruned_loss=0.0165, audio_tagging_loss=0.007348, over 15522.00 frames. ], tot_loss[loss=0.09167, simple_loss=0.1085, pruned_loss=0.0261, audio_tagging_loss=0.01135, over 3014006.62 frames. ], batch size: 57, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:31:36,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=567080.0, ans=0.125 2023-11-19 04:31:36,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=567080.0, ans=0.125 2023-11-19 04:31:50,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567146.6666666666, ans=0.1 2023-11-19 04:31:53,696 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.468e+01 9.134e+01 1.003e+02 1.510e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 04:31:56,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=567213.3333333334, ans=0.05 2023-11-19 04:31:59,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=567213.3333333334, ans=0.0 2023-11-19 04:32:16,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.82 vs. 
limit=15.0 2023-11-19 04:32:22,451 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=567346.6666666666, ans=0.0 2023-11-19 04:32:22,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=567346.6666666666, ans=0.125 2023-11-19 04:32:29,739 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 950, loss[loss=0.07961, simple_loss=0.09957, pruned_loss=0.02327, audio_tagging_loss=0.006553, over 14339.00 frames. ], tot_loss[loss=0.09196, simple_loss=0.109, pruned_loss=0.02629, audio_tagging_loss=0.01117, over 3020822.58 frames. ], batch size: 54, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:32:36,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=567413.3333333334, ans=0.2 2023-11-19 04:32:45,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.24 vs. limit=15.0 2023-11-19 04:32:45,242 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.34 vs. limit=10.0 2023-11-19 04:32:48,271 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.24 vs. limit=22.5 2023-11-19 04:32:53,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=567546.6666666666, ans=0.125 2023-11-19 04:32:54,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=567546.6666666666, ans=0.1 2023-11-19 04:33:25,107 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1000, loss[loss=0.08619, simple_loss=0.09816, pruned_loss=0.02652, audio_tagging_loss=0.01059, over 15665.00 frames. ], tot_loss[loss=0.09098, simple_loss=0.1077, pruned_loss=0.02604, audio_tagging_loss=0.01111, over 3022651.12 frames. ], batch size: 60, lr: 8.91e-03, grad_scale: 16.0 2023-11-19 04:33:26,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0 2023-11-19 04:33:35,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=567746.6666666666, ans=0.0 2023-11-19 04:33:36,052 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:33:47,007 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.670e+01 9.529e+01 1.041e+02 1.429e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-19 04:33:49,201 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
2023-11-19 04:33:49,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=567880.0, ans=0.125
2023-11-19 04:33:53,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=567880.0, ans=0.125
2023-11-19 04:33:55,900 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0
2023-11-19 04:34:08,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567946.6666666666, ans=0.1
2023-11-19 04:34:21,668 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1050, loss[loss=0.09783, simple_loss=0.1152, pruned_loss=0.02704, audio_tagging_loss=0.01318, over 16313.00 frames. ], tot_loss[loss=0.09087, simple_loss=0.1077, pruned_loss=0.02602, audio_tagging_loss=0.01098, over 3024602.04 frames. ], batch size: 62, lr: 8.91e-03, grad_scale: 16.0
2023-11-19 04:34:30,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=568080.0, ans=0.5
2023-11-19 04:34:37,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=568146.6666666666, ans=0.0
2023-11-19 04:34:40,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=568146.6666666666, ans=0.125
2023-11-19 04:34:51,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=568213.3333333334, ans=0.0
2023-11-19 04:34:53,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=568280.0, ans=0.125
2023-11-19 04:35:05,412 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.59 vs. limit=22.5
2023-11-19 04:35:17,062 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1100, loss[loss=0.1057, simple_loss=0.1301, pruned_loss=0.02742, audio_tagging_loss=0.01321, over 15041.00 frames. ], tot_loss[loss=0.09213, simple_loss=0.1095, pruned_loss=0.0265, audio_tagging_loss=0.01088, over 3031004.79 frames. ], batch size: 56, lr: 8.91e-03, grad_scale: 16.0
2023-11-19 04:35:19,230 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 04:35:20,926 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.94 vs. limit=10.0
2023-11-19 04:35:24,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=568413.3333333334, ans=0.2
2023-11-19 04:35:38,081 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.818e+01 9.664e+01 1.074e+02 1.667e+02, threshold=1.933e+02, percent-clipped=0.0
2023-11-19 04:35:43,201 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=568546.6666666666, ans=0.125
2023-11-19 04:35:46,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=568546.6666666666, ans=0.125
2023-11-19 04:35:56,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=568613.3333333334, ans=0.0
2023-11-19 04:36:10,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568680.0, ans=0.1
2023-11-19 04:36:12,553 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1150, loss[loss=0.1005, simple_loss=0.1202, pruned_loss=0.03105, audio_tagging_loss=0.009362, over 16128.00 frames. ], tot_loss[loss=0.09256, simple_loss=0.1101, pruned_loss=0.02659, audio_tagging_loss=0.0109, over 3036221.93 frames. ], batch size: 58, lr: 8.91e-03, grad_scale: 16.0
2023-11-19 04:36:35,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=568880.0, ans=0.125
2023-11-19 04:36:40,848 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=568880.0, ans=0.1
2023-11-19 04:36:48,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0
2023-11-19 04:36:56,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=569013.3333333334, ans=0.125
2023-11-19 04:37:08,074 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 04:37:08,816 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1200, loss[loss=0.1191, simple_loss=0.1438, pruned_loss=0.03985, audio_tagging_loss=0.007347, over 15229.00 frames. ], tot_loss[loss=0.09184, simple_loss=0.1093, pruned_loss=0.02634, audio_tagging_loss=0.01083, over 3031388.54 frames. ], batch size: 58, lr: 8.90e-03, grad_scale: 32.0
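The recurring "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." lines from optim.py summarize adaptive gradient clipping: the five numbers are the min/25%/median/75%/max of recently observed gradient norms, and in every such entry in this log the reported threshold equals clipping_scale times the median (for the entry above, 2.0 * 9.664e+01 = 1.933e+02). The following standalone sketch reproduces only that arithmetic; the actual optimizer (presumably icefall's ScaledAdam) accumulates norms over a window of recent batches and clips inside its step, per parameter group.

    import torch

    # Illustrative only: the quartile/threshold bookkeeping behind one
    # optim.py log entry, not the optimizer's real clipping code.
    def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # clipping_scale x median grad norm
        percent_clipped = (grad_norms > threshold).float().mean() * 100.0
        return q, threshold, percent_clipped

    norms = torch.tensor([71.86, 88.18, 96.64, 107.4, 166.7])  # example values
    q, thr, pct = clipping_stats(norms)
    print(thr.item(), pct.item())  # ~193.3 and 0.0, matching threshold=1.933e+02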
2023-11-19 04:37:16,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=569080.0, ans=0.05
2023-11-19 04:37:29,360 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.648e+01 9.273e+01 1.051e+02 1.425e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-19 04:37:48,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569280.0, ans=0.1
2023-11-19 04:37:55,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=569346.6666666666, ans=0.125
2023-11-19 04:37:58,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=569346.6666666666, ans=0.0
2023-11-19 04:38:04,296 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1250, loss[loss=0.09674, simple_loss=0.1176, pruned_loss=0.02413, audio_tagging_loss=0.01383, over 14614.00 frames. ], tot_loss[loss=0.09179, simple_loss=0.1093, pruned_loss=0.02629, audio_tagging_loss=0.01086, over 3037450.80 frames. ], batch size: 54, lr: 8.90e-03, grad_scale: 32.0
2023-11-19 04:38:13,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=569413.3333333334, ans=0.0
2023-11-19 04:38:19,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=569480.0, ans=0.2
2023-11-19 04:38:43,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=569613.3333333334, ans=0.07
2023-11-19 04:38:51,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0
2023-11-19 04:38:58,075 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=569680.0, ans=0.125
2023-11-19 04:38:59,915 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1300, loss[loss=0.08523, simple_loss=0.1042, pruned_loss=0.02331, audio_tagging_loss=0.009817, over 14966.00 frames. ], tot_loss[loss=0.09171, simple_loss=0.1093, pruned_loss=0.02623, audio_tagging_loss=0.01085, over 3048751.01 frames. ], batch size: 56, lr: 8.90e-03, grad_scale: 32.0
2023-11-19 04:39:09,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.95 vs. limit=22.5
2023-11-19 04:39:15,594 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 04:39:18,082 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs.
limit=6.0 2023-11-19 04:39:19,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=569813.3333333334, ans=0.0 2023-11-19 04:39:21,656 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.412e+01 8.385e+01 9.003e+01 9.844e+01 1.320e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-19 04:39:42,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=569946.6666666666, ans=0.0 2023-11-19 04:39:48,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=570013.3333333334, ans=0.125 2023-11-19 04:39:50,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=570013.3333333334, ans=0.0 2023-11-19 04:39:56,372 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1350, loss[loss=0.1104, simple_loss=0.135, pruned_loss=0.03351, audio_tagging_loss=0.009386, over 16409.00 frames. ], tot_loss[loss=0.09268, simple_loss=0.1104, pruned_loss=0.02663, audio_tagging_loss=0.01083, over 3051953.28 frames. ], batch size: 61, lr: 8.90e-03, grad_scale: 32.0 2023-11-19 04:39:58,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2023-11-19 04:40:23,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=570213.3333333334, ans=0.0 2023-11-19 04:40:32,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=570280.0, ans=0.025 2023-11-19 04:40:33,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570280.0, ans=0.1 2023-11-19 04:40:36,364 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:40:51,808 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1400, loss[loss=0.0877, simple_loss=0.1112, pruned_loss=0.02108, audio_tagging_loss=0.01102, over 15364.00 frames. ], tot_loss[loss=0.092, simple_loss=0.1094, pruned_loss=0.02648, audio_tagging_loss=0.01082, over 3047157.13 frames. 
], batch size: 56, lr: 8.89e-03, grad_scale: 32.0 2023-11-19 04:41:04,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=570480.0, ans=0.0 2023-11-19 04:41:04,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=570480.0, ans=0.0 2023-11-19 04:41:13,425 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.757e+01 9.593e+01 1.066e+02 1.571e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-19 04:41:15,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=570546.6666666666, ans=0.0 2023-11-19 04:41:17,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=570546.6666666666, ans=0.025 2023-11-19 04:41:25,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=570613.3333333334, ans=0.2 2023-11-19 04:41:32,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.43 vs. limit=22.5 2023-11-19 04:41:33,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=570613.3333333334, ans=0.09899494936611666 2023-11-19 04:41:35,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=570680.0, ans=0.035 2023-11-19 04:41:42,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.03 vs. limit=10.0 2023-11-19 04:41:43,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.30 vs. limit=15.0 2023-11-19 04:41:47,486 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1450, loss[loss=0.08948, simple_loss=0.1108, pruned_loss=0.02416, audio_tagging_loss=0.009922, over 14717.00 frames. ], tot_loss[loss=0.09117, simple_loss=0.1084, pruned_loss=0.02614, audio_tagging_loss=0.01084, over 3043284.27 frames. ], batch size: 55, lr: 8.89e-03, grad_scale: 32.0 2023-11-19 04:41:48,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=570746.6666666666, ans=0.5 2023-11-19 04:41:49,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=570746.6666666666, ans=0.0 2023-11-19 04:42:00,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-19 04:42:25,170 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=570946.6666666666, ans=0.1 2023-11-19 04:42:43,691 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1500, loss[loss=0.1036, simple_loss=0.1347, pruned_loss=0.02768, audio_tagging_loss=0.008564, over 15290.00 frames. ], tot_loss[loss=0.09142, simple_loss=0.1086, pruned_loss=0.02618, audio_tagging_loss=0.01093, over 3041419.04 frames. ], batch size: 56, lr: 8.89e-03, grad_scale: 32.0 2023-11-19 04:42:46,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.31 vs. 
limit=12.0 2023-11-19 04:43:01,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=571146.6666666666, ans=0.1 2023-11-19 04:43:02,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=22.5 2023-11-19 04:43:04,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.390e+01 9.200e+01 9.780e+01 1.571e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-19 04:43:10,950 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.141e-01 2023-11-19 04:43:24,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=12.0 2023-11-19 04:43:35,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=571346.6666666666, ans=0.125 2023-11-19 04:43:37,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2023-11-19 04:43:39,283 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1550, loss[loss=0.09536, simple_loss=0.1192, pruned_loss=0.02606, audio_tagging_loss=0.009686, over 15851.00 frames. ], tot_loss[loss=0.09168, simple_loss=0.1087, pruned_loss=0.02622, audio_tagging_loss=0.01111, over 3035638.51 frames. ], batch size: 56, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:43:40,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=571413.3333333334, ans=0.125 2023-11-19 04:43:45,115 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.36 vs. limit=22.5 2023-11-19 04:43:50,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=571480.0, ans=0.125 2023-11-19 04:44:00,178 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.14 vs. limit=22.5 2023-11-19 04:44:07,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=571546.6666666666, ans=0.125 2023-11-19 04:44:19,247 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=12.0 2023-11-19 04:44:22,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=571613.3333333334, ans=0.2 2023-11-19 04:44:34,456 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1600, loss[loss=0.07024, simple_loss=0.08618, pruned_loss=0.01596, audio_tagging_loss=0.01119, over 15116.00 frames. ], tot_loss[loss=0.09232, simple_loss=0.1094, pruned_loss=0.02638, audio_tagging_loss=0.01123, over 3037303.79 frames. ], batch size: 57, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:44:40,859 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.13 vs. 
limit=6.0 2023-11-19 04:44:45,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=571813.3333333334, ans=0.0 2023-11-19 04:44:56,106 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.975e+01 9.863e+01 1.094e+02 1.850e+02, threshold=1.973e+02, percent-clipped=1.0 2023-11-19 04:45:04,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=571880.0, ans=0.0 2023-11-19 04:45:09,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=571946.6666666666, ans=0.2 2023-11-19 04:45:12,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=571946.6666666666, ans=0.125 2023-11-19 04:45:31,000 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1650, loss[loss=0.09753, simple_loss=0.1287, pruned_loss=0.02613, audio_tagging_loss=0.007028, over 14537.00 frames. ], tot_loss[loss=0.09103, simple_loss=0.1079, pruned_loss=0.02586, audio_tagging_loss=0.01124, over 3040686.55 frames. ], batch size: 53, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:45:32,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=572080.0, ans=0.125 2023-11-19 04:45:46,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.87 vs. limit=12.0 2023-11-19 04:45:51,401 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2023-11-19 04:45:59,486 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=572213.3333333334, ans=0.0 2023-11-19 04:46:17,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=572346.6666666666, ans=0.05 2023-11-19 04:46:22,227 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=572346.6666666666, ans=22.5 2023-11-19 04:46:26,831 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1700, loss[loss=0.06665, simple_loss=0.07479, pruned_loss=0.01854, audio_tagging_loss=0.01071, over 15757.00 frames. ], tot_loss[loss=0.09104, simple_loss=0.108, pruned_loss=0.02582, audio_tagging_loss=0.01122, over 3041961.64 frames. ], batch size: 62, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:46:26,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=572413.3333333334, ans=0.0 2023-11-19 04:46:35,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=572413.3333333334, ans=0.125 2023-11-19 04:46:41,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.88 vs. 
limit=15.0 2023-11-19 04:46:47,239 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.408e+01 9.171e+01 1.022e+02 1.332e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-19 04:46:52,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=572546.6666666666, ans=0.0 2023-11-19 04:47:21,114 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1750, loss[loss=0.06537, simple_loss=0.07148, pruned_loss=0.01604, audio_tagging_loss=0.01359, over 15558.00 frames. ], tot_loss[loss=0.08987, simple_loss=0.1065, pruned_loss=0.02546, audio_tagging_loss=0.01115, over 3050779.67 frames. ], batch size: 62, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:47:27,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=572746.6666666666, ans=0.07 2023-11-19 04:47:41,596 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=572813.3333333334, ans=0.125 2023-11-19 04:47:51,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=572880.0, ans=0.2 2023-11-19 04:48:02,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=572946.6666666666, ans=0.0 2023-11-19 04:48:14,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.28 vs. limit=15.0 2023-11-19 04:48:17,884 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1800, loss[loss=0.07561, simple_loss=0.0946, pruned_loss=0.01669, audio_tagging_loss=0.01162, over 16203.00 frames. ], tot_loss[loss=0.09032, simple_loss=0.1073, pruned_loss=0.02562, audio_tagging_loss=0.01104, over 3054261.98 frames. ], batch size: 59, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:48:20,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=573080.0, ans=0.125 2023-11-19 04:48:38,399 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.476e+01 9.222e+01 1.009e+02 1.227e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 04:48:42,083 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.53 vs. limit=22.5 2023-11-19 04:48:51,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=573280.0, ans=0.1 2023-11-19 04:49:13,825 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1850, loss[loss=0.1156, simple_loss=0.1475, pruned_loss=0.03427, audio_tagging_loss=0.007588, over 15305.00 frames. ], tot_loss[loss=0.09009, simple_loss=0.1073, pruned_loss=0.02559, audio_tagging_loss=0.01084, over 3050512.25 frames. ], batch size: 57, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:49:22,860 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0 2023-11-19 04:49:31,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. 
limit=15.0 2023-11-19 04:50:09,113 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1900, loss[loss=0.07466, simple_loss=0.08646, pruned_loss=0.02056, audio_tagging_loss=0.01088, over 14038.00 frames. ], tot_loss[loss=0.08964, simple_loss=0.1069, pruned_loss=0.02533, audio_tagging_loss=0.01086, over 3058118.81 frames. ], batch size: 54, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:50:23,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=573813.3333333334, ans=0.0 2023-11-19 04:50:25,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0 2023-11-19 04:50:28,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=573813.3333333334, ans=0.5 2023-11-19 04:50:31,206 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.045e+01 8.643e+01 9.371e+01 1.051e+02 1.561e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 04:50:31,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2023-11-19 04:50:32,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=573880.0, ans=0.125 2023-11-19 04:50:42,204 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=573946.6666666666, ans=0.125 2023-11-19 04:50:46,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=573946.6666666666, ans=0.125 2023-11-19 04:51:05,276 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 1950, loss[loss=0.06175, simple_loss=0.06477, pruned_loss=0.01613, audio_tagging_loss=0.01324, over 16538.00 frames. ], tot_loss[loss=0.09062, simple_loss=0.108, pruned_loss=0.02588, audio_tagging_loss=0.01073, over 3063324.98 frames. ], batch size: 66, lr: 8.86e-03, grad_scale: 32.0 2023-11-19 04:51:05,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0 2023-11-19 04:51:08,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=574080.0, ans=0.125 2023-11-19 04:51:16,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=574146.6666666666, ans=0.125 2023-11-19 04:51:16,586 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.44 vs. limit=22.5 2023-11-19 04:51:31,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.26 vs. limit=22.5 2023-11-19 04:51:38,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=574280.0, ans=0.09899494936611666 2023-11-19 04:51:39,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=574280.0, ans=0.125 2023-11-19 04:52:01,538 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2000, loss[loss=0.0938, simple_loss=0.1209, pruned_loss=0.02345, audio_tagging_loss=0.009909, over 15524.00 frames. 
], tot_loss[loss=0.09081, simple_loss=0.1083, pruned_loss=0.02587, audio_tagging_loss=0.01081, over 3060217.71 frames. ], batch size: 56, lr: 8.86e-03, grad_scale: 32.0 2023-11-19 04:52:04,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=574413.3333333334, ans=0.125 2023-11-19 04:52:14,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=574480.0, ans=0.1 2023-11-19 04:52:15,658 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=574480.0, ans=0.0 2023-11-19 04:52:21,799 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.904e+01 9.748e+01 1.142e+02 1.614e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-19 04:52:37,805 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0 2023-11-19 04:52:41,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=574613.3333333334, ans=0.125 2023-11-19 04:52:54,611 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.19 vs. limit=12.0 2023-11-19 04:52:57,085 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2050, loss[loss=0.09884, simple_loss=0.1211, pruned_loss=0.02837, audio_tagging_loss=0.009932, over 15318.00 frames. ], tot_loss[loss=0.0915, simple_loss=0.1092, pruned_loss=0.02624, audio_tagging_loss=0.01069, over 3056096.67 frames. ], batch size: 57, lr: 8.86e-03, grad_scale: 32.0 2023-11-19 04:52:58,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=574746.6666666666, ans=0.125 2023-11-19 04:53:09,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=574813.3333333334, ans=0.2 2023-11-19 04:53:22,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=574880.0, ans=0.125 2023-11-19 04:53:23,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=574880.0, ans=0.125 2023-11-19 04:53:36,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=574946.6666666666, ans=0.125 2023-11-19 04:53:38,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=574946.6666666666, ans=0.1 2023-11-19 04:53:42,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=575013.3333333334, ans=0.0 2023-11-19 04:53:43,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5 2023-11-19 04:53:45,864 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=575013.3333333334, ans=0.1 2023-11-19 04:53:52,673 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2100, loss[loss=0.07819, simple_loss=0.09263, pruned_loss=0.02403, audio_tagging_loss=0.00784, over 15616.00 frames. 
], tot_loss[loss=0.09028, simple_loss=0.1078, pruned_loss=0.02569, audio_tagging_loss=0.01071, over 3048036.61 frames. ], batch size: 61, lr: 8.86e-03, grad_scale: 32.0 2023-11-19 04:53:54,523 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:53:56,631 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=575080.0, ans=0.2 2023-11-19 04:54:07,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=575146.6666666666, ans=0.125 2023-11-19 04:54:14,253 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.268e+01 8.570e+01 9.138e+01 1.001e+02 1.384e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 04:54:18,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.34 vs. limit=10.0 2023-11-19 04:54:23,047 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=22.5 2023-11-19 04:54:28,070 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=575280.0, ans=0.125 2023-11-19 04:54:39,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=575346.6666666666, ans=0.2 2023-11-19 04:54:40,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=575346.6666666666, ans=0.0 2023-11-19 04:54:48,407 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2150, loss[loss=0.1032, simple_loss=0.1172, pruned_loss=0.03326, audio_tagging_loss=0.01133, over 15763.00 frames. ], tot_loss[loss=0.09107, simple_loss=0.1088, pruned_loss=0.02606, audio_tagging_loss=0.0106, over 3042997.77 frames. ], batch size: 61, lr: 8.85e-03, grad_scale: 32.0 2023-11-19 04:54:53,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.41 vs. limit=15.0 2023-11-19 04:55:04,684 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2023-11-19 04:55:20,962 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:55:24,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=575613.3333333334, ans=0.0 2023-11-19 04:55:29,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=575613.3333333334, ans=0.05 2023-11-19 04:55:43,940 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2200, loss[loss=0.1098, simple_loss=0.1425, pruned_loss=0.0299, audio_tagging_loss=0.00865, over 15615.00 frames. ], tot_loss[loss=0.09179, simple_loss=0.1101, pruned_loss=0.02627, audio_tagging_loss=0.01046, over 3051555.41 frames. 
], batch size: 56, lr: 8.85e-03, grad_scale: 32.0 2023-11-19 04:56:04,955 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.523e+01 9.283e+01 9.995e+01 1.354e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-19 04:56:09,166 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.69 vs. limit=22.5 2023-11-19 04:56:29,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=576013.3333333334, ans=0.0 2023-11-19 04:56:32,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=576013.3333333334, ans=0.2 2023-11-19 04:56:34,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=576013.3333333334, ans=0.0 2023-11-19 04:56:39,878 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2250, loss[loss=0.1223, simple_loss=0.144, pruned_loss=0.04056, audio_tagging_loss=0.009769, over 14818.00 frames. ], tot_loss[loss=0.09159, simple_loss=0.1097, pruned_loss=0.02621, audio_tagging_loss=0.01054, over 3044888.12 frames. ], batch size: 58, lr: 8.85e-03, grad_scale: 32.0 2023-11-19 04:56:40,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.33 vs. limit=15.0 2023-11-19 04:56:48,175 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=576080.0, ans=0.025 2023-11-19 04:56:53,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=576146.6666666666, ans=0.1 2023-11-19 04:57:02,620 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0 2023-11-19 04:57:10,254 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=576213.3333333334, ans=0.0 2023-11-19 04:57:16,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=576280.0, ans=0.07 2023-11-19 04:57:29,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=576346.6666666666, ans=0.0 2023-11-19 04:57:31,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=576346.6666666666, ans=0.0 2023-11-19 04:57:35,792 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2300, loss[loss=0.09651, simple_loss=0.1159, pruned_loss=0.0274, audio_tagging_loss=0.01114, over 14840.00 frames. ], tot_loss[loss=0.09171, simple_loss=0.1095, pruned_loss=0.02619, audio_tagging_loss=0.01077, over 3046930.48 frames. ], batch size: 55, lr: 8.85e-03, grad_scale: 16.0 2023-11-19 04:57:42,846 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.24 vs. 
limit=15.0 2023-11-19 04:57:57,357 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.832e+01 8.335e+01 9.344e+01 1.048e+02 1.433e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-19 04:58:05,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=576546.6666666666, ans=0.1 2023-11-19 04:58:23,144 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:58:30,956 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2350, loss[loss=0.08513, simple_loss=0.09878, pruned_loss=0.02238, audio_tagging_loss=0.01336, over 15241.00 frames. ], tot_loss[loss=0.0922, simple_loss=0.1097, pruned_loss=0.02643, audio_tagging_loss=0.01093, over 3046366.92 frames. ], batch size: 54, lr: 8.84e-03, grad_scale: 16.0 2023-11-19 04:58:31,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=576746.6666666666, ans=0.0 2023-11-19 04:58:49,486 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=15.0 2023-11-19 04:58:51,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=576813.3333333334, ans=0.2 2023-11-19 04:58:52,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=576880.0, ans=0.125 2023-11-19 04:59:26,754 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2400, loss[loss=0.1216, simple_loss=0.1509, pruned_loss=0.03752, audio_tagging_loss=0.008633, over 14605.00 frames. ], tot_loss[loss=0.09204, simple_loss=0.1098, pruned_loss=0.02611, audio_tagging_loss=0.01101, over 3051701.96 frames. ], batch size: 54, lr: 8.84e-03, grad_scale: 32.0 2023-11-19 04:59:45,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=577146.6666666666, ans=0.125 2023-11-19 04:59:48,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.521e+01 9.139e+01 1.013e+02 1.981e+02, threshold=1.828e+02, percent-clipped=1.0 2023-11-19 04:59:51,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=577213.3333333334, ans=0.07 2023-11-19 04:59:56,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=577213.3333333334, ans=0.125 2023-11-19 04:59:58,855 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.54 vs. limit=5.0 2023-11-19 05:00:02,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=577280.0, ans=0.125 2023-11-19 05:00:23,037 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2450, loss[loss=0.0965, simple_loss=0.1222, pruned_loss=0.02491, audio_tagging_loss=0.01047, over 15972.00 frames. 
], tot_loss[loss=0.09197, simple_loss=0.1097, pruned_loss=0.02603, audio_tagging_loss=0.0111, over 3049138.70 frames. ], batch size: 57, lr: 8.84e-03, grad_scale: 32.0 2023-11-19 05:00:42,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.81 vs. limit=10.0 2023-11-19 05:00:57,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=577613.3333333334, ans=0.125 2023-11-19 05:01:03,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=577613.3333333334, ans=0.0 2023-11-19 05:01:08,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=577680.0, ans=0.125 2023-11-19 05:01:18,010 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2500, loss[loss=0.07877, simple_loss=0.09186, pruned_loss=0.01983, audio_tagging_loss=0.01302, over 14860.00 frames. ], tot_loss[loss=0.09187, simple_loss=0.1094, pruned_loss=0.02602, audio_tagging_loss=0.01114, over 3048641.62 frames. ], batch size: 56, lr: 8.84e-03, grad_scale: 16.0 2023-11-19 05:01:19,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=577746.6666666666, ans=0.2 2023-11-19 05:01:32,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=577813.3333333334, ans=0.125 2023-11-19 05:01:37,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=577813.3333333334, ans=0.125 2023-11-19 05:01:41,721 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.392e+01 9.155e+01 9.880e+01 1.151e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 05:01:56,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=577946.6666666666, ans=0.125 2023-11-19 05:01:56,162 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=577946.6666666666, ans=0.1 2023-11-19 05:02:13,212 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2550, loss[loss=0.0986, simple_loss=0.1241, pruned_loss=0.02713, audio_tagging_loss=0.0094, over 16716.00 frames. ], tot_loss[loss=0.09265, simple_loss=0.1104, pruned_loss=0.02639, audio_tagging_loss=0.01104, over 3043516.07 frames. ], batch size: 60, lr: 8.83e-03, grad_scale: 16.0 2023-11-19 05:02:36,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=578213.3333333334, ans=0.125 2023-11-19 05:02:52,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=578280.0, ans=0.0 2023-11-19 05:03:09,580 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2600, loss[loss=0.08309, simple_loss=0.0994, pruned_loss=0.02373, audio_tagging_loss=0.009661, over 14372.00 frames. ], tot_loss[loss=0.09237, simple_loss=0.1105, pruned_loss=0.02621, audio_tagging_loss=0.01093, over 3047196.72 frames. 
], batch size: 54, lr: 8.83e-03, grad_scale: 16.0 2023-11-19 05:03:09,827 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=578413.3333333334, ans=0.125 2023-11-19 05:03:11,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=578413.3333333334, ans=0.0 2023-11-19 05:03:14,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=578413.3333333334, ans=0.125 2023-11-19 05:03:32,524 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.609e+01 9.579e+01 1.039e+02 1.650e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-19 05:03:46,068 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=12.0 2023-11-19 05:03:58,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=578680.0, ans=0.0 2023-11-19 05:04:02,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=578680.0, ans=0.0 2023-11-19 05:04:05,560 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2650, loss[loss=0.1091, simple_loss=0.1303, pruned_loss=0.03127, audio_tagging_loss=0.01264, over 16095.00 frames. ], tot_loss[loss=0.09115, simple_loss=0.1092, pruned_loss=0.02572, audio_tagging_loss=0.01081, over 3049279.90 frames. ], batch size: 58, lr: 8.83e-03, grad_scale: 16.0 2023-11-19 05:04:33,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=578880.0, ans=0.0 2023-11-19 05:04:40,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=578946.6666666666, ans=0.0 2023-11-19 05:04:40,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=578946.6666666666, ans=0.125 2023-11-19 05:04:53,656 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.38 vs. limit=10.0 2023-11-19 05:04:55,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=579013.3333333334, ans=0.125 2023-11-19 05:05:00,346 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2700, loss[loss=0.1068, simple_loss=0.1167, pruned_loss=0.0366, audio_tagging_loss=0.01185, over 15472.00 frames. ], tot_loss[loss=0.09171, simple_loss=0.11, pruned_loss=0.02602, audio_tagging_loss=0.0107, over 3053277.42 frames. ], batch size: 59, lr: 8.83e-03, grad_scale: 16.0 2023-11-19 05:05:14,756 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:05:20,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.43 vs. 
limit=15.0 2023-11-19 05:05:24,702 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 8.403e+01 9.195e+01 1.002e+02 1.372e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-19 05:05:28,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=579213.3333333334, ans=0.1 2023-11-19 05:05:31,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=12.0 2023-11-19 05:05:41,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=579280.0, ans=0.2 2023-11-19 05:05:42,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=579280.0, ans=0.1 2023-11-19 05:05:43,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=579280.0, ans=0.5 2023-11-19 05:05:44,683 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.38 vs. limit=22.5 2023-11-19 05:05:57,146 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2750, loss[loss=0.0801, simple_loss=0.09333, pruned_loss=0.02254, audio_tagging_loss=0.01089, over 14650.00 frames. ], tot_loss[loss=0.09162, simple_loss=0.11, pruned_loss=0.02596, audio_tagging_loss=0.01064, over 3049609.93 frames. ], batch size: 54, lr: 8.82e-03, grad_scale: 16.0 2023-11-19 05:06:01,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=579413.3333333334, ans=0.09899494936611666 2023-11-19 05:06:02,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2023-11-19 05:06:13,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=579480.0, ans=0.125 2023-11-19 05:06:19,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=579546.6666666666, ans=0.125 2023-11-19 05:06:26,674 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=579546.6666666666, ans=0.125 2023-11-19 05:06:44,673 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:06:52,986 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2800, loss[loss=0.08206, simple_loss=0.08819, pruned_loss=0.0211, audio_tagging_loss=0.01687, over 15374.00 frames. ], tot_loss[loss=0.09138, simple_loss=0.1097, pruned_loss=0.02583, audio_tagging_loss=0.01072, over 3046776.44 frames. 
], batch size: 59, lr: 8.82e-03, grad_scale: 32.0 2023-11-19 05:06:57,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=579746.6666666666, ans=0.2 2023-11-19 05:06:59,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=579746.6666666666, ans=0.0 2023-11-19 05:07:00,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=579746.6666666666, ans=0.125 2023-11-19 05:07:16,554 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.973e+01 8.959e+01 9.930e+01 1.123e+02 1.609e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-19 05:07:38,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=580013.3333333334, ans=0.0 2023-11-19 05:07:39,042 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=12.0 2023-11-19 05:07:48,251 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2850, loss[loss=0.08842, simple_loss=0.106, pruned_loss=0.02647, audio_tagging_loss=0.008956, over 15353.00 frames. ], tot_loss[loss=0.09151, simple_loss=0.11, pruned_loss=0.02584, audio_tagging_loss=0.01067, over 3047466.72 frames. ], batch size: 59, lr: 8.82e-03, grad_scale: 32.0 2023-11-19 05:07:54,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=12.0 2023-11-19 05:08:03,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2023-11-19 05:08:09,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=580146.6666666666, ans=0.125 2023-11-19 05:08:12,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.08 vs. limit=15.0 2023-11-19 05:08:28,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=580280.0, ans=0.2 2023-11-19 05:08:45,315 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2900, loss[loss=0.09018, simple_loss=0.1093, pruned_loss=0.02655, audio_tagging_loss=0.008984, over 15698.00 frames. ], tot_loss[loss=0.09172, simple_loss=0.1105, pruned_loss=0.02599, audio_tagging_loss=0.01049, over 3045146.62 frames. 
], batch size: 56, lr: 8.82e-03, grad_scale: 32.0 2023-11-19 05:08:47,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=580413.3333333334, ans=0.125 2023-11-19 05:08:51,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=580413.3333333334, ans=0.125 2023-11-19 05:09:04,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=580480.0, ans=0.0 2023-11-19 05:09:08,429 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.767e+01 8.526e+01 9.240e+01 1.001e+02 1.332e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 05:09:11,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=580546.6666666666, ans=0.04949747468305833 2023-11-19 05:09:32,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=580680.0, ans=0.2 2023-11-19 05:09:34,636 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2023-11-19 05:09:41,483 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 2950, loss[loss=0.0873, simple_loss=0.1053, pruned_loss=0.0228, audio_tagging_loss=0.01184, over 16454.00 frames. ], tot_loss[loss=0.09303, simple_loss=0.112, pruned_loss=0.02653, audio_tagging_loss=0.01052, over 3049029.22 frames. ], batch size: 64, lr: 8.81e-03, grad_scale: 32.0 2023-11-19 05:09:52,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=580813.3333333334, ans=0.125 2023-11-19 05:10:14,647 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=12.0 2023-11-19 05:10:19,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=580946.6666666666, ans=0.0 2023-11-19 05:10:28,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=581013.3333333334, ans=0.2 2023-11-19 05:10:36,773 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3000, loss[loss=0.07147, simple_loss=0.08824, pruned_loss=0.01827, audio_tagging_loss=0.009081, over 15163.00 frames. ], tot_loss[loss=0.09179, simple_loss=0.1103, pruned_loss=0.02598, audio_tagging_loss=0.01066, over 3045450.55 frames. ], batch size: 56, lr: 8.81e-03, grad_scale: 32.0 2023-11-19 05:10:36,774 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 05:10:47,073 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5142, 3.9817, 4.6255, 4.2730], device='cuda:1') 2023-11-19 05:10:49,175 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.3359, 4.7097, 5.1386, 4.6109], device='cuda:1') 2023-11-19 05:11:03,247 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.5548, 4.1333, 3.6006, 3.0300], device='cuda:1') 2023-11-19 05:11:08,545 INFO [train_asr.py:1147] (1/4) Epoch 8, validation: loss=0.06637, simple_loss=0.05694, pruned_loss=0.00724, audio_tagging_loss=0.03066, over 4681554.00 frames. 
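[Note on the optim.py:476 "Clipping_scale" entries: each reports five grad-norm quantiles (min/25%/50%/75%/max) plus the clipping threshold, and throughout this log the threshold equals Clipping_scale times the median, e.g. 2.0 x 9.240e+01 = 1.848e+02 in the entry at 05:09:08 above. A minimal sketch of that bookkeeping, assuming the statistics are taken over a window of recent per-batch gradient norms; the actual buffering and smoothing in icefall's optim.py may differ.]

import torch

def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # grad_norms: 1-D tensor of recent per-batch gradient norms.
    # Five quantiles at 0/25/50/75/100%, as printed by optim.py:476.
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                  # threshold = scale * median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold.item(), percent_clipped.item()

# Reproducing the quartile entry logged at 05:09:08 above:
norms = torch.tensor([67.67, 85.26, 92.40, 100.1, 133.2])
q, thr, pc = clipping_stats(norms)
print(q.tolist(), thr, pc)   # threshold == 2.0 * 92.40 == 184.8; 0.0% clipped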
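[Note on the recurring "Exclude cut" warnings: they drop AudioSet placeholder cuts that are too short to align. A sketch of the filter, reconstructed from the logged numbers rather than copied from train_asr.py: with the assumed subsampling formula T = ((num_frames - 7) // 2 + 1) // 2, the logged 100 input frames become 23 encoder frames, fewer than the 24 BPE tokens, and the transducer loss cannot align more tokens than frames.]

def should_exclude_cut(num_frames: int, num_tokens: int) -> bool:
    # Assumed rule, consistent with every warning in this log: drop the cut
    # when it has fewer subsampled frames than BPE tokens.
    t_sub = ((num_frames - 7) // 2 + 1) // 2
    return t_sub < num_tokens

# The warnings above: 100 input frames -> 23 subsampled frames < 24 tokens.
assert should_exclude_cut(100, 24)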
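[Note on the scaling.py:213 "ScheduledFloat" entries, which make up most of this log: each tracks a parameter (dropout_p, skip_rate, scale_min, ...) whose "ans" value is a function of batch_count. A hypothetical reconstruction, assuming piecewise-linear interpolation between (batch_count, value) breakpoints with the value held constant outside the breakpoint range; the breakpoints below are illustrative, not taken from the recipe.]

def scheduled_float(batch_count: float, points):
    # points: sorted (batch_count, value) breakpoints, e.g. [(0.0, 0.3), (20000.0, 0.1)].
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            w = (batch_count - x0) / (x1 - x0)
            return y0 + w * (y1 - y0)

# A dropout_p schedule that has decayed to its final value by this point in training:
print(scheduled_float(579213.33, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1, cf. ans=0.1 above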
2023-11-19 05:11:08,545 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 05:11:08,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=581080.0, ans=0.125 2023-11-19 05:11:13,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=581080.0, ans=0.125 2023-11-19 05:11:27,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=581146.6666666666, ans=0.0 2023-11-19 05:11:31,380 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.149e+01 8.419e+01 9.133e+01 9.790e+01 1.190e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 05:11:33,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=581213.3333333334, ans=0.125 2023-11-19 05:11:40,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=581280.0, ans=0.125 2023-11-19 05:11:40,854 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.03 vs. limit=8.0 2023-11-19 05:11:46,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=581280.0, ans=0.0 2023-11-19 05:11:47,521 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:11:58,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=581346.6666666666, ans=0.0 2023-11-19 05:11:59,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=22.5 2023-11-19 05:12:04,543 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3050, loss[loss=0.08669, simple_loss=0.11, pruned_loss=0.02149, audio_tagging_loss=0.01019, over 15859.00 frames. ], tot_loss[loss=0.09166, simple_loss=0.11, pruned_loss=0.02595, audio_tagging_loss=0.01071, over 3045152.56 frames. ], batch size: 59, lr: 8.81e-03, grad_scale: 32.0 2023-11-19 05:12:10,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=581413.3333333334, ans=0.125 2023-11-19 05:12:18,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=581480.0, ans=0.125 2023-11-19 05:12:29,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=581546.6666666666, ans=0.1 2023-11-19 05:12:36,520 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:12:59,883 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3100, loss[loss=0.08734, simple_loss=0.1051, pruned_loss=0.02421, audio_tagging_loss=0.01055, over 15245.00 frames. 
], tot_loss[loss=0.09281, simple_loss=0.1114, pruned_loss=0.02638, audio_tagging_loss=0.01075, over 3051817.95 frames. ], batch size: 55, lr: 8.81e-03, grad_scale: 32.0 2023-11-19 05:13:02,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=581746.6666666666, ans=0.125 2023-11-19 05:13:06,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=581746.6666666666, ans=0.04949747468305833 2023-11-19 05:13:07,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=581746.6666666666, ans=0.1 2023-11-19 05:13:24,919 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 8.767e+01 9.718e+01 1.090e+02 1.747e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-19 05:13:40,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=581946.6666666666, ans=0.2 2023-11-19 05:13:55,153 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2023-11-19 05:13:55,665 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3150, loss[loss=0.08905, simple_loss=0.1049, pruned_loss=0.02836, audio_tagging_loss=0.008223, over 15765.00 frames. ], tot_loss[loss=0.09237, simple_loss=0.1104, pruned_loss=0.02615, audio_tagging_loss=0.01099, over 3046322.11 frames. ], batch size: 57, lr: 8.80e-03, grad_scale: 16.0 2023-11-19 05:14:03,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=582080.0, ans=0.125 2023-11-19 05:14:03,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.49 vs. limit=6.0 2023-11-19 05:14:19,419 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.60 vs. limit=15.0 2023-11-19 05:14:51,790 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3200, loss[loss=0.1026, simple_loss=0.1223, pruned_loss=0.02817, audio_tagging_loss=0.01334, over 14035.00 frames. ], tot_loss[loss=0.09292, simple_loss=0.111, pruned_loss=0.02638, audio_tagging_loss=0.01101, over 3040383.79 frames. ], batch size: 53, lr: 8.80e-03, grad_scale: 32.0 2023-11-19 05:14:54,671 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=582413.3333333334, ans=6.0 2023-11-19 05:14:54,671 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. 
limit=6.0 2023-11-19 05:15:15,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.369e+01 9.165e+01 1.003e+02 1.359e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-19 05:15:31,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=582613.3333333334, ans=0.2 2023-11-19 05:15:36,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=582680.0, ans=0.125 2023-11-19 05:15:43,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=582680.0, ans=0.2 2023-11-19 05:15:46,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=582746.6666666666, ans=0.1 2023-11-19 05:15:47,232 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3250, loss[loss=0.09625, simple_loss=0.1126, pruned_loss=0.03079, audio_tagging_loss=0.009174, over 15238.00 frames. ], tot_loss[loss=0.09196, simple_loss=0.1099, pruned_loss=0.02602, audio_tagging_loss=0.011, over 3042298.11 frames. ], batch size: 57, lr: 8.80e-03, grad_scale: 32.0 2023-11-19 05:15:58,878 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.28 vs. limit=15.0 2023-11-19 05:15:59,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=582813.3333333334, ans=0.1 2023-11-19 05:16:08,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=582813.3333333334, ans=0.0 2023-11-19 05:16:31,143 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=583013.3333333334, ans=0.0 2023-11-19 05:16:35,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=583013.3333333334, ans=0.0 2023-11-19 05:16:42,947 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3300, loss[loss=0.08491, simple_loss=0.0987, pruned_loss=0.02422, audio_tagging_loss=0.01133, over 16028.00 frames. ], tot_loss[loss=0.0913, simple_loss=0.1092, pruned_loss=0.02573, audio_tagging_loss=0.01096, over 3055105.02 frames. ], batch size: 63, lr: 8.80e-03, grad_scale: 32.0 2023-11-19 05:16:51,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=583080.0, ans=0.0 2023-11-19 05:16:58,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=583146.6666666666, ans=0.125 2023-11-19 05:17:08,705 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.018e+01 8.571e+01 9.137e+01 1.011e+02 1.807e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 05:17:12,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=583213.3333333334, ans=0.0 2023-11-19 05:17:13,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=583213.3333333334, ans=0.0 2023-11-19 05:17:20,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.34 vs. 
limit=15.0 2023-11-19 05:17:23,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=583280.0, ans=0.0 2023-11-19 05:17:38,942 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3350, loss[loss=0.1021, simple_loss=0.1235, pruned_loss=0.02738, audio_tagging_loss=0.01301, over 16304.00 frames. ], tot_loss[loss=0.09177, simple_loss=0.1098, pruned_loss=0.02597, audio_tagging_loss=0.01092, over 3054999.01 frames. ], batch size: 62, lr: 8.79e-03, grad_scale: 16.0 2023-11-19 05:17:39,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=583413.3333333334, ans=0.125 2023-11-19 05:17:47,124 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=583413.3333333334, ans=0.125 2023-11-19 05:18:00,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=583546.6666666666, ans=0.09899494936611666 2023-11-19 05:18:09,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=583546.6666666666, ans=0.125 2023-11-19 05:18:13,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=12.0 2023-11-19 05:18:21,959 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-11-19 05:18:26,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=583680.0, ans=0.0 2023-11-19 05:18:29,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583680.0, ans=0.1 2023-11-19 05:18:34,537 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3400, loss[loss=0.08895, simple_loss=0.1075, pruned_loss=0.02594, audio_tagging_loss=0.009278, over 15585.00 frames. ], tot_loss[loss=0.09165, simple_loss=0.1101, pruned_loss=0.02588, audio_tagging_loss=0.01069, over 3054078.46 frames. ], batch size: 56, lr: 8.79e-03, grad_scale: 16.0 2023-11-19 05:18:52,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583813.3333333334, ans=0.1 2023-11-19 05:19:00,608 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.104e+01 8.466e+01 9.111e+01 1.006e+02 1.690e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-19 05:19:05,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=583880.0, ans=0.125 2023-11-19 05:19:07,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=583946.6666666666, ans=0.125 2023-11-19 05:19:12,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=583946.6666666666, ans=0.125 2023-11-19 05:19:16,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=583946.6666666666, ans=0.125 2023-11-19 05:19:30,491 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3450, loss[loss=0.07819, simple_loss=0.09073, pruned_loss=0.01908, audio_tagging_loss=0.01375, over 16191.00 frames. 
], tot_loss[loss=0.09128, simple_loss=0.1101, pruned_loss=0.02576, audio_tagging_loss=0.01049, over 3059487.05 frames. ], batch size: 61, lr: 8.79e-03, grad_scale: 16.0 2023-11-19 05:19:41,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=584146.6666666666, ans=0.035 2023-11-19 05:19:45,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=584146.6666666666, ans=0.0 2023-11-19 05:19:58,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=584213.3333333334, ans=0.2 2023-11-19 05:20:08,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=584280.0, ans=0.125 2023-11-19 05:20:13,840 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.42 vs. limit=22.5 2023-11-19 05:20:18,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=584346.6666666666, ans=0.125 2023-11-19 05:20:26,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. limit=6.0 2023-11-19 05:20:27,023 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3500, loss[loss=0.1079, simple_loss=0.1274, pruned_loss=0.03435, audio_tagging_loss=0.009884, over 14463.00 frames. ], tot_loss[loss=0.09078, simple_loss=0.1095, pruned_loss=0.02558, audio_tagging_loss=0.01046, over 3049309.34 frames. ], batch size: 57, lr: 8.79e-03, grad_scale: 16.0 2023-11-19 05:20:28,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584413.3333333334, ans=0.1 2023-11-19 05:20:52,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.420e+01 9.270e+01 9.989e+01 2.188e+02, threshold=1.854e+02, percent-clipped=1.0 2023-11-19 05:20:54,689 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:21:00,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=584613.3333333334, ans=0.125 2023-11-19 05:21:07,071 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.90 vs. limit=22.5 2023-11-19 05:21:23,081 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3550, loss[loss=0.1036, simple_loss=0.1262, pruned_loss=0.03126, audio_tagging_loss=0.009237, over 14865.00 frames. ], tot_loss[loss=0.09061, simple_loss=0.1093, pruned_loss=0.02555, audio_tagging_loss=0.01042, over 3045590.77 frames. 
], batch size: 57, lr: 8.78e-03, grad_scale: 16.0 2023-11-19 05:21:27,496 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:21:38,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=584813.3333333334, ans=0.125 2023-11-19 05:22:19,041 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3600, loss[loss=0.07049, simple_loss=0.07339, pruned_loss=0.01866, audio_tagging_loss=0.01513, over 14867.00 frames. ], tot_loss[loss=0.09023, simple_loss=0.1086, pruned_loss=0.02541, audio_tagging_loss=0.0105, over 3049026.80 frames. ], batch size: 58, lr: 8.78e-03, grad_scale: 32.0 2023-11-19 05:22:24,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=585080.0, ans=0.125 2023-11-19 05:22:44,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.417e+01 9.022e+01 1.001e+02 1.493e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 05:22:51,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=585280.0, ans=0.2 2023-11-19 05:23:07,031 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=12.0 2023-11-19 05:23:08,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2023-11-19 05:23:14,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-19 05:23:15,472 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3650, loss[loss=0.09668, simple_loss=0.1172, pruned_loss=0.02812, audio_tagging_loss=0.009973, over 15058.00 frames. ], tot_loss[loss=0.09069, simple_loss=0.1093, pruned_loss=0.02561, audio_tagging_loss=0.01045, over 3048876.85 frames. ], batch size: 57, lr: 8.78e-03, grad_scale: 32.0 2023-11-19 05:23:17,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=585413.3333333334, ans=0.07 2023-11-19 05:23:49,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.85 vs. limit=10.0 2023-11-19 05:23:50,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=585613.3333333334, ans=0.0 2023-11-19 05:23:51,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=585613.3333333334, ans=0.2 2023-11-19 05:23:58,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=585613.3333333334, ans=0.1 2023-11-19 05:24:00,434 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=15.0 2023-11-19 05:24:10,700 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3700, loss[loss=0.09206, simple_loss=0.09991, pruned_loss=0.03111, audio_tagging_loss=0.01099, over 14288.00 frames. ], tot_loss[loss=0.09081, simple_loss=0.1092, pruned_loss=0.02576, audio_tagging_loss=0.01045, over 3052564.09 frames. 
], batch size: 55, lr: 8.78e-03, grad_scale: 32.0 2023-11-19 05:24:14,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=585746.6666666666, ans=0.125 2023-11-19 05:24:14,954 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.23 vs. limit=10.0 2023-11-19 05:24:20,874 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=585813.3333333334, ans=0.1 2023-11-19 05:24:24,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=585813.3333333334, ans=0.1 2023-11-19 05:24:37,262 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.128e+01 8.524e+01 9.099e+01 9.813e+01 1.282e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-19 05:24:39,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. limit=15.0 2023-11-19 05:24:47,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=585946.6666666666, ans=0.0 2023-11-19 05:25:00,423 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=586013.3333333334, ans=0.125 2023-11-19 05:25:06,476 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3750, loss[loss=0.083, simple_loss=0.1009, pruned_loss=0.0226, audio_tagging_loss=0.009932, over 14910.00 frames. ], tot_loss[loss=0.09071, simple_loss=0.1089, pruned_loss=0.02569, audio_tagging_loss=0.01058, over 3053032.56 frames. ], batch size: 56, lr: 8.77e-03, grad_scale: 32.0 2023-11-19 05:25:13,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=586080.0, ans=0.1 2023-11-19 05:25:26,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=586146.6666666666, ans=0.125 2023-11-19 05:25:32,513 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=586213.3333333334, ans=0.035 2023-11-19 05:25:35,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=586213.3333333334, ans=0.2 2023-11-19 05:25:44,622 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:25:48,301 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.03 vs. 
limit=10.0 2023-11-19 05:25:50,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=586346.6666666666, ans=0.2 2023-11-19 05:25:55,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=586346.6666666666, ans=0.04949747468305833 2023-11-19 05:26:03,045 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3800, loss[loss=0.08231, simple_loss=0.0925, pruned_loss=0.02127, audio_tagging_loss=0.01479, over 15680.00 frames. ], tot_loss[loss=0.09129, simple_loss=0.1096, pruned_loss=0.02588, audio_tagging_loss=0.01058, over 3052548.39 frames. ], batch size: 61, lr: 8.77e-03, grad_scale: 32.0 2023-11-19 05:26:06,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=586413.3333333334, ans=0.125 2023-11-19 05:26:26,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=586546.6666666666, ans=0.125 2023-11-19 05:26:27,759 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.211e+01 8.964e+01 1.013e+02 1.295e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 05:26:32,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=586546.6666666666, ans=0.1 2023-11-19 05:26:57,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=586680.0, ans=0.0 2023-11-19 05:27:00,380 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3850, loss[loss=0.06642, simple_loss=0.07748, pruned_loss=0.01641, audio_tagging_loss=0.01127, over 15236.00 frames. ], tot_loss[loss=0.09077, simple_loss=0.1088, pruned_loss=0.02565, audio_tagging_loss=0.01069, over 3055533.10 frames. ], batch size: 56, lr: 8.77e-03, grad_scale: 32.0 2023-11-19 05:27:00,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=586746.6666666666, ans=0.1 2023-11-19 05:27:17,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=586813.3333333334, ans=0.125 2023-11-19 05:27:51,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=587013.3333333334, ans=0.0 2023-11-19 05:27:56,350 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3900, loss[loss=0.08323, simple_loss=0.08461, pruned_loss=0.02551, audio_tagging_loss=0.01542, over 14471.00 frames. ], tot_loss[loss=0.09027, simple_loss=0.108, pruned_loss=0.0255, audio_tagging_loss=0.01077, over 3043721.00 frames. ], batch size: 55, lr: 8.77e-03, grad_scale: 32.0 2023-11-19 05:27:57,006 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. 
limit=6.0 2023-11-19 05:28:09,164 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=587146.6666666666, ans=0.0 2023-11-19 05:28:21,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=587213.3333333334, ans=0.125 2023-11-19 05:28:22,403 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.532e+01 9.315e+01 9.987e+01 1.482e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-19 05:28:35,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=587280.0, ans=0.125 2023-11-19 05:28:52,868 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 3950, loss[loss=0.08182, simple_loss=0.09152, pruned_loss=0.02512, audio_tagging_loss=0.01094, over 14412.00 frames. ], tot_loss[loss=0.09017, simple_loss=0.1078, pruned_loss=0.02534, audio_tagging_loss=0.01092, over 3039859.70 frames. ], batch size: 54, lr: 8.76e-03, grad_scale: 32.0 2023-11-19 05:29:05,665 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.41 vs. limit=6.0 2023-11-19 05:29:35,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587613.3333333334, ans=0.1 2023-11-19 05:29:43,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=587680.0, ans=0.2 2023-11-19 05:29:47,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=587746.6666666666, ans=0.125 2023-11-19 05:29:48,575 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4000, loss[loss=0.1089, simple_loss=0.1291, pruned_loss=0.0336, audio_tagging_loss=0.01072, over 15895.00 frames. ], tot_loss[loss=0.09173, simple_loss=0.1096, pruned_loss=0.02602, audio_tagging_loss=0.01091, over 3041356.40 frames. ], batch size: 58, lr: 8.76e-03, grad_scale: 32.0 2023-11-19 05:29:49,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=587746.6666666666, ans=0.125 2023-11-19 05:29:50,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=587746.6666666666, ans=0.04949747468305833 2023-11-19 05:29:58,449 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=587813.3333333334, ans=0.125 2023-11-19 05:30:10,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587880.0, ans=0.1 2023-11-19 05:30:13,800 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=587880.0, ans=0.2 2023-11-19 05:30:14,660 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.556e+01 9.306e+01 1.024e+02 1.346e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-19 05:30:44,077 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4050, loss[loss=0.1084, simple_loss=0.1407, pruned_loss=0.02964, audio_tagging_loss=0.008392, over 15014.00 frames. ], tot_loss[loss=0.09264, simple_loss=0.1107, pruned_loss=0.02641, audio_tagging_loss=0.01091, over 3043325.88 frames. 
], batch size: 54, lr: 8.76e-03, grad_scale: 32.0 2023-11-19 05:30:46,194 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:30:54,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588080.0, ans=0.1 2023-11-19 05:31:00,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=588146.6666666666, ans=0.125 2023-11-19 05:31:15,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=588213.3333333334, ans=0.2 2023-11-19 05:31:16,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=588213.3333333334, ans=0.1 2023-11-19 05:31:18,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=588280.0, ans=0.125 2023-11-19 05:31:26,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=588280.0, ans=0.125 2023-11-19 05:31:29,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=588346.6666666666, ans=0.0 2023-11-19 05:31:32,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=588346.6666666666, ans=0.125 2023-11-19 05:31:39,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=588346.6666666666, ans=0.125 2023-11-19 05:31:40,828 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4100, loss[loss=0.09052, simple_loss=0.09763, pruned_loss=0.03097, audio_tagging_loss=0.01074, over 14460.00 frames. ], tot_loss[loss=0.09248, simple_loss=0.1106, pruned_loss=0.02622, audio_tagging_loss=0.01097, over 3042204.86 frames. ], batch size: 57, lr: 8.76e-03, grad_scale: 32.0 2023-11-19 05:31:56,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.09 vs. 
limit=22.5 2023-11-19 05:31:59,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=588480.0, ans=0.2 2023-11-19 05:32:02,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=588546.6666666666, ans=0.125 2023-11-19 05:32:02,952 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:32:05,812 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.777e+01 9.323e+01 1.010e+02 1.338e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 05:32:07,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=588546.6666666666, ans=0.125 2023-11-19 05:32:32,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=588680.0, ans=0.125 2023-11-19 05:32:36,887 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4150, loss[loss=0.09727, simple_loss=0.1174, pruned_loss=0.02736, audio_tagging_loss=0.01122, over 16152.00 frames. ], tot_loss[loss=0.09239, simple_loss=0.1107, pruned_loss=0.02626, audio_tagging_loss=0.01076, over 3043959.11 frames. ], batch size: 60, lr: 8.75e-03, grad_scale: 32.0 2023-11-19 05:32:47,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=588813.3333333334, ans=0.2 2023-11-19 05:33:08,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=588880.0, ans=0.0 2023-11-19 05:33:11,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=588946.6666666666, ans=0.125 2023-11-19 05:33:16,897 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:33:31,685 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4200, loss[loss=0.09348, simple_loss=0.1082, pruned_loss=0.03038, audio_tagging_loss=0.009019, over 14911.00 frames. ], tot_loss[loss=0.09171, simple_loss=0.1098, pruned_loss=0.02611, audio_tagging_loss=0.01069, over 3042515.78 frames. ], batch size: 56, lr: 8.75e-03, grad_scale: 32.0 2023-11-19 05:33:44,898 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. 
limit=15.0 2023-11-19 05:33:58,310 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.731e+01 8.836e+01 9.609e+01 1.061e+02 1.544e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-19 05:33:59,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=589213.3333333334, ans=0.125 2023-11-19 05:34:02,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=589213.3333333334, ans=0.2 2023-11-19 05:34:26,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=589346.6666666666, ans=0.1 2023-11-19 05:34:28,181 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4250, loss[loss=0.07652, simple_loss=0.08852, pruned_loss=0.02082, audio_tagging_loss=0.01145, over 16119.00 frames. ], tot_loss[loss=0.09098, simple_loss=0.1091, pruned_loss=0.02576, audio_tagging_loss=0.01067, over 3040343.96 frames. ], batch size: 59, lr: 8.75e-03, grad_scale: 32.0 2023-11-19 05:34:45,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=589480.0, ans=0.125 2023-11-19 05:34:45,989 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0 2023-11-19 05:34:55,054 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=589546.6666666666, ans=0.2 2023-11-19 05:34:59,206 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:35:02,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=589613.3333333334, ans=0.125 2023-11-19 05:35:02,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589613.3333333334, ans=0.1 2023-11-19 05:35:13,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=589680.0, ans=0.0 2023-11-19 05:35:13,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=589680.0, ans=0.0 2023-11-19 05:35:13,804 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=12.0 2023-11-19 05:35:24,460 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4300, loss[loss=0.05788, simple_loss=0.05885, pruned_loss=0.01502, audio_tagging_loss=0.01343, over 14676.00 frames. ], tot_loss[loss=0.09108, simple_loss=0.1093, pruned_loss=0.0259, audio_tagging_loss=0.01054, over 3055691.61 frames. 
], batch size: 57, lr: 8.75e-03, grad_scale: 32.0 2023-11-19 05:35:24,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=589746.6666666666, ans=0.125 2023-11-19 05:35:29,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589746.6666666666, ans=0.1 2023-11-19 05:35:45,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=589880.0, ans=0.0 2023-11-19 05:35:49,995 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.689e+01 9.452e+01 1.019e+02 1.393e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-19 05:36:18,939 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4350, loss[loss=0.09674, simple_loss=0.1206, pruned_loss=0.02688, audio_tagging_loss=0.009537, over 14730.00 frames. ], tot_loss[loss=0.09107, simple_loss=0.1095, pruned_loss=0.02585, audio_tagging_loss=0.01045, over 3051861.47 frames. ], batch size: 54, lr: 8.74e-03, grad_scale: 16.0 2023-11-19 05:36:21,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=590080.0, ans=0.125 2023-11-19 05:36:32,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.89 vs. limit=10.0 2023-11-19 05:36:47,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=590213.3333333334, ans=12.0 2023-11-19 05:36:51,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=590280.0, ans=10.0 2023-11-19 05:37:13,238 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0 2023-11-19 05:37:14,720 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4400, loss[loss=0.0911, simple_loss=0.1129, pruned_loss=0.02353, audio_tagging_loss=0.01112, over 15102.00 frames. ], tot_loss[loss=0.0914, simple_loss=0.1099, pruned_loss=0.02599, audio_tagging_loss=0.01047, over 3048858.69 frames. 
], batch size: 55, lr: 8.74e-03, grad_scale: 32.0 2023-11-19 05:37:27,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=590480.0, ans=0.125 2023-11-19 05:37:33,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=590480.0, ans=0.2 2023-11-19 05:37:40,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=590546.6666666666, ans=0.125 2023-11-19 05:37:41,048 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.028e+01 8.473e+01 9.223e+01 1.006e+02 1.257e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 05:37:44,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=590546.6666666666, ans=0.0 2023-11-19 05:37:59,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=590680.0, ans=0.0 2023-11-19 05:38:00,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=590680.0, ans=0.05 2023-11-19 05:38:11,087 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4450, loss[loss=0.07876, simple_loss=0.09478, pruned_loss=0.02028, audio_tagging_loss=0.01109, over 14610.00 frames. ], tot_loss[loss=0.09079, simple_loss=0.1093, pruned_loss=0.02572, audio_tagging_loss=0.01042, over 3052780.08 frames. ], batch size: 56, lr: 8.74e-03, grad_scale: 32.0 2023-11-19 05:38:26,062 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=590813.3333333334, ans=0.125 2023-11-19 05:38:30,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=590813.3333333334, ans=0.125 2023-11-19 05:38:56,845 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=591013.3333333334, ans=0.125 2023-11-19 05:39:02,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0 2023-11-19 05:39:06,188 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4500, loss[loss=0.0643, simple_loss=0.07598, pruned_loss=0.01576, audio_tagging_loss=0.01056, over 15970.00 frames. ], tot_loss[loss=0.09019, simple_loss=0.1085, pruned_loss=0.02558, audio_tagging_loss=0.01037, over 3057266.86 frames. 
], batch size: 61, lr: 8.74e-03, grad_scale: 32.0 2023-11-19 05:39:09,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=591080.0, ans=0.2 2023-11-19 05:39:10,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=591080.0, ans=0.125 2023-11-19 05:39:31,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=591213.3333333334, ans=0.0 2023-11-19 05:39:33,306 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.438e+01 9.340e+01 1.045e+02 1.489e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 05:39:37,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=591213.3333333334, ans=0.125 2023-11-19 05:39:58,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=591346.6666666666, ans=0.1 2023-11-19 05:40:01,629 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=591413.3333333334, ans=0.0 2023-11-19 05:40:02,501 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4550, loss[loss=0.08167, simple_loss=0.1023, pruned_loss=0.02064, audio_tagging_loss=0.009856, over 16906.00 frames. ], tot_loss[loss=0.09022, simple_loss=0.1085, pruned_loss=0.02556, audio_tagging_loss=0.0104, over 3058291.78 frames. ], batch size: 62, lr: 8.73e-03, grad_scale: 32.0 2023-11-19 05:40:02,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=591413.3333333334, ans=0.125 2023-11-19 05:40:02,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=591413.3333333334, ans=0.125 2023-11-19 05:40:05,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2023-11-19 05:40:05,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=591413.3333333334, ans=0.125 2023-11-19 05:40:07,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=591413.3333333334, ans=0.035 2023-11-19 05:40:08,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=591413.3333333334, ans=0.125 2023-11-19 05:40:11,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=591413.3333333334, ans=0.2 2023-11-19 05:40:20,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2023-11-19 05:40:25,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=591546.6666666666, ans=0.015 2023-11-19 05:40:44,870 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:40:51,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=591680.0, ans=0.09899494936611666 2023-11-19 05:40:52,507 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:40:58,034 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4600, loss[loss=0.1043, simple_loss=0.1198, pruned_loss=0.0339, audio_tagging_loss=0.01048, over 14988.00 frames. ], tot_loss[loss=0.09038, simple_loss=0.1082, pruned_loss=0.02559, audio_tagging_loss=0.01068, over 3055295.39 frames. ], batch size: 56, lr: 8.73e-03, grad_scale: 16.0 2023-11-19 05:41:08,722 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.43 vs. limit=15.0 2023-11-19 05:41:25,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.649e+01 8.473e+01 9.207e+01 1.048e+02 1.617e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 05:41:39,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=591946.6666666666, ans=0.0 2023-11-19 05:41:53,858 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4650, loss[loss=0.1016, simple_loss=0.1171, pruned_loss=0.03453, audio_tagging_loss=0.00857, over 15648.00 frames. ], tot_loss[loss=0.09061, simple_loss=0.1081, pruned_loss=0.02567, audio_tagging_loss=0.01088, over 3051881.64 frames. ], batch size: 60, lr: 8.73e-03, grad_scale: 16.0 2023-11-19 05:41:54,005 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=592080.0, ans=0.125 2023-11-19 05:41:54,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=592080.0, ans=0.0 2023-11-19 05:42:09,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=592146.6666666666, ans=0.09899494936611666 2023-11-19 05:42:16,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.12 vs. limit=15.0 2023-11-19 05:42:21,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=592213.3333333334, ans=0.1 2023-11-19 05:42:39,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=592346.6666666666, ans=0.125 2023-11-19 05:42:41,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=592346.6666666666, ans=0.0 2023-11-19 05:42:48,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=592413.3333333334, ans=0.125 2023-11-19 05:42:49,074 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4700, loss[loss=0.0684, simple_loss=0.08017, pruned_loss=0.01773, audio_tagging_loss=0.01058, over 14245.00 frames. ], tot_loss[loss=0.09036, simple_loss=0.1077, pruned_loss=0.02547, audio_tagging_loss=0.01103, over 3047121.70 frames. 
], batch size: 57, lr: 8.73e-03, grad_scale: 16.0 2023-11-19 05:42:56,985 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.07 vs. limit=22.5 2023-11-19 05:43:01,998 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.75 vs. limit=15.0 2023-11-19 05:43:04,772 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.52 vs. limit=15.0 2023-11-19 05:43:17,542 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.009e+01 8.535e+01 9.340e+01 1.049e+02 1.659e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 05:43:20,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=592546.6666666666, ans=0.0 2023-11-19 05:43:20,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=592546.6666666666, ans=0.125 2023-11-19 05:43:39,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=592680.0, ans=0.0 2023-11-19 05:43:42,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=592680.0, ans=0.0 2023-11-19 05:43:45,574 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4750, loss[loss=0.07743, simple_loss=0.09749, pruned_loss=0.01965, audio_tagging_loss=0.009036, over 15780.00 frames. ], tot_loss[loss=0.09006, simple_loss=0.1075, pruned_loss=0.0252, audio_tagging_loss=0.01111, over 3047663.39 frames. ], batch size: 62, lr: 8.72e-03, grad_scale: 16.0 2023-11-19 05:43:57,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=592813.3333333334, ans=0.125 2023-11-19 05:44:41,322 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4800, loss[loss=0.07888, simple_loss=0.09296, pruned_loss=0.02112, audio_tagging_loss=0.01128, over 15257.00 frames. ], tot_loss[loss=0.09003, simple_loss=0.1075, pruned_loss=0.02505, audio_tagging_loss=0.01121, over 3058967.61 frames. ], batch size: 56, lr: 8.72e-03, grad_scale: 32.0 2023-11-19 05:44:46,958 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=593080.0, ans=0.1 2023-11-19 05:44:50,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.60 vs. 
2023-11-19 05:45:02,271 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=593213.3333333334, ans=0.2
2023-11-19 05:45:08,996 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.537e+01 9.216e+01 9.797e+01 1.751e+02, threshold=1.843e+02, percent-clipped=0.0
2023-11-19 05:45:16,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=593280.0, ans=0.125
2023-11-19 05:45:31,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=593346.6666666666, ans=0.125
2023-11-19 05:45:36,330 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4850, loss[loss=0.09761, simple_loss=0.1167, pruned_loss=0.02765, audio_tagging_loss=0.01161, over 14799.00 frames. ], tot_loss[loss=0.09075, simple_loss=0.1083, pruned_loss=0.0253, audio_tagging_loss=0.01128, over 3058666.41 frames. ], batch size: 54, lr: 8.72e-03, grad_scale: 32.0
2023-11-19 05:45:36,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=593413.3333333334, ans=0.0
2023-11-19 05:45:38,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=593413.3333333334, ans=0.0
2023-11-19 05:46:13,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=593613.3333333334, ans=0.2
2023-11-19 05:46:20,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=593680.0, ans=0.125
2023-11-19 05:46:31,477 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4900, loss[loss=0.08505, simple_loss=0.1028, pruned_loss=0.02256, audio_tagging_loss=0.01109, over 14600.00 frames. ], tot_loss[loss=0.08998, simple_loss=0.1074, pruned_loss=0.02506, audio_tagging_loss=0.01121, over 3055608.55 frames. ], batch size: 54, lr: 8.72e-03, grad_scale: 32.0
2023-11-19 05:46:32,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=593746.6666666666, ans=0.125
2023-11-19 05:46:38,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=593746.6666666666, ans=0.125
2023-11-19 05:46:40,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=593746.6666666666, ans=0.125
2023-11-19 05:46:43,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=593813.3333333334, ans=0.02
2023-11-19 05:46:47,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=593813.3333333334, ans=0.125
2023-11-19 05:46:52,552 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 05:46:56,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=593880.0, ans=0.0
2023-11-19 05:46:58,717 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.397e+01 9.302e+01 1.001e+02 1.305e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-19 05:47:02,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=593880.0, ans=0.1
2023-11-19 05:47:08,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=593946.6666666666, ans=0.0
2023-11-19 05:47:26,552 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 4950, loss[loss=0.08316, simple_loss=0.09179, pruned_loss=0.0275, audio_tagging_loss=0.009765, over 15209.00 frames. ], tot_loss[loss=0.08963, simple_loss=0.1068, pruned_loss=0.02519, audio_tagging_loss=0.01104, over 3057457.59 frames. ], batch size: 58, lr: 8.71e-03, grad_scale: 32.0
2023-11-19 05:47:36,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=594146.6666666666, ans=0.125
2023-11-19 05:47:36,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=594146.6666666666, ans=0.1
2023-11-19 05:47:42,521 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=12.0
2023-11-19 05:47:59,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=594280.0, ans=0.2
2023-11-19 05:48:01,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=594280.0, ans=0.02
2023-11-19 05:48:22,028 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5000, loss[loss=0.07703, simple_loss=0.08965, pruned_loss=0.02043, audio_tagging_loss=0.01177, over 15258.00 frames. ], tot_loss[loss=0.08917, simple_loss=0.1063, pruned_loss=0.02511, audio_tagging_loss=0.01091, over 3052418.69 frames. ], batch size: 58, lr: 8.71e-03, grad_scale: 16.0
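Note the grad_scale field in the per-batch lines: it sat at 16.0, doubled to 32.0 around batch 4800, and fell back to 16.0 at batch 5000 just above. That pattern is characteristic of dynamic loss scaling in fp16 training: the scale is halved whenever inf/nan gradients force a skipped step and grows again after a long enough run of clean steps. A generic PyTorch sketch of the mechanism (standard torch.cuda.amp usage, not the project's actual training loop):

```python
# Generic fp16 step with dynamic loss scaling; the changing grad_scale values
# in the log are this mechanism at work (a sketch, not icefall's actual loop).
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

def train_step(model, batch, optimizer, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()  # gradients carry the current scale
    scaler.step(optimizer)         # step is skipped if inf/nan grads are found
    scaler.update()                # halves the scale on overflow, grows it when stable
    return loss.detach(), scaler.get_scale()
```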
2023-11-19 05:48:24,312 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=594413.3333333334, ans=0.2
2023-11-19 05:48:34,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=594480.0, ans=0.125
2023-11-19 05:48:34,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=594480.0, ans=0.125
2023-11-19 05:48:49,428 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0
2023-11-19 05:48:51,117 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.368e+01 8.482e+01 9.400e+01 1.026e+02 1.313e+02, threshold=1.880e+02, percent-clipped=0.0
2023-11-19 05:48:52,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=594546.6666666666, ans=0.0
2023-11-19 05:49:03,880 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=12.0
2023-11-19 05:49:17,739 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 05:49:18,522 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5050, loss[loss=0.08504, simple_loss=0.1123, pruned_loss=0.01969, audio_tagging_loss=0.009203, over 14700.00 frames. ], tot_loss[loss=0.08867, simple_loss=0.1059, pruned_loss=0.02486, audio_tagging_loss=0.01084, over 3042362.68 frames. ], batch size: 56, lr: 8.71e-03, grad_scale: 16.0
2023-11-19 05:49:42,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=594880.0, ans=0.0
2023-11-19 05:49:48,548 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=594880.0, ans=0.0
2023-11-19 05:50:14,332 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5100, loss[loss=0.09247, simple_loss=0.1052, pruned_loss=0.02933, audio_tagging_loss=0.01052, over 15144.00 frames. ], tot_loss[loss=0.08877, simple_loss=0.1066, pruned_loss=0.02478, audio_tagging_loss=0.0107, over 3040615.06 frames. ], batch size: 58, lr: 8.71e-03, grad_scale: 16.0
2023-11-19 05:50:34,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=595146.6666666666, ans=10.0
2023-11-19 05:50:37,351 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.983e-01
2023-11-19 05:50:37,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0
2023-11-19 05:50:41,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=595213.3333333334, ans=0.0
2023-11-19 05:50:43,359 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.142e+01 8.421e+01 9.048e+01 1.022e+02 1.450e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-19 05:50:43,917 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.24 vs. limit=22.5
2023-11-19 05:50:45,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=595213.3333333334, ans=0.0
2023-11-19 05:50:54,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=595280.0, ans=0.0
2023-11-19 05:50:59,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=595346.6666666666, ans=0.125
2023-11-19 05:51:07,358 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595346.6666666666, ans=0.1
2023-11-19 05:51:09,263 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5150, loss[loss=0.09821, simple_loss=0.1157, pruned_loss=0.03008, audio_tagging_loss=0.01027, over 14421.00 frames. ], tot_loss[loss=0.08853, simple_loss=0.1063, pruned_loss=0.02462, audio_tagging_loss=0.01074, over 3035140.08 frames. ], batch size: 54, lr: 8.71e-03, grad_scale: 16.0
2023-11-19 05:51:19,476 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=595413.3333333334, ans=0.125
2023-11-19 05:51:25,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=595480.0, ans=0.05
2023-11-19 05:51:31,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=595546.6666666666, ans=0.0
2023-11-19 05:52:05,713 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5200, loss[loss=0.08373, simple_loss=0.1058, pruned_loss=0.02174, audio_tagging_loss=0.009076, over 14661.00 frames. ], tot_loss[loss=0.08921, simple_loss=0.1071, pruned_loss=0.02491, audio_tagging_loss=0.01074, over 3035970.35 frames. ], batch size: 56, lr: 8.70e-03, grad_scale: 32.0
2023-11-19 05:52:23,360 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 05:52:26,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=595880.0, ans=0.125
2023-11-19 05:52:27,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.12 vs. limit=22.5
2023-11-19 05:52:31,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=595880.0, ans=0.125
2023-11-19 05:52:33,595 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.530e+01 8.322e+01 8.934e+01 9.832e+01 1.211e+02, threshold=1.787e+02, percent-clipped=0.0
2023-11-19 05:52:55,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=596013.3333333334, ans=0.125
2023-11-19 05:53:01,450 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5250, loss[loss=0.07893, simple_loss=0.09279, pruned_loss=0.01932, audio_tagging_loss=0.01321, over 14030.00 frames. ], tot_loss[loss=0.08979, simple_loss=0.1078, pruned_loss=0.02522, audio_tagging_loss=0.01066, over 3036270.21 frames. ], batch size: 55, lr: 8.70e-03, grad_scale: 32.0
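A quick consistency check on the loss[...] fields: throughout these lines the logged total is reproduced by weighting the simple loss by 0.5 and adding the pruned and audio-tagging terms at weight 1.0. The weights are inferred from the logged numbers themselves, so treat this as a sanity check rather than the training code's actual loss assembly:

```python
# Weights inferred from the logged values (batch 5250 above), not from code:
# loss ≈ 0.5 * simple_loss + 1.0 * pruned_loss + 1.0 * audio_tagging_loss
simple_loss, pruned_loss, audio_tagging_loss = 0.09279, 0.01932, 0.01321
total = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
assert abs(total - 0.07893) < 5e-4  # matches loss=0.07893 logged above
```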
2023-11-19 05:53:31,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=596213.3333333334, ans=0.125
2023-11-19 05:53:36,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=596280.0, ans=0.125
2023-11-19 05:53:38,518 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 05:53:56,337 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5300, loss[loss=0.09432, simple_loss=0.1158, pruned_loss=0.02836, audio_tagging_loss=0.008032, over 15587.00 frames. ], tot_loss[loss=0.09018, simple_loss=0.1084, pruned_loss=0.02535, audio_tagging_loss=0.01062, over 3043573.32 frames. ], batch size: 60, lr: 8.70e-03, grad_scale: 16.0
2023-11-19 05:54:10,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=596480.0, ans=0.05
2023-11-19 05:54:11,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=596480.0, ans=10.0
2023-11-19 05:54:15,167 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.29 vs. limit=15.0
2023-11-19 05:54:26,891 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.841e+01 9.894e+01 1.112e+02 1.416e+02, threshold=1.979e+02, percent-clipped=0.0
2023-11-19 05:54:27,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=596546.6666666666, ans=0.0
2023-11-19 05:54:29,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=596613.3333333334, ans=0.2
2023-11-19 05:54:46,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=596680.0, ans=0.125
2023-11-19 05:54:52,764 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5350, loss[loss=0.08703, simple_loss=0.08945, pruned_loss=0.02781, audio_tagging_loss=0.01449, over 14531.00 frames. ], tot_loss[loss=0.08989, simple_loss=0.1078, pruned_loss=0.02528, audio_tagging_loss=0.01073, over 3040078.17 frames. ], batch size: 56, lr: 8.70e-03, grad_scale: 16.0
2023-11-19 05:55:00,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=596746.6666666666, ans=0.125
2023-11-19 05:55:10,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0
2023-11-19 05:55:23,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.47 vs. limit=15.0
2023-11-19 05:55:24,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0
2023-11-19 05:55:31,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=596946.6666666666, ans=22.5
2023-11-19 05:55:36,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=597013.3333333334, ans=0.125
2023-11-19 05:55:46,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=597013.3333333334, ans=0.125
2023-11-19 05:55:48,439 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5400, loss[loss=0.09919, simple_loss=0.123, pruned_loss=0.03025, audio_tagging_loss=0.00744, over 15442.00 frames. ], tot_loss[loss=0.08967, simple_loss=0.1077, pruned_loss=0.02507, audio_tagging_loss=0.01076, over 3035947.14 frames. ], batch size: 58, lr: 8.69e-03, grad_scale: 16.0
2023-11-19 05:55:49,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.35 vs. limit=15.0
2023-11-19 05:56:01,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0
2023-11-19 05:56:11,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=597213.3333333334, ans=0.0
2023-11-19 05:56:12,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=597213.3333333334, ans=0.125
2023-11-19 05:56:17,853 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.569e+01 9.325e+01 1.031e+02 1.430e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-19 05:56:18,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.80 vs. limit=22.5
2023-11-19 05:56:25,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=597280.0, ans=0.0
2023-11-19 05:56:32,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=597346.6666666666, ans=0.2
2023-11-19 05:56:38,494 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=597346.6666666666, ans=0.0
2023-11-19 05:56:42,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=597413.3333333334, ans=0.0
2023-11-19 05:56:43,653 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5450, loss[loss=0.08237, simple_loss=0.09391, pruned_loss=0.02078, audio_tagging_loss=0.01463, over 15068.00 frames. ], tot_loss[loss=0.09016, simple_loss=0.1083, pruned_loss=0.02523, audio_tagging_loss=0.01077, over 3037035.90 frames. ], batch size: 57, lr: 8.69e-03, grad_scale: 16.0
2023-11-19 05:57:02,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0
2023-11-19 05:57:06,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=597546.6666666666, ans=0.125
2023-11-19 05:57:22,286 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=597613.3333333334, ans=15.0
2023-11-19 05:57:35,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.78 vs. limit=10.0
2023-11-19 05:57:39,854 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5500, loss[loss=0.1273, simple_loss=0.1519, pruned_loss=0.04024, audio_tagging_loss=0.01113, over 15260.00 frames. ], tot_loss[loss=0.09067, simple_loss=0.1089, pruned_loss=0.02549, audio_tagging_loss=0.01074, over 3042618.59 frames. ], batch size: 58, lr: 8.69e-03, grad_scale: 16.0
2023-11-19 05:57:49,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=597746.6666666666, ans=0.125
2023-11-19 05:58:09,526 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.477e+01 9.706e+01 1.076e+02 1.326e+02, threshold=1.941e+02, percent-clipped=0.0
2023-11-19 05:58:12,207 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.29 vs. limit=6.0
2023-11-19 05:58:16,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=597946.6666666666, ans=0.1
2023-11-19 05:58:30,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=598013.3333333334, ans=0.125
2023-11-19 05:58:35,507 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5550, loss[loss=0.07377, simple_loss=0.08462, pruned_loss=0.0205, audio_tagging_loss=0.01096, over 15082.00 frames. ], tot_loss[loss=0.09029, simple_loss=0.1084, pruned_loss=0.02517, audio_tagging_loss=0.0109, over 3042339.63 frames. ], batch size: 57, lr: 8.69e-03, grad_scale: 16.0
2023-11-19 05:59:06,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=598213.3333333334, ans=0.125
2023-11-19 05:59:08,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=598280.0, ans=0.2
2023-11-19 05:59:30,945 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5600, loss[loss=0.1181, simple_loss=0.1431, pruned_loss=0.03595, audio_tagging_loss=0.01058, over 14605.00 frames. ], tot_loss[loss=0.08991, simple_loss=0.1078, pruned_loss=0.02506, audio_tagging_loss=0.01096, over 3040710.66 frames. ], batch size: 54, lr: 8.68e-03, grad_scale: 16.0
2023-11-19 05:59:35,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=598413.3333333334, ans=0.125
2023-11-19 05:59:44,804 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=598480.0, ans=0.0
2023-11-19 05:59:51,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=598480.0, ans=0.125
2023-11-19 05:59:53,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=598546.6666666666, ans=0.125
2023-11-19 05:59:54,909 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=598546.6666666666, ans=0.0
2023-11-19 06:00:02,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.552e+01 9.221e+01 1.020e+02 1.317e+02, threshold=1.844e+02, percent-clipped=0.0
2023-11-19 06:00:06,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.86 vs. limit=22.5
2023-11-19 06:00:09,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=598613.3333333334, ans=0.1
2023-11-19 06:00:09,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=598613.3333333334, ans=0.125
2023-11-19 06:00:10,778 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 06:00:24,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=598680.0, ans=0.0
2023-11-19 06:00:27,009 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5650, loss[loss=0.1012, simple_loss=0.1232, pruned_loss=0.03057, audio_tagging_loss=0.008965, over 14781.00 frames. ], tot_loss[loss=0.09019, simple_loss=0.1079, pruned_loss=0.02527, audio_tagging_loss=0.01098, over 3044803.70 frames. ], batch size: 56, lr: 8.68e-03, grad_scale: 16.0
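The WARNING above drops an AudioSet cut whose placeholder transcript is, after BPE encoding, longer (24 tokens) than the 23 encoder frames that survive subsampling of its 100 input frames; a transducer cannot align more output symbols than encoder frames, so such cuts are removed. A hypothetical reconstruction of that filter (the real check is inside train_asr.py; names here are illustrative):

```python
# Hypothetical sketch of the filter behind the "Exclude cut ..." WARNINGs.
# 'sp' is assumed to be a SentencePiece processor; the real code may differ.
import logging

def keep_cut(cut, sp) -> bool:
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    num_frames = cut.num_frames               # before subsampling (100 above)
    T = ((num_frames - 7) // 2 + 1) // 2      # after ~4x subsampling (23 above)
    if T < len(tokens):
        logging.warning(
            f"Exclude cut with ID {cut.id} from training. "
            f"Number of frames (before subsampling): {num_frames}. "
            f"Number of frames (after subsampling): {T}. ..."
        )
        return False
    return True

train_cuts = train_cuts.filter(keep_cut)      # lhotse CutSet.filter
```

Note that ((100 - 7) // 2 + 1) // 2 = 23, matching the frame counts in the WARNING.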
2023-11-19 06:00:39,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=598813.3333333334, ans=0.125
2023-11-19 06:00:40,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=598813.3333333334, ans=0.2
2023-11-19 06:00:44,293 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=598813.3333333334, ans=0.125
2023-11-19 06:00:45,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=598813.3333333334, ans=0.025
2023-11-19 06:00:47,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=598813.3333333334, ans=0.125
2023-11-19 06:01:00,283 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 06:01:06,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=598946.6666666666, ans=0.2
2023-11-19 06:01:14,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.14 vs. limit=22.5
2023-11-19 06:01:14,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=599013.3333333334, ans=0.125
2023-11-19 06:01:22,500 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5700, loss[loss=0.07224, simple_loss=0.08226, pruned_loss=0.01752, audio_tagging_loss=0.01359, over 15429.00 frames. ], tot_loss[loss=0.09058, simple_loss=0.1082, pruned_loss=0.02555, audio_tagging_loss=0.01091, over 3044835.58 frames. ], batch size: 58, lr: 8.68e-03, grad_scale: 16.0
2023-11-19 06:01:28,515 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=599080.0, ans=0.125
2023-11-19 06:01:48,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=599213.3333333334, ans=0.1
2023-11-19 06:01:49,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=599213.3333333334, ans=0.0
2023-11-19 06:01:53,408 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.633e+01 8.957e+01 9.901e+01 1.097e+02 1.583e+02, threshold=1.980e+02, percent-clipped=0.0
2023-11-19 06:02:01,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=599280.0, ans=0.0
2023-11-19 06:02:17,815 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5750, loss[loss=0.07168, simple_loss=0.08697, pruned_loss=0.01695, audio_tagging_loss=0.01125, over 14781.00 frames. ], tot_loss[loss=0.09071, simple_loss=0.1085, pruned_loss=0.02567, audio_tagging_loss=0.01079, over 3049274.13 frames. ], batch size: 54, lr: 8.68e-03, grad_scale: 16.0
2023-11-19 06:02:42,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=599546.6666666666, ans=0.0
2023-11-19 06:02:43,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=599546.6666666666, ans=0.125
2023-11-19 06:02:49,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=599546.6666666666, ans=0.0
2023-11-19 06:02:50,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=599613.3333333334, ans=0.0
2023-11-19 06:02:57,648 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0
2023-11-19 06:03:13,149 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5800, loss[loss=0.1009, simple_loss=0.1195, pruned_loss=0.03361, audio_tagging_loss=0.007517, over 15564.00 frames. ], tot_loss[loss=0.08956, simple_loss=0.107, pruned_loss=0.02528, audio_tagging_loss=0.01078, over 3048706.21 frames. ], batch size: 57, lr: 8.67e-03, grad_scale: 16.0
2023-11-19 06:03:21,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=599746.6666666666, ans=0.0
2023-11-19 06:03:22,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=599746.6666666666, ans=0.125
2023-11-19 06:03:23,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=599813.3333333334, ans=0.125
2023-11-19 06:03:32,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=599813.3333333334, ans=0.125
2023-11-19 06:03:36,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=599880.0, ans=10.0
2023-11-19 06:03:37,617 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.12 vs. limit=12.0
2023-11-19 06:03:44,336 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.060e+01 9.064e+01 9.956e+01 1.074e+02 1.617e+02, threshold=1.991e+02, percent-clipped=0.0
2023-11-19 06:03:47,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=599946.6666666666, ans=0.2
2023-11-19 06:04:09,091 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5850, loss[loss=0.06281, simple_loss=0.06608, pruned_loss=0.01713, audio_tagging_loss=0.01265, over 15507.00 frames. ], tot_loss[loss=0.08922, simple_loss=0.1066, pruned_loss=0.02523, audio_tagging_loss=0.01068, over 3045660.35 frames. ], batch size: 61, lr: 8.67e-03, grad_scale: 16.0
2023-11-19 06:04:24,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=600146.6666666666, ans=0.125
2023-11-19 06:04:31,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0
2023-11-19 06:04:54,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=600346.6666666666, ans=0.2
2023-11-19 06:04:56,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=600346.6666666666, ans=0.125
2023-11-19 06:05:04,594 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5900, loss[loss=0.08792, simple_loss=0.1149, pruned_loss=0.02003, audio_tagging_loss=0.01045, over 15275.00 frames. ], tot_loss[loss=0.08959, simple_loss=0.1072, pruned_loss=0.02533, audio_tagging_loss=0.01064, over 3048585.92 frames. ], batch size: 55, lr: 8.67e-03, grad_scale: 16.0
2023-11-19 06:05:10,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0
2023-11-19 06:05:35,553 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.269e+01 8.982e+01 9.905e+01 1.254e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-19 06:05:36,052 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.51 vs. limit=15.0
2023-11-19 06:05:37,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.39 vs. limit=10.0
2023-11-19 06:05:48,084 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=600680.0, ans=0.125
2023-11-19 06:05:51,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=600680.0, ans=0.0
2023-11-19 06:05:59,425 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 5950, loss[loss=0.08029, simple_loss=0.09719, pruned_loss=0.02102, audio_tagging_loss=0.01068, over 15562.00 frames. ], tot_loss[loss=0.0898, simple_loss=0.1075, pruned_loss=0.02538, audio_tagging_loss=0.01068, over 3053844.22 frames. ], batch size: 57, lr: 8.67e-03, grad_scale: 16.0
2023-11-19 06:06:37,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=600946.6666666666, ans=0.0
2023-11-19 06:06:43,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=601013.3333333334, ans=0.2
2023-11-19 06:06:54,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=601080.0, ans=0.125
2023-11-19 06:06:55,537 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6000, loss[loss=0.09362, simple_loss=0.1171, pruned_loss=0.02617, audio_tagging_loss=0.008919, over 15111.00 frames. ], tot_loss[loss=0.09028, simple_loss=0.1083, pruned_loss=0.02553, audio_tagging_loss=0.01061, over 3053876.85 frames. ], batch size: 57, lr: 8.66e-03, grad_scale: 32.0
2023-11-19 06:06:55,537 INFO [train_asr.py:1138] (1/4) Computing validation loss
2023-11-19 06:07:11,262 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9127, 3.6816, 5.3209, 3.9499], device='cuda:1')
2023-11-19 06:07:28,377 INFO [train_asr.py:1147] (1/4) Epoch 8, validation: loss=0.06748, simple_loss=0.0569, pruned_loss=0.007185, audio_tagging_loss=0.03185, over 4681554.00 frames.
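The [scaling.py:213] ScheduledFloat lines that dominate this log print the current value (ans=...) of hyperparameters such as skip rates and balancer probabilities, annealed as a function of batch_count; by batch counts in the 600k range most have reached their final values. A minimal stand-alone sketch, assuming piecewise-linear interpolation between (batch_count, value) breakpoints (icefall's ScheduledFloat in scaling.py is the model, but it has more machinery than this):

```python
# Minimal sketch of a batch-count-scheduled hyperparameter; a simplification
# of the kind of object behind the ScheduledFloat log lines, not the real class.
class ScheduledFloat:
    def __init__(self, *points: tuple):
        self.points = sorted(points)              # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:                 # linear interpolation on [x0, x1]
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return pts[-1][1]                         # past the last breakpoint

skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value(601080.0))                  # -> 0.0 this far into training
```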
2023-11-19 06:07:28,378 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB
2023-11-19 06:07:29,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=601080.0, ans=0.0
2023-11-19 06:07:54,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=601213.3333333334, ans=0.0
2023-11-19 06:07:56,165 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=601213.3333333334, ans=0.125
2023-11-19 06:07:59,100 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.682e+01 9.253e+01 9.954e+01 1.321e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-19 06:08:07,521 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 06:08:20,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=601346.6666666666, ans=0.2
2023-11-19 06:08:23,824 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6050, loss[loss=0.07202, simple_loss=0.07829, pruned_loss=0.01622, audio_tagging_loss=0.01665, over 15225.00 frames. ], tot_loss[loss=0.09003, simple_loss=0.1081, pruned_loss=0.02542, audio_tagging_loss=0.01057, over 3056527.01 frames. ], batch size: 58, lr: 8.66e-03, grad_scale: 16.0
2023-11-19 06:08:27,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=601413.3333333334, ans=0.125
2023-11-19 06:09:06,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=601680.0, ans=0.0
2023-11-19 06:09:06,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=601680.0, ans=0.0
2023-11-19 06:09:12,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=601680.0, ans=0.125
2023-11-19 06:09:18,655 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6100, loss[loss=0.08359, simple_loss=0.09507, pruned_loss=0.0248, audio_tagging_loss=0.01125, over 15327.00 frames. ], tot_loss[loss=0.08946, simple_loss=0.1074, pruned_loss=0.0252, audio_tagging_loss=0.01058, over 3058972.27 frames. ], batch size: 56, lr: 8.66e-03, grad_scale: 16.0
2023-11-19 06:09:50,159 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.701e+01 8.581e+01 9.083e+01 1.023e+02 1.492e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-19 06:10:12,869 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6150, loss[loss=0.1134, simple_loss=0.1383, pruned_loss=0.03423, audio_tagging_loss=0.01001, over 14798.00 frames. ], tot_loss[loss=0.08982, simple_loss=0.1078, pruned_loss=0.0253, audio_tagging_loss=0.01063, over 3058618.99 frames. ], batch size: 53, lr: 8.66e-03, grad_scale: 16.0
2023-11-19 06:10:22,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.79 vs. limit=10.0
2023-11-19 06:10:47,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602280.0, ans=0.1
2023-11-19 06:10:57,195 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 06:11:08,562 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6200, loss[loss=0.1031, simple_loss=0.1131, pruned_loss=0.03344, audio_tagging_loss=0.01315, over 15120.00 frames. ], tot_loss[loss=0.09039, simple_loss=0.1082, pruned_loss=0.02562, audio_tagging_loss=0.01068, over 3058315.81 frames. ], batch size: 55, lr: 8.65e-03, grad_scale: 16.0
2023-11-19 06:11:19,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=602480.0, ans=0.125
2023-11-19 06:11:26,224 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=602480.0, ans=0.125
2023-11-19 06:11:35,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=602546.6666666666, ans=0.0
2023-11-19 06:11:39,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.753e+01 8.400e+01 8.989e+01 9.962e+01 1.274e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-19 06:11:41,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=602613.3333333334, ans=0.125
2023-11-19 06:11:47,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=602613.3333333334, ans=0.125
2023-11-19 06:11:49,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0
2023-11-19 06:11:57,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=602680.0, ans=0.2
2023-11-19 06:12:03,527 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6250, loss[loss=0.08596, simple_loss=0.1009, pruned_loss=0.02453, audio_tagging_loss=0.01097, over 15684.00 frames. ], tot_loss[loss=0.09063, simple_loss=0.1083, pruned_loss=0.02571, audio_tagging_loss=0.01079, over 3057489.46 frames. ], batch size: 60, lr: 8.65e-03, grad_scale: 16.0
2023-11-19 06:12:03,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=602746.6666666666, ans=0.125
2023-11-19 06:12:22,138 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0
2023-11-19 06:12:58,167 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6300, loss[loss=0.07903, simple_loss=0.09419, pruned_loss=0.01975, audio_tagging_loss=0.01218, over 15931.00 frames. ], tot_loss[loss=0.09069, simple_loss=0.1085, pruned_loss=0.02556, audio_tagging_loss=0.01088, over 3050845.19 frames. ], batch size: 61, lr: 8.65e-03, grad_scale: 16.0
2023-11-19 06:12:59,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=603080.0, ans=0.09899494936611666
2023-11-19 06:13:21,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.18 vs. limit=12.0
2023-11-19 06:13:30,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 8.462e+01 9.271e+01 1.035e+02 1.313e+02, threshold=1.854e+02, percent-clipped=0.0
2023-11-19 06:13:37,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=603280.0, ans=0.0
2023-11-19 06:13:44,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=603346.6666666666, ans=0.0
2023-11-19 06:13:50,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0
2023-11-19 06:13:52,814 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6350, loss[loss=0.07987, simple_loss=0.1012, pruned_loss=0.02112, audio_tagging_loss=0.008144, over 15524.00 frames. ], tot_loss[loss=0.09061, simple_loss=0.1084, pruned_loss=0.02548, audio_tagging_loss=0.01094, over 3046494.08 frames. ], batch size: 59, lr: 8.65e-03, grad_scale: 16.0
2023-11-19 06:13:57,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=603413.3333333334, ans=0.125
2023-11-19 06:14:05,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=603480.0, ans=0.125
2023-11-19 06:14:10,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=603480.0, ans=0.1
2023-11-19 06:14:17,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=603546.6666666666, ans=0.0
2023-11-19 06:14:32,694 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=603613.3333333334, ans=0.125
2023-11-19 06:14:34,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=603613.3333333334, ans=0.125
2023-11-19 06:14:37,966 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 06:14:42,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=603680.0, ans=0.1
2023-11-19 06:14:48,827 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6400, loss[loss=0.07816, simple_loss=0.08408, pruned_loss=0.02332, audio_tagging_loss=0.0128, over 15000.00 frames. ], tot_loss[loss=0.09057, simple_loss=0.1082, pruned_loss=0.02552, audio_tagging_loss=0.01096, over 3043075.01 frames. ], batch size: 57, lr: 8.65e-03, grad_scale: 32.0
2023-11-19 06:14:56,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=603746.6666666666, ans=0.125
2023-11-19 06:14:57,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=603746.6666666666, ans=0.125
2023-11-19 06:15:06,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=603813.3333333334, ans=0.1
2023-11-19 06:15:08,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=603813.3333333334, ans=0.0
2023-11-19 06:15:15,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=603880.0, ans=0.125
2023-11-19 06:15:20,943 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.708e+01 9.476e+01 1.030e+02 1.332e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-19 06:15:28,033 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 06:15:30,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5
2023-11-19 06:15:36,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=604013.3333333334, ans=0.125
2023-11-19 06:15:44,367 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6450, loss[loss=0.1058, simple_loss=0.1365, pruned_loss=0.02559, audio_tagging_loss=0.012, over 14484.00 frames. ], tot_loss[loss=0.09064, simple_loss=0.108, pruned_loss=0.02552, audio_tagging_loss=0.0111, over 3038391.70 frames. ], batch size: 53, lr: 8.64e-03, grad_scale: 32.0
2023-11-19 06:16:03,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=604146.6666666666, ans=0.0
2023-11-19 06:16:30,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=604346.6666666666, ans=0.0
2023-11-19 06:16:39,328 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6500, loss[loss=0.09889, simple_loss=0.1161, pruned_loss=0.03192, audio_tagging_loss=0.008911, over 15044.00 frames. ], tot_loss[loss=0.08991, simple_loss=0.1073, pruned_loss=0.02525, audio_tagging_loss=0.01099, over 3041171.20 frames. ], batch size: 55, lr: 8.64e-03, grad_scale: 32.0
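The [optim.py:476] lines summarize the recent distribution of gradient norms (five quantiles from min to max), the clipping threshold in force, and how often clipping actually fired. In every such line here the threshold equals Clipping_scale times the logged median, e.g. 2.0 × 9.476e+01 = 1.895e+02 just above, so the threshold appears to track a running median of gradient norms. The sketch below illustrates that idea under those assumptions; the real logic lives in icefall's ScaledAdam optimizer and differs in detail:

```python
# Sketch of median-based gradient clipping consistent with the logged numbers
# (threshold = clipping_scale * median of recent grad norms); a hedged
# illustration, not icefall's ScaledAdam.
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 400):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms = []

    def clip_(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        # max_norm=inf: computes the total grad norm without modifying grads
        norm = torch.nn.utils.clip_grad_norm_(params, float("inf")).item()
        self.norms = (self.norms + [norm])[-self.history:]
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median      # e.g. 2.0 * 9.476e+01
        if norm > threshold:                          # counted in percent-clipped
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm
```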
2023-11-19 06:16:53,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=604480.0, ans=0.125
2023-11-19 06:16:54,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=604480.0, ans=0.125
2023-11-19 06:17:04,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=604546.6666666666, ans=0.125
2023-11-19 06:17:11,945 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.719e+01 8.554e+01 9.296e+01 1.013e+02 1.424e+02, threshold=1.859e+02, percent-clipped=0.0
2023-11-19 06:17:15,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=604613.3333333334, ans=0.0
2023-11-19 06:17:21,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=604613.3333333334, ans=0.0
2023-11-19 06:17:23,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=604680.0, ans=0.125
2023-11-19 06:17:29,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=604680.0, ans=0.025
2023-11-19 06:17:35,738 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6550, loss[loss=0.09078, simple_loss=0.1175, pruned_loss=0.02342, audio_tagging_loss=0.008608, over 16036.00 frames. ], tot_loss[loss=0.09003, simple_loss=0.1079, pruned_loss=0.02533, audio_tagging_loss=0.01074, over 3045857.04 frames. ], batch size: 58, lr: 8.64e-03, grad_scale: 32.0
2023-11-19 06:17:38,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=604746.6666666666, ans=0.125
2023-11-19 06:17:41,769 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=604746.6666666666, ans=0.125
2023-11-19 06:17:58,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=604880.0, ans=0.125
2023-11-19 06:18:03,240 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0
2023-11-19 06:18:14,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=604946.6666666666, ans=0.035
2023-11-19 06:18:19,830 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.51 vs. limit=22.5
2023-11-19 06:18:20,845 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.12 vs. limit=12.0
2023-11-19 06:18:31,329 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6600, loss[loss=0.07977, simple_loss=0.09978, pruned_loss=0.01935, audio_tagging_loss=0.01053, over 15497.00 frames. ], tot_loss[loss=0.08957, simple_loss=0.1072, pruned_loss=0.02525, audio_tagging_loss=0.01075, over 3046826.30 frames. ], batch size: 56, lr: 8.64e-03, grad_scale: 32.0
2023-11-19 06:18:32,621 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=605080.0, ans=0.0
2023-11-19 06:18:45,091 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=605146.6666666666, ans=0.125
2023-11-19 06:18:47,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=605146.6666666666, ans=0.125
2023-11-19 06:18:51,591 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=605146.6666666666, ans=0.125
2023-11-19 06:18:54,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=605213.3333333334, ans=0.0
2023-11-19 06:18:55,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=605213.3333333334, ans=0.125
2023-11-19 06:19:03,867 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.408e+01 8.547e+01 9.371e+01 1.021e+02 1.350e+02, threshold=1.874e+02, percent-clipped=0.0
2023-11-19 06:19:26,466 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6650, loss[loss=0.07512, simple_loss=0.09357, pruned_loss=0.01769, audio_tagging_loss=0.01065, over 14501.00 frames. ], tot_loss[loss=0.08909, simple_loss=0.1068, pruned_loss=0.02497, audio_tagging_loss=0.0107, over 3039963.63 frames. ], batch size: 55, lr: 8.63e-03, grad_scale: 32.0
2023-11-19 06:20:00,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=605613.3333333334, ans=0.0
2023-11-19 06:20:05,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=605613.3333333334, ans=0.125
2023-11-19 06:20:10,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.07 vs. limit=15.0
2023-11-19 06:20:21,089 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0
2023-11-19 06:20:22,602 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6700, loss[loss=0.07378, simple_loss=0.08198, pruned_loss=0.02023, audio_tagging_loss=0.01257, over 14751.00 frames. ], tot_loss[loss=0.08969, simple_loss=0.1077, pruned_loss=0.02521, audio_tagging_loss=0.01063, over 3045500.00 frames. ], batch size: 57, lr: 8.63e-03, grad_scale: 32.0
2023-11-19 06:20:53,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=605880.0, ans=0.2
2023-11-19 06:20:53,952 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.187e+01 8.900e+01 9.907e+01 1.762e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-19 06:20:57,508 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=605946.6666666666, ans=0.0
2023-11-19 06:20:59,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=605946.6666666666, ans=0.2
2023-11-19 06:21:01,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=605946.6666666666, ans=0.1
2023-11-19 06:21:08,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=606013.3333333334, ans=0.1
2023-11-19 06:21:14,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=606013.3333333334, ans=0.2
2023-11-19 06:21:18,299 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6750, loss[loss=0.07738, simple_loss=0.08059, pruned_loss=0.02435, audio_tagging_loss=0.01274, over 14770.00 frames. ], tot_loss[loss=0.08923, simple_loss=0.1073, pruned_loss=0.02498, audio_tagging_loss=0.01061, over 3037773.52 frames. ], batch size: 57, lr: 8.63e-03, grad_scale: 32.0
2023-11-19 06:21:18,864 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0
2023-11-19 06:21:19,552 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=606080.0, ans=0.125
2023-11-19 06:21:21,692 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=606080.0, ans=0.125
2023-11-19 06:21:22,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.85 vs. limit=15.0
2023-11-19 06:21:24,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=606080.0, ans=0.125
2023-11-19 06:21:50,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=606280.0, ans=0.2
2023-11-19 06:22:02,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=606346.6666666666, ans=0.125
2023-11-19 06:22:02,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=606346.6666666666, ans=0.1
2023-11-19 06:22:11,426 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=606346.6666666666, ans=0.0
2023-11-19 06:22:13,363 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6800, loss[loss=0.1047, simple_loss=0.1362, pruned_loss=0.02845, audio_tagging_loss=0.008153, over 15545.00 frames. ], tot_loss[loss=0.08877, simple_loss=0.1068, pruned_loss=0.02485, audio_tagging_loss=0.01054, over 3031227.86 frames. ], batch size: 57, lr: 8.63e-03, grad_scale: 32.0
2023-11-19 06:22:19,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=606413.3333333334, ans=0.125
2023-11-19 06:22:43,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.68 vs. limit=15.0
2023-11-19 06:22:45,965 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.152e+01 8.455e+01 9.265e+01 1.067e+02 1.623e+02, threshold=1.853e+02, percent-clipped=0.0
2023-11-19 06:22:56,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=606613.3333333334, ans=0.125
2023-11-19 06:22:59,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=606680.0, ans=0.1
2023-11-19 06:23:00,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=606680.0, ans=0.0
2023-11-19 06:23:09,234 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6850, loss[loss=0.06002, simple_loss=0.07843, pruned_loss=0.01175, audio_tagging_loss=0.00906, over 14835.00 frames. ], tot_loss[loss=0.08818, simple_loss=0.1061, pruned_loss=0.02456, audio_tagging_loss=0.01058, over 3032736.78 frames. ], batch size: 56, lr: 8.62e-03, grad_scale: 32.0
2023-11-19 06:23:16,415 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0
2023-11-19 06:23:29,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=606813.3333333334, ans=0.0
2023-11-19 06:23:44,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=606946.6666666666, ans=0.0
2023-11-19 06:24:04,762 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6900, loss[loss=0.1071, simple_loss=0.1365, pruned_loss=0.0294, audio_tagging_loss=0.009481, over 14975.00 frames. ], tot_loss[loss=0.08875, simple_loss=0.1067, pruned_loss=0.02477, audio_tagging_loss=0.01062, over 3032997.54 frames. ], batch size: 55, lr: 8.62e-03, grad_scale: 32.0
2023-11-19 06:24:16,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=607146.6666666666, ans=0.1
2023-11-19 06:24:37,156 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.334e+01 9.172e+01 9.913e+01 1.941e+02, threshold=1.834e+02, percent-clipped=1.0
2023-11-19 06:24:48,210 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 06:25:00,439 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 6950, loss[loss=0.09183, simple_loss=0.1146, pruned_loss=0.02488, audio_tagging_loss=0.009666, over 15972.00 frames. ], tot_loss[loss=0.08912, simple_loss=0.1074, pruned_loss=0.02483, audio_tagging_loss=0.01058, over 3039653.52 frames. ], batch size: 59, lr: 8.62e-03, grad_scale: 32.0
2023-11-19 06:25:01,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=607413.3333333334, ans=0.2
2023-11-19 06:25:13,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=607480.0, ans=0.1
2023-11-19 06:25:21,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=607546.6666666666, ans=0.0
2023-11-19 06:25:29,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=607546.6666666666, ans=0.125
2023-11-19 06:25:30,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=607546.6666666666, ans=0.125
2023-11-19 06:25:39,937 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=607613.3333333334, ans=0.125
2023-11-19 06:25:51,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=607680.0, ans=0.1
2023-11-19 06:25:51,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.82 vs. limit=15.0
2023-11-19 06:25:53,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=607680.0, ans=0.2
2023-11-19 06:25:56,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=607746.6666666666, ans=0.0
2023-11-19 06:25:56,730 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7000, loss[loss=0.08443, simple_loss=0.09284, pruned_loss=0.02604, audio_tagging_loss=0.01197, over 15351.00 frames. ], tot_loss[loss=0.08843, simple_loss=0.1058, pruned_loss=0.02471, audio_tagging_loss=0.01082, over 3035509.22 frames. ], batch size: 58, lr: 8.62e-03, grad_scale: 32.0
2023-11-19 06:26:09,001 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=607813.3333333334, ans=0.0
2023-11-19 06:26:20,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=607880.0, ans=0.125
2023-11-19 06:26:28,331 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.831e+01 8.487e+01 9.225e+01 1.011e+02 1.458e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-19 06:26:43,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=608013.3333333334, ans=0.1
2023-11-19 06:26:45,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=608013.3333333334, ans=0.0
2023-11-19 06:26:52,397 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7050, loss[loss=0.07438, simple_loss=0.0888, pruned_loss=0.01704, audio_tagging_loss=0.01295, over 16736.00 frames. ], tot_loss[loss=0.08845, simple_loss=0.106, pruned_loss=0.02465, audio_tagging_loss=0.01082, over 3039695.51 frames. ], batch size: 62, lr: 8.61e-03, grad_scale: 32.0
2023-11-19 06:27:05,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=608146.6666666666, ans=0.0
2023-11-19 06:27:06,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=608146.6666666666, ans=0.2
2023-11-19 06:27:12,316 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 06:27:18,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=608213.3333333334, ans=0.125
2023-11-19 06:27:28,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=608280.0, ans=0.125
2023-11-19 06:27:32,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=608280.0, ans=0.2
2023-11-19 06:27:32,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=608280.0, ans=6.0
2023-11-19 06:27:35,047 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 06:27:46,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=608346.6666666666, ans=0.0
2023-11-19 06:27:48,154 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7100, loss[loss=0.08084, simple_loss=0.09078, pruned_loss=0.02363, audio_tagging_loss=0.01182, over 14447.00 frames. ], tot_loss[loss=0.08927, simple_loss=0.1069, pruned_loss=0.02503, audio_tagging_loss=0.01081, over 3048642.28 frames. ], batch size: 57, lr: 8.61e-03, grad_scale: 32.0
2023-11-19 06:27:52,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=608413.3333333334, ans=0.0
2023-11-19 06:27:53,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=608413.3333333334, ans=0.125
2023-11-19 06:27:57,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=608480.0, ans=0.125
2023-11-19 06:28:05,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0
2023-11-19 06:28:19,657 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 9.012e+01 9.917e+01 1.109e+02 1.355e+02, threshold=1.983e+02, percent-clipped=0.0
2023-11-19 06:28:34,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=608680.0, ans=0.0
2023-11-19 06:28:43,571 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7150, loss[loss=0.07203, simple_loss=0.08735, pruned_loss=0.01595, audio_tagging_loss=0.01241, over 15397.00 frames. ], tot_loss[loss=0.08984, simple_loss=0.108, pruned_loss=0.02516, audio_tagging_loss=0.01069, over 3055169.84 frames. ], batch size: 58, lr: 8.61e-03, grad_scale: 32.0
2023-11-19 06:29:00,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=608813.3333333334, ans=0.0
2023-11-19 06:29:09,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5
2023-11-19 06:29:33,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=609013.3333333334, ans=0.0
2023-11-19 06:29:36,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=609013.3333333334, ans=0.5
2023-11-19 06:29:38,977 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7200, loss[loss=0.09999, simple_loss=0.1198, pruned_loss=0.02927, audio_tagging_loss=0.01083, over 16134.00 frames. ], tot_loss[loss=0.08967, simple_loss=0.1078, pruned_loss=0.02508, audio_tagging_loss=0.0107, over 3057240.37 frames. ], batch size: 61, lr: 8.61e-03, grad_scale: 32.0
2023-11-19 06:30:10,691 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.882e+01 8.596e+01 9.352e+01 1.022e+02 1.385e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-19 06:30:22,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=609346.6666666666, ans=0.125
2023-11-19 06:30:33,336 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7250, loss[loss=0.06185, simple_loss=0.07341, pruned_loss=0.0118, audio_tagging_loss=0.01334, over 15292.00 frames. ], tot_loss[loss=0.08958, simple_loss=0.1073, pruned_loss=0.02503, audio_tagging_loss=0.0109, over 3053795.24 frames. ], batch size: 58, lr: 8.61e-03, grad_scale: 32.0
2023-11-19 06:30:50,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=609480.0, ans=0.125
2023-11-19 06:30:52,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=609480.0, ans=0.0
2023-11-19 06:30:52,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=609480.0, ans=0.125
2023-11-19 06:30:53,340 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=6.0
2023-11-19 06:30:56,205 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.46 vs. limit=15.0
2023-11-19 06:31:09,063 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=609613.3333333334, ans=0.125
2023-11-19 06:31:28,340 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7300, loss[loss=0.08155, simple_loss=0.09315, pruned_loss=0.02128, audio_tagging_loss=0.0137, over 14087.00 frames. ], tot_loss[loss=0.09025, simple_loss=0.1083, pruned_loss=0.02534, audio_tagging_loss=0.01077, over 3052825.18 frames. ], batch size: 54, lr: 8.60e-03, grad_scale: 32.0
2023-11-19 06:31:38,921 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.45 vs.
limit=15.0 2023-11-19 06:31:50,747 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=609880.0, ans=0.2 2023-11-19 06:31:59,847 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.576e+01 9.670e+01 1.044e+02 1.433e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-19 06:32:06,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=609946.6666666666, ans=0.0 2023-11-19 06:32:06,889 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=609946.6666666666, ans=0.125 2023-11-19 06:32:11,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=610013.3333333334, ans=0.125 2023-11-19 06:32:11,884 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2023-11-19 06:32:23,119 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7350, loss[loss=0.09081, simple_loss=0.1095, pruned_loss=0.02546, audio_tagging_loss=0.0106, over 15096.00 frames. ], tot_loss[loss=0.09008, simple_loss=0.1082, pruned_loss=0.02525, audio_tagging_loss=0.01073, over 3053749.55 frames. ], batch size: 59, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:32:30,527 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0 2023-11-19 06:32:35,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=610146.6666666666, ans=0.025 2023-11-19 06:32:38,635 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=610146.6666666666, ans=0.07 2023-11-19 06:32:57,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=610280.0, ans=0.125 2023-11-19 06:33:09,317 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=610346.6666666666, ans=0.2 2023-11-19 06:33:14,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=610346.6666666666, ans=0.125 2023-11-19 06:33:18,533 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7400, loss[loss=0.07283, simple_loss=0.08715, pruned_loss=0.01794, audio_tagging_loss=0.01132, over 15488.00 frames. ], tot_loss[loss=0.08953, simple_loss=0.1077, pruned_loss=0.02505, audio_tagging_loss=0.01065, over 3056923.22 frames. ], batch size: 58, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:33:41,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.91 vs. 
limit=12.0 2023-11-19 06:33:45,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=610546.6666666666, ans=0.125 2023-11-19 06:33:51,303 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.147e+01 8.574e+01 9.523e+01 1.112e+02 1.475e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-19 06:33:54,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=610613.3333333334, ans=0.125 2023-11-19 06:33:55,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=610613.3333333334, ans=0.0 2023-11-19 06:34:12,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=610680.0, ans=0.0 2023-11-19 06:34:14,352 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7450, loss[loss=0.118, simple_loss=0.1406, pruned_loss=0.03813, audio_tagging_loss=0.009558, over 14740.00 frames. ], tot_loss[loss=0.08943, simple_loss=0.1074, pruned_loss=0.02507, audio_tagging_loss=0.01065, over 3050436.89 frames. ], batch size: 55, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:34:18,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=610746.6666666666, ans=0.125 2023-11-19 06:34:19,277 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=22.5 2023-11-19 06:34:23,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=610746.6666666666, ans=0.0 2023-11-19 06:34:36,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=610880.0, ans=0.025 2023-11-19 06:34:46,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=610946.6666666666, ans=0.125 2023-11-19 06:34:50,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=610946.6666666666, ans=0.0 2023-11-19 06:35:06,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=611013.3333333334, ans=0.125 2023-11-19 06:35:09,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=611080.0, ans=0.125 2023-11-19 06:35:10,321 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7500, loss[loss=0.1228, simple_loss=0.1499, pruned_loss=0.03693, audio_tagging_loss=0.0109, over 14565.00 frames. ], tot_loss[loss=0.08965, simple_loss=0.1075, pruned_loss=0.02528, audio_tagging_loss=0.01061, over 3051732.68 frames. ], batch size: 53, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:35:11,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.25 vs. 
limit=10.0 2023-11-19 06:35:23,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=611146.6666666666, ans=0.025 2023-11-19 06:35:37,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=611213.3333333334, ans=0.125 2023-11-19 06:35:42,588 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 8.409e+01 9.431e+01 1.041e+02 1.502e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-19 06:35:52,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=611280.0, ans=0.0 2023-11-19 06:35:53,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=611346.6666666666, ans=0.0 2023-11-19 06:36:04,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=611413.3333333334, ans=0.125 2023-11-19 06:36:04,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=611413.3333333334, ans=0.04949747468305833 2023-11-19 06:36:05,187 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7550, loss[loss=0.09414, simple_loss=0.1125, pruned_loss=0.02817, audio_tagging_loss=0.009715, over 16480.00 frames. ], tot_loss[loss=0.08895, simple_loss=0.1069, pruned_loss=0.02488, audio_tagging_loss=0.01061, over 3054899.73 frames. ], batch size: 58, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:36:08,529 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=611413.3333333334, ans=0.07 2023-11-19 06:36:20,499 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0 2023-11-19 06:36:30,938 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-19 06:36:44,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611613.3333333334, ans=0.1 2023-11-19 06:36:58,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=611746.6666666666, ans=0.125 2023-11-19 06:36:59,379 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7600, loss[loss=0.08556, simple_loss=0.1082, pruned_loss=0.02356, audio_tagging_loss=0.007882, over 16728.00 frames. ], tot_loss[loss=0.08861, simple_loss=0.1066, pruned_loss=0.02473, audio_tagging_loss=0.01056, over 3050642.66 frames. ], batch size: 62, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:37:02,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.91 vs. 
limit=10.0 2023-11-19 06:37:10,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=611813.3333333334, ans=0.125 2023-11-19 06:37:14,873 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:37:32,019 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.740e+01 8.282e+01 9.217e+01 9.907e+01 1.227e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-19 06:37:36,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=611946.6666666666, ans=0.05 2023-11-19 06:37:56,174 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7650, loss[loss=0.06128, simple_loss=0.06915, pruned_loss=0.0143, audio_tagging_loss=0.0124, over 14562.00 frames. ], tot_loss[loss=0.08881, simple_loss=0.107, pruned_loss=0.02471, audio_tagging_loss=0.0106, over 3054692.99 frames. ], batch size: 58, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:38:16,901 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=612213.3333333334, ans=0.05 2023-11-19 06:38:22,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=612213.3333333334, ans=0.0 2023-11-19 06:38:31,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2023-11-19 06:38:47,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=612346.6666666666, ans=0.125 2023-11-19 06:38:51,528 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7700, loss[loss=0.08514, simple_loss=0.09639, pruned_loss=0.02457, audio_tagging_loss=0.01239, over 14929.00 frames. ], tot_loss[loss=0.08987, simple_loss=0.1084, pruned_loss=0.02524, audio_tagging_loss=0.01045, over 3048789.00 frames. ], batch size: 59, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:38:51,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=612413.3333333334, ans=0.2 2023-11-19 06:38:52,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=612413.3333333334, ans=0.125 2023-11-19 06:39:19,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=612546.6666666666, ans=0.1 2023-11-19 06:39:21,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=612546.6666666666, ans=0.125 2023-11-19 06:39:23,617 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.913e+01 9.609e+01 1.128e+02 1.741e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-19 06:39:46,133 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7750, loss[loss=0.0833, simple_loss=0.1025, pruned_loss=0.02097, audio_tagging_loss=0.01106, over 15534.00 frames. ], tot_loss[loss=0.09021, simple_loss=0.1087, pruned_loss=0.02536, audio_tagging_loss=0.01049, over 3054621.66 frames. 
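The `ScheduledFloat: name=..., batch_count=..., ans=...` lines, by far the most frequent here, print hyperparameters (skip rates, dropout probabilities, balancer probabilities) that are scheduled on the global batch count rather than fixed. The behaviour is a piecewise-linear interpolation between (batch_count, value) breakpoints, clamped at both ends; by batch ~607k these schedules have long since reached their final plateau, which is why the same `ans` values recur. A toy equivalent (assumed behaviour, not icefall's scaling.py class):

```python
import bisect

class ScheduledFloatSketch:
    """Piecewise-linear schedule over batch_count, clamped at the endpoints."""

    def __init__(self, *points: "tuple[float, float]"):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a dropout annealed from 0.3 to 0.1 over the first 20k batches:
dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(611613.0) == 0.1  # matches the ans=0.1 plateau logged above
```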
], batch size: 58, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:39:52,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=612746.6666666666, ans=0.0 2023-11-19 06:39:56,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=612813.3333333334, ans=0.125 2023-11-19 06:40:42,169 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7800, loss[loss=0.0765, simple_loss=0.09976, pruned_loss=0.01798, audio_tagging_loss=0.008635, over 14236.00 frames. ], tot_loss[loss=0.08932, simple_loss=0.1076, pruned_loss=0.02494, audio_tagging_loss=0.01059, over 3046404.52 frames. ], batch size: 55, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:40:52,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=613146.6666666666, ans=0.125 2023-11-19 06:41:03,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=613213.3333333334, ans=0.1 2023-11-19 06:41:13,884 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.392e+01 9.222e+01 1.047e+02 1.457e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 06:41:22,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=613280.0, ans=0.125 2023-11-19 06:41:35,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=613346.6666666666, ans=0.1 2023-11-19 06:41:39,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.26 vs. limit=15.0 2023-11-19 06:41:40,454 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7850, loss[loss=0.09604, simple_loss=0.121, pruned_loss=0.02572, audio_tagging_loss=0.009809, over 15225.00 frames. ], tot_loss[loss=0.08959, simple_loss=0.108, pruned_loss=0.02501, audio_tagging_loss=0.01058, over 3050554.10 frames. ], batch size: 55, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:42:19,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=613613.3333333334, ans=0.0 2023-11-19 06:42:35,061 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7900, loss[loss=0.08135, simple_loss=0.09826, pruned_loss=0.02189, audio_tagging_loss=0.01033, over 14684.00 frames. ], tot_loss[loss=0.09033, simple_loss=0.1088, pruned_loss=0.02524, audio_tagging_loss=0.01069, over 3051387.06 frames. ], batch size: 56, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:42:55,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=613813.3333333334, ans=0.04949747468305833 2023-11-19 06:42:59,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=613880.0, ans=0.0 2023-11-19 06:43:02,146 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.80 vs. 
limit=15.0 2023-11-19 06:43:07,836 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.644e+01 9.372e+01 1.085e+02 1.414e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 06:43:31,086 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 7950, loss[loss=0.0768, simple_loss=0.08864, pruned_loss=0.02106, audio_tagging_loss=0.01142, over 15390.00 frames. ], tot_loss[loss=0.08925, simple_loss=0.1072, pruned_loss=0.02484, audio_tagging_loss=0.01083, over 3048767.62 frames. ], batch size: 59, lr: 8.57e-03, grad_scale: 32.0 2023-11-19 06:43:38,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=614080.0, ans=0.125 2023-11-19 06:43:44,720 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 06:43:45,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=614146.6666666666, ans=0.125 2023-11-19 06:43:51,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=614146.6666666666, ans=0.125 2023-11-19 06:44:02,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=614280.0, ans=0.1 2023-11-19 06:44:09,171 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=614280.0, ans=0.125 2023-11-19 06:44:26,292 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8000, loss[loss=0.09596, simple_loss=0.1106, pruned_loss=0.02808, audio_tagging_loss=0.01256, over 15363.00 frames. ], tot_loss[loss=0.0895, simple_loss=0.1071, pruned_loss=0.02501, audio_tagging_loss=0.01092, over 3051772.25 frames. ], batch size: 59, lr: 8.57e-03, grad_scale: 32.0 2023-11-19 06:44:32,835 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:44:35,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0 2023-11-19 06:44:57,957 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.896e+01 9.629e+01 1.081e+02 1.400e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-19 06:45:13,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=614680.0, ans=0.0 2023-11-19 06:45:15,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.84 vs. limit=22.5 2023-11-19 06:45:20,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=614746.6666666666, ans=0.125 2023-11-19 06:45:21,591 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8050, loss[loss=0.07232, simple_loss=0.08237, pruned_loss=0.01985, audio_tagging_loss=0.01129, over 14831.00 frames. 
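The recurring `WARNING ... Exclude cut with ID unbalanced/...` entries are a length sanity check rather than an error: a transducer cannot emit more output tokens than it has encoder frames, so any cut whose BPE token count exceeds its post-subsampling frame count is dropped. These 1-second AudioSet clips have 100 feature frames, which the ~4× convolutional front end reduces to 23, while the dummy transcript tokenizes to 24 pieces, hence the exclusion. A sketch of the check (the exact subsampling arithmetic is an assumption chosen to match the logged 100 → 23):

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed Conv2d front-end arithmetic (~4x reduction); maps 100 -> 23,
    # matching "before subsampling: 100 / after subsampling: 23" above.
    return ((num_frames - 7) // 2) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Transducer training needs at least one encoder frame per output token.
    return num_tokens <= frames_after_subsampling(num_frames)

assert frames_after_subsampling(100) == 23
assert not keep_cut(num_frames=100, num_tokens=24)  # -> "Exclude cut ..."
```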
], tot_loss[loss=0.08949, simple_loss=0.107, pruned_loss=0.02503, audio_tagging_loss=0.01095, over 3052719.82 frames. ], batch size: 55, lr: 8.57e-03, grad_scale: 64.0 2023-11-19 06:45:26,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2023-11-19 06:45:40,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=614813.3333333334, ans=0.0 2023-11-19 06:45:42,342 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=12.0 2023-11-19 06:45:53,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=614880.0, ans=0.0 2023-11-19 06:45:55,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=614946.6666666666, ans=0.125 2023-11-19 06:46:17,957 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8100, loss[loss=0.05767, simple_loss=0.0663, pruned_loss=0.01206, audio_tagging_loss=0.01246, over 14597.00 frames. ], tot_loss[loss=0.0901, simple_loss=0.1078, pruned_loss=0.02526, audio_tagging_loss=0.01093, over 3047121.03 frames. ], batch size: 55, lr: 8.57e-03, grad_scale: 64.0 2023-11-19 06:46:25,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2023-11-19 06:46:34,133 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=615146.6666666666, ans=0.125 2023-11-19 06:46:49,784 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.425e+01 9.284e+01 1.022e+02 1.413e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-19 06:46:59,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2023-11-19 06:47:13,504 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8150, loss[loss=0.07562, simple_loss=0.08851, pruned_loss=0.0198, audio_tagging_loss=0.01156, over 15478.00 frames. ], tot_loss[loss=0.09032, simple_loss=0.1082, pruned_loss=0.02545, audio_tagging_loss=0.01075, over 3047861.00 frames. ], batch size: 60, lr: 8.56e-03, grad_scale: 64.0 2023-11-19 06:47:41,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=615546.6666666666, ans=0.0 2023-11-19 06:48:08,938 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8200, loss[loss=0.08537, simple_loss=0.1001, pruned_loss=0.02701, audio_tagging_loss=0.008331, over 15419.00 frames. ], tot_loss[loss=0.09038, simple_loss=0.1087, pruned_loss=0.02551, audio_tagging_loss=0.0105, over 3052089.08 frames. ], batch size: 60, lr: 8.56e-03, grad_scale: 64.0 2023-11-19 06:48:08,985 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 06:48:32,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=615880.0, ans=0.2 2023-11-19 06:48:35,907 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=615880.0, ans=0.125 2023-11-19 06:48:38,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=615880.0, ans=0.09899494936611666 2023-11-19 06:48:41,184 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.380e+01 9.339e+01 1.044e+02 1.327e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 06:48:52,600 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=616013.3333333334, ans=0.2 2023-11-19 06:48:54,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=616013.3333333334, ans=0.125 2023-11-19 06:48:56,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=616013.3333333334, ans=0.0 2023-11-19 06:48:57,878 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:48:57,963 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:49:02,334 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=616013.3333333334, ans=10.0 2023-11-19 06:49:05,268 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8250, loss[loss=0.06939, simple_loss=0.08013, pruned_loss=0.01659, audio_tagging_loss=0.01274, over 15408.00 frames. ], tot_loss[loss=0.09015, simple_loss=0.1085, pruned_loss=0.02537, audio_tagging_loss=0.01052, over 3051386.67 frames. ], batch size: 61, lr: 8.56e-03, grad_scale: 64.0 2023-11-19 06:49:08,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=616080.0, ans=0.07 2023-11-19 06:49:14,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=616146.6666666666, ans=0.125 2023-11-19 06:49:30,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=616213.3333333334, ans=0.0 2023-11-19 06:49:41,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=616280.0, ans=0.1 2023-11-19 06:50:00,758 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8300, loss[loss=0.1113, simple_loss=0.1427, pruned_loss=0.03176, audio_tagging_loss=0.00822, over 15796.00 frames. ], tot_loss[loss=0.09079, simple_loss=0.1092, pruned_loss=0.02559, audio_tagging_loss=0.0106, over 3050371.55 frames. 
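The `batch size: NN` figure in each training entry is the number of cuts in the batch, not a fixed hyperparameter: with SimpleCutSampler and max_duration 1000, cuts are packed until their summed duration would exceed 1000 seconds, so the count drifts through the mid-50s to low-60s depending on the mix of cut lengths. A simplified version of that packing (a sketch, not lhotse's sampler):

```python
from typing import Iterable, Iterator, List

def duration_batches(durations: Iterable[float],
                     max_duration: float = 1000.0) -> Iterator[List[float]]:
    """Greedily pack cut durations into batches capped at max_duration seconds."""
    batch: List[float] = []
    total = 0.0
    for d in durations:
        if batch and total + d > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(d)
        total += d
    if batch:
        yield batch

# ~17 s cuts pack 58 to a batch, about the sizes seen in these entries:
assert {len(b) for b in duration_batches([17.2] * 580)} == {58}
```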
], batch size: 56, lr: 8.56e-03, grad_scale: 32.0 2023-11-19 06:50:19,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=616480.0, ans=0.125 2023-11-19 06:50:34,215 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.747e+01 9.487e+01 1.050e+02 1.506e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-19 06:50:36,875 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.94 vs. limit=22.5 2023-11-19 06:50:56,412 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8350, loss[loss=0.08676, simple_loss=0.1085, pruned_loss=0.02214, audio_tagging_loss=0.01035, over 14678.00 frames. ], tot_loss[loss=0.08993, simple_loss=0.1082, pruned_loss=0.02525, audio_tagging_loss=0.01058, over 3046140.50 frames. ], batch size: 55, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:50:59,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=616746.6666666666, ans=0.0 2023-11-19 06:51:12,763 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.83 vs. limit=10.0 2023-11-19 06:51:14,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=616813.3333333334, ans=0.125 2023-11-19 06:51:28,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=616946.6666666666, ans=0.125 2023-11-19 06:51:40,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=617013.3333333334, ans=0.2 2023-11-19 06:51:47,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=617013.3333333334, ans=0.2 2023-11-19 06:51:51,368 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8400, loss[loss=0.08073, simple_loss=0.09413, pruned_loss=0.02497, audio_tagging_loss=0.008695, over 16135.00 frames. ], tot_loss[loss=0.08979, simple_loss=0.1083, pruned_loss=0.02511, audio_tagging_loss=0.01053, over 3049578.60 frames. ], batch size: 62, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:51:51,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=617080.0, ans=0.125 2023-11-19 06:51:51,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0 2023-11-19 06:52:08,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=617146.6666666666, ans=0.125 2023-11-19 06:52:14,627 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2023-11-19 06:52:25,115 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.578e+01 9.429e+01 1.025e+02 1.342e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-19 06:52:47,679 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8450, loss[loss=0.09149, simple_loss=0.1126, pruned_loss=0.02694, audio_tagging_loss=0.008258, over 15469.00 frames. ], tot_loss[loss=0.08951, simple_loss=0.1081, pruned_loss=0.02498, audio_tagging_loss=0.01047, over 3040898.69 frames. 
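For the `Whitening: name=..., metric=X vs. limit=Y` lines: the metric measures how far a module's output covariance is from isotropic ("white"), and the limit, itself sometimes a ScheduledFloat (see the `whiten_keys.whitening_limit ... ans=6.0` entries earlier), is the level above which a corrective gradient kicks in; these diagnostics are printed when the metric crowds or exceeds its limit. A natural whiteness proxy, and plausibly what is logged here, is the mean squared eigenvalue of the covariance divided by the squared mean eigenvalue, which equals 1.0 for perfectly white features. A hedged reconstruction:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """mean(eig^2) / mean(eig)^2 of the per-group feature covariance (sketch).

    Equals 1.0 when the covariance is a multiple of the identity and grows
    as the eigenvalue spectrum becomes lopsided.
    """
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    worst = 0.0
    for g in range(num_groups):
        c = x[:, g, :].T @ x[:, g, :] / num_frames    # covariance, (d, d)
        mean_eig = torch.diagonal(c).mean()           # trace(C) / d
        mean_eig_sq = (c * c).sum() / c.shape[0]      # trace(C^2) / d
        worst = max(worst, (mean_eig_sq / mean_eig.clamp(min=1e-20) ** 2).item())
    return worst

assert whitening_metric(torch.randn(1000, 256)) < 1.5  # near 1.0 when white
```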
], batch size: 60, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:53:06,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=617480.0, ans=0.125 2023-11-19 06:53:34,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=617680.0, ans=0.0 2023-11-19 06:53:43,103 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8500, loss[loss=0.09314, simple_loss=0.1174, pruned_loss=0.02596, audio_tagging_loss=0.008501, over 15100.00 frames. ], tot_loss[loss=0.09042, simple_loss=0.1093, pruned_loss=0.0254, audio_tagging_loss=0.01038, over 3043949.66 frames. ], batch size: 55, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:53:44,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=617746.6666666666, ans=0.025 2023-11-19 06:53:45,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=617746.6666666666, ans=0.125 2023-11-19 06:54:16,818 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.808e+01 8.976e+01 1.039e+02 1.170e+02 1.800e+02, threshold=2.077e+02, percent-clipped=0.0 2023-11-19 06:54:17,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.44 vs. limit=15.0 2023-11-19 06:54:19,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=617946.6666666666, ans=0.0 2023-11-19 06:54:22,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=617946.6666666666, ans=0.125 2023-11-19 06:54:32,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=618013.3333333334, ans=0.0 2023-11-19 06:54:38,559 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8550, loss[loss=0.06356, simple_loss=0.06418, pruned_loss=0.02008, audio_tagging_loss=0.0114, over 15416.00 frames. ], tot_loss[loss=0.09126, simple_loss=0.1104, pruned_loss=0.02572, audio_tagging_loss=0.01033, over 3045805.48 frames. ], batch size: 61, lr: 8.55e-03, grad_scale: 16.0 2023-11-19 06:54:39,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=618080.0, ans=0.2 2023-11-19 06:54:50,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=618146.6666666666, ans=0.125 2023-11-19 06:54:54,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=618146.6666666666, ans=0.1 2023-11-19 06:54:58,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=618146.6666666666, ans=0.0 2023-11-19 06:55:14,739 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.51 vs. limit=15.0 2023-11-19 06:55:15,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=618280.0, ans=0.0 2023-11-19 06:55:22,093 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.99 vs. 
limit=15.0 2023-11-19 06:55:31,160 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=618346.6666666666, ans=0.0 2023-11-19 06:55:34,196 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8600, loss[loss=0.08986, simple_loss=0.1078, pruned_loss=0.02771, audio_tagging_loss=0.008247, over 16424.00 frames. ], tot_loss[loss=0.09045, simple_loss=0.109, pruned_loss=0.02544, audio_tagging_loss=0.01049, over 3052654.82 frames. ], batch size: 60, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:56:09,131 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.491e+01 8.546e+01 9.397e+01 1.054e+02 1.390e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 06:56:10,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=618613.3333333334, ans=0.0 2023-11-19 06:56:15,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=618613.3333333334, ans=0.0 2023-11-19 06:56:18,253 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=618680.0, ans=0.125 2023-11-19 06:56:30,046 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8650, loss[loss=0.1275, simple_loss=0.1657, pruned_loss=0.03817, audio_tagging_loss=0.00649, over 15485.00 frames. ], tot_loss[loss=0.09065, simple_loss=0.1091, pruned_loss=0.02551, audio_tagging_loss=0.01056, over 3041882.35 frames. ], batch size: 55, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:56:39,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=618813.3333333334, ans=0.0 2023-11-19 06:56:51,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=618880.0, ans=0.125 2023-11-19 06:57:24,839 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8700, loss[loss=0.1058, simple_loss=0.1315, pruned_loss=0.0292, audio_tagging_loss=0.01089, over 14917.00 frames. ], tot_loss[loss=0.09108, simple_loss=0.1097, pruned_loss=0.02568, audio_tagging_loss=0.01057, over 3047724.63 frames. ], batch size: 55, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:57:28,743 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=619080.0, ans=0.1 2023-11-19 06:57:33,291 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=15.0 2023-11-19 06:58:00,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 9.176e+01 9.937e+01 1.111e+02 1.947e+02, threshold=1.987e+02, percent-clipped=1.0 2023-11-19 06:58:02,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=619280.0, ans=0.125 2023-11-19 06:58:20,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=619413.3333333334, ans=0.1 2023-11-19 06:58:21,803 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8750, loss[loss=0.08006, simple_loss=0.09216, pruned_loss=0.02436, audio_tagging_loss=0.009626, over 14991.00 frames. ], tot_loss[loss=0.09181, simple_loss=0.1106, pruned_loss=0.0259, audio_tagging_loss=0.0106, over 3038204.56 frames. 
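`grad_scale` is the dynamic fp16 loss-scaling factor (use_fp16: True): it doubles after a long enough run of overflow-free steps (32.0 → 64.0 around batch 8050 above) and halves whenever a step produces inf/nan gradients (back to 32.0 by batch 8300, then 16.0 by batch 8550). This is the standard PyTorch AMP mechanism; a minimal loop showing where the logged value comes from (the init/growth settings here are illustrative, not the trainer's exact configuration):

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def fp16_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()
    scaler.step(optimizer)     # skipped internally if gradients overflowed
    scaler.update()            # doubles or halves the scale, as in the log
    return scaler.get_scale()  # the "grad_scale" value reported above
```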
], batch size: 56, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:58:38,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.74 vs. limit=15.0 2023-11-19 06:58:50,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=619546.6666666666, ans=0.0 2023-11-19 06:59:02,770 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2023-11-19 06:59:08,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=619680.0, ans=15.0 2023-11-19 06:59:16,523 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8800, loss[loss=0.1262, simple_loss=0.1504, pruned_loss=0.04267, audio_tagging_loss=0.008356, over 14949.00 frames. ], tot_loss[loss=0.09237, simple_loss=0.1114, pruned_loss=0.0261, audio_tagging_loss=0.01059, over 3038484.39 frames. ], batch size: 55, lr: 8.53e-03, grad_scale: 32.0 2023-11-19 06:59:23,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=619746.6666666666, ans=8.0 2023-11-19 06:59:37,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=619880.0, ans=0.05 2023-11-19 06:59:50,693 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 8.376e+01 8.906e+01 9.929e+01 1.528e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-19 06:59:50,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=619946.6666666666, ans=0.0 2023-11-19 07:00:11,477 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8850, loss[loss=0.08463, simple_loss=0.1117, pruned_loss=0.01937, audio_tagging_loss=0.009416, over 15708.00 frames. ], tot_loss[loss=0.09068, simple_loss=0.1092, pruned_loss=0.02535, audio_tagging_loss=0.01075, over 3036527.29 frames. ], batch size: 59, lr: 8.53e-03, grad_scale: 32.0 2023-11-19 07:00:23,164 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:00:25,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=620146.6666666666, ans=0.2 2023-11-19 07:00:28,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.93 vs. 
limit=22.5 2023-11-19 07:00:35,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=620213.3333333334, ans=0.125 2023-11-19 07:00:36,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=620213.3333333334, ans=0.09899494936611666 2023-11-19 07:00:51,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=620280.0, ans=0.125 2023-11-19 07:00:56,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=620346.6666666666, ans=0.0 2023-11-19 07:00:59,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=620346.6666666666, ans=0.0 2023-11-19 07:01:07,559 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8900, loss[loss=0.08636, simple_loss=0.1056, pruned_loss=0.02434, audio_tagging_loss=0.009198, over 15615.00 frames. ], tot_loss[loss=0.09093, simple_loss=0.1097, pruned_loss=0.02552, audio_tagging_loss=0.01053, over 3032611.69 frames. ], batch size: 59, lr: 8.53e-03, grad_scale: 32.0 2023-11-19 07:01:09,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=620413.3333333334, ans=0.1 2023-11-19 07:01:10,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=620413.3333333334, ans=0.025 2023-11-19 07:01:30,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=620546.6666666666, ans=0.1 2023-11-19 07:01:40,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=620613.3333333334, ans=0.125 2023-11-19 07:01:42,268 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.332e+01 9.247e+01 1.033e+02 1.504e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-19 07:01:42,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=620613.3333333334, ans=0.0 2023-11-19 07:01:58,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=620680.0, ans=0.125 2023-11-19 07:02:02,941 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 8950, loss[loss=0.1207, simple_loss=0.1495, pruned_loss=0.03623, audio_tagging_loss=0.009778, over 17472.00 frames. ], tot_loss[loss=0.09131, simple_loss=0.11, pruned_loss=0.0258, audio_tagging_loss=0.0105, over 3039788.75 frames. ], batch size: 60, lr: 8.53e-03, grad_scale: 16.0 2023-11-19 07:02:08,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=620746.6666666666, ans=0.0 2023-11-19 07:02:16,948 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=620813.3333333334, ans=0.125 2023-11-19 07:02:27,702 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.21 vs. limit=10.0 2023-11-19 07:02:34,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.51 vs. 
limit=15.0 2023-11-19 07:02:39,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=620946.6666666666, ans=0.125 2023-11-19 07:02:57,849 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9000, loss[loss=0.1118, simple_loss=0.144, pruned_loss=0.03079, audio_tagging_loss=0.008958, over 15188.00 frames. ], tot_loss[loss=0.09246, simple_loss=0.1116, pruned_loss=0.02627, audio_tagging_loss=0.01038, over 3044869.73 frames. ], batch size: 56, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:02:57,850 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 07:03:28,106 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5178, 2.8174, 3.6935, 3.0310], device='cuda:1') 2023-11-19 07:03:30,607 INFO [train_asr.py:1147] (1/4) Epoch 8, validation: loss=0.06719, simple_loss=0.05665, pruned_loss=0.006997, audio_tagging_loss=0.03186, over 4681554.00 frames. 2023-11-19 07:03:30,607 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 07:03:40,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621146.6666666666, ans=0.1 2023-11-19 07:03:44,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=621146.6666666666, ans=0.07 2023-11-19 07:03:47,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=621146.6666666666, ans=0.125 2023-11-19 07:03:58,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=621213.3333333334, ans=0.1 2023-11-19 07:04:03,240 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=621280.0, ans=0.125 2023-11-19 07:04:05,133 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.568e+01 9.180e+01 1.028e+02 1.650e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-19 07:04:18,296 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=621346.6666666666, ans=0.125 2023-11-19 07:04:25,999 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9050, loss[loss=0.06853, simple_loss=0.08456, pruned_loss=0.01616, audio_tagging_loss=0.01009, over 14983.00 frames. ], tot_loss[loss=0.09161, simple_loss=0.1106, pruned_loss=0.02588, audio_tagging_loss=0.01045, over 3050671.28 frames. 
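The `attn_weights_entropy` tensor printed alongside the validation pass is a per-head health check on self-attention: for each head, the entropy of the attention distribution is averaged over query positions, so values near log(seq_len) indicate near-uniform attention while values near 0 indicate attention collapsed onto single frames. A plausible reconstruction of the diagnostic (assumed from the logged one-value-per-head shape, not the zipformer.py code):

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """Per-head mean entropy of attention weights.

    attn: (num_heads, tgt_len, src_len) with rows summing to 1.
    Returns one value per head, like the zipformer.py log lines.
    """
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (num_heads, tgt_len)
    return ent.mean(dim=-1)

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # bounded above by log(50) ~= 3.9
```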
], batch size: 56, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:04:34,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=621413.3333333334, ans=0.015 2023-11-19 07:04:45,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621480.0, ans=0.1 2023-11-19 07:04:54,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=621546.6666666666, ans=0.125 2023-11-19 07:05:01,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=621613.3333333334, ans=0.125 2023-11-19 07:05:15,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621680.0, ans=0.1 2023-11-19 07:05:16,749 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.66 vs. limit=15.0 2023-11-19 07:05:20,480 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9100, loss[loss=0.1218, simple_loss=0.1528, pruned_loss=0.03737, audio_tagging_loss=0.008021, over 15856.00 frames. ], tot_loss[loss=0.09066, simple_loss=0.1094, pruned_loss=0.02554, audio_tagging_loss=0.01044, over 3048234.57 frames. ], batch size: 56, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:05:21,774 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=621746.6666666666, ans=0.125 2023-11-19 07:05:24,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=621746.6666666666, ans=0.125 2023-11-19 07:05:24,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=621746.6666666666, ans=0.125 2023-11-19 07:05:32,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=621813.3333333334, ans=0.0 2023-11-19 07:05:33,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=621813.3333333334, ans=0.0 2023-11-19 07:05:56,146 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.531e+01 9.413e+01 1.050e+02 2.515e+02, threshold=1.883e+02, percent-clipped=1.0 2023-11-19 07:06:16,308 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9150, loss[loss=0.1042, simple_loss=0.1282, pruned_loss=0.02747, audio_tagging_loss=0.01261, over 15478.00 frames. ], tot_loss[loss=0.09094, simple_loss=0.1096, pruned_loss=0.02571, audio_tagging_loss=0.01043, over 3048042.95 frames. ], batch size: 58, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:06:17,925 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.04 vs. 
limit=22.5 2023-11-19 07:06:28,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=622146.6666666666, ans=0.125 2023-11-19 07:06:42,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=622213.3333333334, ans=0.125 2023-11-19 07:06:46,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=622213.3333333334, ans=0.0 2023-11-19 07:06:52,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=622280.0, ans=10.0 2023-11-19 07:06:58,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=622280.0, ans=0.125 2023-11-19 07:07:06,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=622346.6666666666, ans=0.125 2023-11-19 07:07:12,189 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9200, loss[loss=0.05538, simple_loss=0.06819, pruned_loss=0.01113, audio_tagging_loss=0.01016, over 13620.00 frames. ], tot_loss[loss=0.08998, simple_loss=0.1083, pruned_loss=0.02536, audio_tagging_loss=0.01048, over 3043875.52 frames. ], batch size: 54, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:07:19,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=622413.3333333334, ans=0.125 2023-11-19 07:07:22,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=622480.0, ans=15.0 2023-11-19 07:07:36,608 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-19 07:07:48,137 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.490e+01 9.151e+01 1.009e+02 3.492e+02, threshold=1.830e+02, percent-clipped=1.0 2023-11-19 07:08:06,981 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9250, loss[loss=0.07968, simple_loss=0.09604, pruned_loss=0.02168, audio_tagging_loss=0.009978, over 16212.00 frames. ], tot_loss[loss=0.08984, simple_loss=0.1079, pruned_loss=0.02539, audio_tagging_loss=0.01052, over 3051017.42 frames. ], batch size: 58, lr: 8.51e-03, grad_scale: 16.0 2023-11-19 07:08:32,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.90 vs. limit=6.0 2023-11-19 07:08:36,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=622880.0, ans=0.0 2023-11-19 07:08:39,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=622880.0, ans=0.125 2023-11-19 07:08:41,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.70 vs. 
2023-11-19 07:08:45,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=622946.6666666666, ans=0.125
2023-11-19 07:08:52,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=623013.3333333334, ans=0.125
2023-11-19 07:09:03,093 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9300, loss[loss=0.08708, simple_loss=0.1026, pruned_loss=0.02461, audio_tagging_loss=0.01117, over 15375.00 frames. ], tot_loss[loss=0.0895, simple_loss=0.1078, pruned_loss=0.02499, audio_tagging_loss=0.01063, over 3048883.11 frames. ], batch size: 56, lr: 8.51e-03, grad_scale: 16.0
2023-11-19 07:09:16,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=623146.6666666666, ans=0.0
2023-11-19 07:09:18,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=623146.6666666666, ans=0.05
2023-11-19 07:09:38,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=623280.0, ans=0.2
2023-11-19 07:09:39,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.509e+01 9.283e+01 1.013e+02 1.341e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-19 07:09:44,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=623280.0, ans=0.125
2023-11-19 07:09:52,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=623346.6666666666, ans=0.125
2023-11-19 07:09:58,417 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9350, loss[loss=0.09485, simple_loss=0.1062, pruned_loss=0.02626, audio_tagging_loss=0.01548, over 15110.00 frames. ], tot_loss[loss=0.08917, simple_loss=0.1073, pruned_loss=0.02486, audio_tagging_loss=0.01068, over 3046408.08 frames. ], batch size: 56, lr: 8.51e-03, grad_scale: 16.0
2023-11-19 07:10:32,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=623613.3333333334, ans=0.2
2023-11-19 07:10:38,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=623613.3333333334, ans=0.0
2023-11-19 07:10:44,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=623680.0, ans=0.1
2023-11-19 07:10:51,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=623680.0, ans=0.125
2023-11-19 07:10:54,091 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9400, loss[loss=0.1015, simple_loss=0.1214, pruned_loss=0.03019, audio_tagging_loss=0.01064, over 15517.00 frames. ], tot_loss[loss=0.09034, simple_loss=0.1087, pruned_loss=0.02532, audio_tagging_loss=0.01068, over 3051585.81 frames. ], batch size: 57, lr: 8.51e-03, grad_scale: 16.0
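The per-batch loss[...] and running tot_loss[...] fields break the training objective into its components. The logged numbers are consistent with a weighted sum of roughly 0.5 x simple_loss + pruned_loss + audio_tagging_loss; the sketch below reproduces one logged value under that assumption, which is inferred from the numbers themselves rather than from the training script's code.

```python
# Sketch of how the logged loss components appear to combine. The 0.5 and 1.0
# weights are an inference from the logged values (e.g. for batch 9100:
# 0.5 * 0.1528 + 0.03737 + 0.008021 = 0.1218), not a quote of train_asr.py.
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5,
                  audio_tagging_scale: float = 1.0) -> float:
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

print(combined_loss(0.1528, 0.03737, 0.008021))  # ~0.1218, matching the log
```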
2023-11-19 07:11:12,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=623813.3333333334, ans=0.0
2023-11-19 07:11:12,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=623813.3333333334, ans=0.125
2023-11-19 07:11:31,162 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 8.638e+01 9.433e+01 1.071e+02 1.581e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-19 07:11:40,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=624013.3333333334, ans=0.0
2023-11-19 07:11:48,783 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 07:11:49,850 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9450, loss[loss=0.09068, simple_loss=0.1085, pruned_loss=0.02539, audio_tagging_loss=0.01104, over 15387.00 frames. ], tot_loss[loss=0.09106, simple_loss=0.1093, pruned_loss=0.02562, audio_tagging_loss=0.01078, over 3053135.47 frames. ], batch size: 60, lr: 8.50e-03, grad_scale: 16.0
2023-11-19 07:12:01,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=624146.6666666666, ans=0.125
2023-11-19 07:12:08,040 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 07:12:18,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=624213.3333333334, ans=0.1
2023-11-19 07:12:23,790 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.64 vs. limit=12.0
2023-11-19 07:12:32,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=624280.0, ans=0.125
2023-11-19 07:12:39,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=624346.6666666666, ans=0.125
2023-11-19 07:12:45,965 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9500, loss[loss=0.1036, simple_loss=0.1329, pruned_loss=0.02623, audio_tagging_loss=0.01096, over 15889.00 frames. ], tot_loss[loss=0.09039, simple_loss=0.1082, pruned_loss=0.02532, audio_tagging_loss=0.01099, over 3049182.94 frames. ], batch size: 57, lr: 8.50e-03, grad_scale: 16.0
2023-11-19 07:12:57,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=624480.0, ans=0.125
2023-11-19 07:13:14,630 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2023-11-19 07:13:22,387 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.807e+01 8.457e+01 9.140e+01 9.820e+01 1.196e+02, threshold=1.828e+02, percent-clipped=0.0
2023-11-19 07:13:25,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=624613.3333333334, ans=0.04949747468305833
2023-11-19 07:13:38,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=624680.0, ans=0.125
2023-11-19 07:13:41,568 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9550, loss[loss=0.1046, simple_loss=0.1247, pruned_loss=0.03152, audio_tagging_loss=0.01076, over 15739.00 frames. ], tot_loss[loss=0.0907, simple_loss=0.1083, pruned_loss=0.02552, audio_tagging_loss=0.01103, over 3047307.88 frames. ], batch size: 59, lr: 8.50e-03, grad_scale: 16.0
2023-11-19 07:13:43,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=624746.6666666666, ans=0.125
2023-11-19 07:13:55,059 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=624813.3333333334, ans=0.0
2023-11-19 07:13:59,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=624813.3333333334, ans=0.0
2023-11-19 07:14:01,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=624813.3333333334, ans=0.0
2023-11-19 07:14:04,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=624880.0, ans=0.1
2023-11-19 07:14:09,951 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=22.5
2023-11-19 07:14:11,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=624880.0, ans=0.125
2023-11-19 07:14:24,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=624946.6666666666, ans=0.125
2023-11-19 07:14:37,083 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9600, loss[loss=0.0928, simple_loss=0.1063, pruned_loss=0.02584, audio_tagging_loss=0.01381, over 14388.00 frames. ], tot_loss[loss=0.09074, simple_loss=0.1084, pruned_loss=0.02552, audio_tagging_loss=0.011, over 3044893.52 frames. ], batch size: 54, lr: 8.50e-03, grad_scale: 32.0
2023-11-19 07:15:13,356 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.461e+01 9.298e+01 1.020e+02 1.547e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-19 07:15:14,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625280.0, ans=0.1
2023-11-19 07:15:33,197 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9650, loss[loss=0.106, simple_loss=0.1213, pruned_loss=0.03372, audio_tagging_loss=0.01165, over 13591.00 frames. ], tot_loss[loss=0.09077, simple_loss=0.1088, pruned_loss=0.0255, audio_tagging_loss=0.01085, over 3046518.66 frames. ], batch size: 55, lr: 8.50e-03, grad_scale: 32.0
2023-11-19 07:15:35,836 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.21 vs. limit=15.0
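In every Clipping_scale record in this section the reported threshold equals 2.0 x the median of the grad-norm quartiles (e.g. 2.0 * 9.140e+01 = 1.828e+02 above), so the clipper appears to track recent gradient norms and clip against clipping_scale times their median. The sketch below implements that reading; the window size and bookkeeping details are assumptions, not the actual optim.py code.

```python
# Sketch of median-based gradient clipping consistent with the logged records:
# threshold = clipping_scale * median(recent grad norms). Window size is a guess.
from collections import deque
import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 400):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, parameters):
        grads = [p.grad for p in parameters if p.grad is not None]
        # Overall L2 norm across all parameter gradients.
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        threshold = self.scale * sorted(self.norms)[len(self.norms) // 2]
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)
        return norm, threshold
```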
2023-11-19 07:15:57,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=625546.6666666666, ans=0.125
2023-11-19 07:16:00,062 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=15.0
2023-11-19 07:16:06,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=625613.3333333334, ans=0.125
2023-11-19 07:16:13,121 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=625613.3333333334, ans=0.125
2023-11-19 07:16:18,122 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0
2023-11-19 07:16:28,188 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9700, loss[loss=0.09364, simple_loss=0.1195, pruned_loss=0.02515, audio_tagging_loss=0.008722, over 15054.00 frames. ], tot_loss[loss=0.09075, simple_loss=0.109, pruned_loss=0.0256, audio_tagging_loss=0.01066, over 3048116.79 frames. ], batch size: 55, lr: 8.49e-03, grad_scale: 32.0
2023-11-19 07:16:44,233 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=625813.3333333334, ans=0.125
2023-11-19 07:16:45,992 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=625813.3333333334, ans=0.125
2023-11-19 07:17:05,231 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.422e+01 9.037e+01 9.716e+01 1.315e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-19 07:17:12,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=626013.3333333334, ans=0.1
2023-11-19 07:17:24,173 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9750, loss[loss=0.08488, simple_loss=0.1006, pruned_loss=0.02517, audio_tagging_loss=0.009427, over 15506.00 frames. ], tot_loss[loss=0.091, simple_loss=0.1093, pruned_loss=0.02571, audio_tagging_loss=0.01064, over 3053074.52 frames. ], batch size: 59, lr: 8.49e-03, grad_scale: 32.0
2023-11-19 07:17:25,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=626080.0, ans=0.1
2023-11-19 07:17:27,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=626080.0, ans=0.05
2023-11-19 07:17:29,128 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=626080.0, ans=0.0
2023-11-19 07:17:30,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=626080.0, ans=0.07
2023-11-19 07:17:37,660 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=626146.6666666666, ans=0.125
2023-11-19 07:17:40,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=626146.6666666666, ans=0.125
2023-11-19 07:17:59,680 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.30 vs. limit=10.0
2023-11-19 07:18:00,957 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0
2023-11-19 07:18:15,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.40 vs. limit=15.0
2023-11-19 07:18:15,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=626346.6666666666, ans=0.2
2023-11-19 07:18:19,722 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9800, loss[loss=0.133, simple_loss=0.1724, pruned_loss=0.03892, audio_tagging_loss=0.007875, over 15090.00 frames. ], tot_loss[loss=0.09127, simple_loss=0.1099, pruned_loss=0.02581, audio_tagging_loss=0.01052, over 3046985.14 frames. ], batch size: 53, lr: 8.49e-03, grad_scale: 32.0
2023-11-19 07:18:32,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=626480.0, ans=0.2
2023-11-19 07:18:48,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=626546.6666666666, ans=0.1
2023-11-19 07:18:56,500 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.157e+01 8.941e+01 9.770e+01 1.328e+02, threshold=1.788e+02, percent-clipped=0.0
2023-11-19 07:19:03,376 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=626680.0, ans=0.125
2023-11-19 07:19:06,458 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0
2023-11-19 07:19:10,037 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 07:19:12,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=626680.0, ans=0.05
2023-11-19 07:19:15,288 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9850, loss[loss=0.07245, simple_loss=0.09235, pruned_loss=0.01781, audio_tagging_loss=0.008468, over 14771.00 frames. ], tot_loss[loss=0.09184, simple_loss=0.1106, pruned_loss=0.02607, audio_tagging_loss=0.01049, over 3047657.07 frames. ], batch size: 57, lr: 8.49e-03, grad_scale: 32.0
2023-11-19 07:19:38,622 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=22.5
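The Whitening records compare a per-module metric against a limit; a penalty is presumably applied only while the metric exceeds the limit. One plausible form for such a metric, and only an assumption here rather than the actual scaling.py computation, is the eigenvalue-spread ratio of the feature covariance: 1.0 for perfectly "white" features, growing as the spectrum becomes lopsided.

```python
# A plausible whitening metric (an assumption, not the actual scaling.py code):
# E[lambda^2] / (E[lambda])^2 over the eigenvalues of the feature covariance.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # non-negative for a covariance matrix
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

print(whitening_metric(torch.randn(1000, 384) @ torch.randn(384, 384)))  # >> 1
print(whitening_metric(torch.randn(1000, 384)))  # ~1, well under a limit of 15.0
```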
2023-11-19 07:19:45,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=626880.0, ans=0.125
2023-11-19 07:19:48,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=626946.6666666666, ans=0.125
2023-11-19 07:19:59,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=627013.3333333334, ans=0.125
2023-11-19 07:20:09,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=627080.0, ans=0.0
2023-11-19 07:20:10,741 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9900, loss[loss=0.08614, simple_loss=0.1007, pruned_loss=0.02521, audio_tagging_loss=0.01057, over 15530.00 frames. ], tot_loss[loss=0.09162, simple_loss=0.1103, pruned_loss=0.02599, audio_tagging_loss=0.01046, over 3047610.39 frames. ], batch size: 58, lr: 8.48e-03, grad_scale: 32.0
2023-11-19 07:20:38,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=627213.3333333334, ans=0.125
2023-11-19 07:20:40,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.84 vs. limit=15.0
2023-11-19 07:20:40,923 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=627213.3333333334, ans=0.2
2023-11-19 07:20:47,071 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.674e+01 9.203e+01 1.082e+02 1.582e+02, threshold=1.841e+02, percent-clipped=0.0
2023-11-19 07:20:56,920 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 07:21:06,725 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 9950, loss[loss=0.08359, simple_loss=0.1077, pruned_loss=0.01883, audio_tagging_loss=0.0109, over 13886.00 frames. ], tot_loss[loss=0.09179, simple_loss=0.1106, pruned_loss=0.02598, audio_tagging_loss=0.01052, over 3048415.33 frames. ], batch size: 52, lr: 8.48e-03, grad_scale: 32.0
2023-11-19 07:21:19,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.67 vs. limit=12.0
2023-11-19 07:21:28,767 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=15.0
2023-11-19 07:21:36,517 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.194e-01
2023-11-19 07:21:40,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=627613.3333333334, ans=0.2
2023-11-19 07:22:00,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=627680.0, ans=0.125
2023-11-19 07:22:02,120 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10000, loss[loss=0.06375, simple_loss=0.07437, pruned_loss=0.01606, audio_tagging_loss=0.0105, over 14495.00 frames. ], tot_loss[loss=0.09062, simple_loss=0.109, pruned_loss=0.02556, audio_tagging_loss=0.01058, over 3043930.64 frames. ], batch size: 56, lr: 8.48e-03, grad_scale: 32.0
2023-11-19 07:22:31,429 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=627880.0, ans=0.125
2023-11-19 07:22:38,986 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.402e+01 8.658e+01 9.575e+01 1.064e+02 1.480e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-19 07:22:46,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=628013.3333333334, ans=0.025
2023-11-19 07:22:57,037 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10050, loss[loss=0.07724, simple_loss=0.09582, pruned_loss=0.0204, audio_tagging_loss=0.008933, over 15054.00 frames. ], tot_loss[loss=0.09052, simple_loss=0.1088, pruned_loss=0.02561, audio_tagging_loss=0.01051, over 3042613.66 frames. ], batch size: 56, lr: 8.48e-03, grad_scale: 32.0
2023-11-19 07:23:10,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.32 vs. limit=22.5
2023-11-19 07:23:11,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=628146.6666666666, ans=0.125
2023-11-19 07:23:11,946 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.05 vs. limit=10.0
2023-11-19 07:23:45,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=628346.6666666666, ans=0.0
2023-11-19 07:23:45,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=628346.6666666666, ans=0.1
2023-11-19 07:23:53,592 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10100, loss[loss=0.09563, simple_loss=0.1111, pruned_loss=0.02917, audio_tagging_loss=0.01093, over 14618.00 frames. ], tot_loss[loss=0.09031, simple_loss=0.1088, pruned_loss=0.02532, audio_tagging_loss=0.01061, over 3047211.85 frames. ], batch size: 54, lr: 8.48e-03, grad_scale: 32.0
2023-11-19 07:23:55,027 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=22.5
2023-11-19 07:24:01,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=628413.3333333334, ans=0.09899494936611666
2023-11-19 07:24:03,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=628480.0, ans=0.0
2023-11-19 07:24:29,378 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.846e+01 9.478e+01 1.084e+02 1.850e+02, threshold=1.896e+02, percent-clipped=0.0
2023-11-19 07:24:37,886 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 07:24:39,157 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=628680.0, ans=0.125
2023-11-19 07:24:45,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=628680.0, ans=0.0
2023-11-19 07:24:48,922 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10150, loss[loss=0.1035, simple_loss=0.1271, pruned_loss=0.03037, audio_tagging_loss=0.009564, over 15141.00 frames. ], tot_loss[loss=0.09011, simple_loss=0.1084, pruned_loss=0.02528, audio_tagging_loss=0.01063, over 3053622.44 frames. ], batch size: 57, lr: 8.47e-03, grad_scale: 32.0
2023-11-19 07:24:52,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=628746.6666666666, ans=0.2
2023-11-19 07:24:58,649 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=628813.3333333334, ans=0.125
2023-11-19 07:25:00,108 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.53 vs. limit=6.0
2023-11-19 07:25:15,292 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 07:25:15,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=628880.0, ans=0.125
2023-11-19 07:25:22,721 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=15.0
2023-11-19 07:25:43,824 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10200, loss[loss=0.09684, simple_loss=0.1033, pruned_loss=0.03308, audio_tagging_loss=0.01212, over 14743.00 frames. ], tot_loss[loss=0.09135, simple_loss=0.1098, pruned_loss=0.02569, audio_tagging_loss=0.01076, over 3056453.70 frames. ], batch size: 55, lr: 8.47e-03, grad_scale: 32.0
2023-11-19 07:25:45,191 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=629080.0, ans=0.125
2023-11-19 07:25:45,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=629080.0, ans=0.1
2023-11-19 07:25:51,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=629080.0, ans=0.125
2023-11-19 07:25:56,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=629146.6666666666, ans=0.2
2023-11-19 07:26:05,517 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
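These recurring warnings drop one-second AudioSet cuts whose placeholder transcript is longer than the subsampled feature sequence: 100 input frames become 23 frames after subsampling, fewer than the 24 BPE tokens, so no valid transducer alignment exists. A sketch of that filtering rule follows; the exact subsampling arithmetic is an assumption chosen to reproduce the logged 100 -> 23 mapping.

```python
# Sketch of the cut-exclusion rule implied by the warnings: drop a cut when the
# frame count after subsampling is smaller than its token count. The 2x-twice
# convolutional subsampling formula below is an assumption that matches the
# logged numbers (100 frames before -> 23 after).
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2  # 100 -> 47 -> 23

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> excluded, as in the log
```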
2023-11-19 07:26:19,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=629280.0, ans=0.0
2023-11-19 07:26:20,961 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.630e+01 9.623e+01 1.074e+02 1.731e+02, threshold=1.925e+02, percent-clipped=0.0
2023-11-19 07:26:26,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=629280.0, ans=0.0
2023-11-19 07:26:34,955 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0
2023-11-19 07:26:38,196 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=629346.6666666666, ans=0.0
2023-11-19 07:26:40,088 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10250, loss[loss=0.07955, simple_loss=0.09628, pruned_loss=0.01811, audio_tagging_loss=0.0133, over 15550.00 frames. ], tot_loss[loss=0.09154, simple_loss=0.1103, pruned_loss=0.02565, audio_tagging_loss=0.01075, over 3058079.49 frames. ], batch size: 58, lr: 8.47e-03, grad_scale: 32.0
2023-11-19 07:27:27,974 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 07:27:32,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.63 vs. limit=15.0
2023-11-19 07:27:36,293 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10300, loss[loss=0.07468, simple_loss=0.08957, pruned_loss=0.02097, audio_tagging_loss=0.008928, over 14932.00 frames. ], tot_loss[loss=0.09087, simple_loss=0.1092, pruned_loss=0.02543, audio_tagging_loss=0.01084, over 3057765.15 frames. ], batch size: 57, lr: 8.47e-03, grad_scale: 32.0
2023-11-19 07:27:48,525 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.53 vs. limit=6.0
2023-11-19 07:28:11,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=629946.6666666666, ans=0.125
2023-11-19 07:28:12,260 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.771e+01 9.491e+01 1.012e+02 1.579e+02, threshold=1.898e+02, percent-clipped=0.0
2023-11-19 07:28:30,770 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10350, loss[loss=0.1067, simple_loss=0.1361, pruned_loss=0.03061, audio_tagging_loss=0.00809, over 15823.00 frames. ], tot_loss[loss=0.09042, simple_loss=0.1087, pruned_loss=0.02517, audio_tagging_loss=0.01088, over 3059185.17 frames. ], batch size: 56, lr: 8.46e-03, grad_scale: 32.0
2023-11-19 07:28:34,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=630080.0, ans=0.125
2023-11-19 07:29:03,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=630280.0, ans=0.125
2023-11-19 07:29:09,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=630280.0, ans=0.2
2023-11-19 07:29:26,685 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10400, loss[loss=0.08044, simple_loss=0.09351, pruned_loss=0.02258, audio_tagging_loss=0.0111, over 14756.00 frames. ], tot_loss[loss=0.08979, simple_loss=0.108, pruned_loss=0.0249, audio_tagging_loss=0.01088, over 3055284.28 frames. ], batch size: 57, lr: 8.46e-03, grad_scale: 32.0
2023-11-19 07:29:33,464 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.07 vs. limit=10.0
2023-11-19 07:29:45,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=630480.0, ans=0.0
2023-11-19 07:29:57,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=630546.6666666666, ans=0.125
2023-11-19 07:30:03,014 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.887e+01 8.595e+01 9.141e+01 1.035e+02 1.375e+02, threshold=1.828e+02, percent-clipped=0.0
2023-11-19 07:30:06,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=630613.3333333334, ans=0.125
2023-11-19 07:30:16,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=630680.0, ans=0.125
2023-11-19 07:30:16,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=630680.0, ans=0.0
2023-11-19 07:30:22,397 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10450, loss[loss=0.07421, simple_loss=0.08958, pruned_loss=0.01791, audio_tagging_loss=0.01151, over 15339.00 frames. ], tot_loss[loss=0.08846, simple_loss=0.1063, pruned_loss=0.02435, audio_tagging_loss=0.01096, over 3059810.66 frames. ], batch size: 59, lr: 8.46e-03, grad_scale: 32.0
2023-11-19 07:30:39,054 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.30 vs. limit=6.0
2023-11-19 07:30:39,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0
2023-11-19 07:30:43,069 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=630880.0, ans=0.0
2023-11-19 07:30:50,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=630880.0, ans=0.1
2023-11-19 07:31:11,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=631013.3333333334, ans=0.2
2023-11-19 07:31:11,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=631013.3333333334, ans=0.0
2023-11-19 07:31:13,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=631013.3333333334, ans=0.0
2023-11-19 07:31:15,798 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=631013.3333333334, ans=0.125
2023-11-19 07:31:17,690 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10500, loss[loss=0.1044, simple_loss=0.147, pruned_loss=0.02609, audio_tagging_loss=0.004815, over 15236.00 frames. ], tot_loss[loss=0.08823, simple_loss=0.1062, pruned_loss=0.02425, audio_tagging_loss=0.01087, over 3052758.91 frames. ], batch size: 55, lr: 8.46e-03, grad_scale: 32.0
2023-11-19 07:31:20,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=631080.0, ans=0.2
2023-11-19 07:31:25,311 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=631080.0, ans=0.5
2023-11-19 07:31:38,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=631146.6666666666, ans=0.025
2023-11-19 07:31:41,743 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.48 vs. limit=12.0
2023-11-19 07:31:44,257 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.75 vs. limit=22.5
2023-11-19 07:31:44,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=631213.3333333334, ans=0.0
2023-11-19 07:31:54,646 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 8.554e+01 9.380e+01 1.032e+02 1.223e+02, threshold=1.876e+02, percent-clipped=0.0
2023-11-19 07:31:55,884 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=631280.0, ans=0.2
2023-11-19 07:32:05,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=631346.6666666666, ans=0.125
2023-11-19 07:32:12,284 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=631413.3333333334, ans=0.125
2023-11-19 07:32:13,138 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10550, loss[loss=0.07018, simple_loss=0.08216, pruned_loss=0.01985, audio_tagging_loss=0.009251, over 14732.00 frames. ], tot_loss[loss=0.08787, simple_loss=0.1062, pruned_loss=0.02403, audio_tagging_loss=0.01076, over 3042924.31 frames. ], batch size: 57, lr: 8.46e-03, grad_scale: 32.0
2023-11-19 07:32:17,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=631413.3333333334, ans=0.125
2023-11-19 07:32:31,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=631480.0, ans=0.04949747468305833
2023-11-19 07:32:40,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=631546.6666666666, ans=0.125
2023-11-19 07:32:45,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=631613.3333333334, ans=0.0
2023-11-19 07:33:09,270 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10600, loss[loss=0.09775, simple_loss=0.1165, pruned_loss=0.02928, audio_tagging_loss=0.01024, over 15807.00 frames. ], tot_loss[loss=0.08901, simple_loss=0.1078, pruned_loss=0.0246, audio_tagging_loss=0.01053, over 3043339.14 frames. ], batch size: 56, lr: 8.45e-03, grad_scale: 32.0
2023-11-19 07:33:22,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=631813.3333333334, ans=0.125
2023-11-19 07:33:24,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=631813.3333333334, ans=0.0
2023-11-19 07:33:25,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=631813.3333333334, ans=0.0
2023-11-19 07:33:45,594 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.034e+01 8.589e+01 9.112e+01 9.990e+01 1.319e+02, threshold=1.822e+02, percent-clipped=0.0
2023-11-19 07:34:01,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=632013.3333333334, ans=0.1
2023-11-19 07:34:03,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.88 vs. limit=15.0
2023-11-19 07:34:04,983 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10650, loss[loss=0.09897, simple_loss=0.124, pruned_loss=0.02733, audio_tagging_loss=0.009661, over 13388.00 frames. ], tot_loss[loss=0.08913, simple_loss=0.108, pruned_loss=0.02467, audio_tagging_loss=0.01044, over 3043526.24 frames. ], batch size: 53, lr: 8.45e-03, grad_scale: 32.0
2023-11-19 07:34:05,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=632080.0, ans=0.125
2023-11-19 07:34:09,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=632080.0, ans=0.07
2023-11-19 07:34:12,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=632080.0, ans=0.09899494936611666
2023-11-19 07:34:14,078 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.09 vs. limit=22.5
2023-11-19 07:34:21,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=632146.6666666666, ans=0.95
2023-11-19 07:34:33,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0
2023-11-19 07:34:48,960 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0
2023-11-19 07:34:50,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=632346.6666666666, ans=0.0
2023-11-19 07:34:51,894 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=632346.6666666666, ans=0.0
2023-11-19 07:35:00,539 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10700, loss[loss=0.08827, simple_loss=0.1091, pruned_loss=0.02163, audio_tagging_loss=0.01211, over 15263.00 frames. ], tot_loss[loss=0.08921, simple_loss=0.108, pruned_loss=0.02481, audio_tagging_loss=0.01041, over 3044812.96 frames. ], batch size: 57, lr: 8.45e-03, grad_scale: 32.0
2023-11-19 07:35:01,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=632413.3333333334, ans=0.0
2023-11-19 07:35:05,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=632413.3333333334, ans=0.0
2023-11-19 07:35:12,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=632480.0, ans=0.125
2023-11-19 07:35:18,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=12.0
2023-11-19 07:35:37,032 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.376e+01 8.469e+01 9.057e+01 9.771e+01 1.264e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-19 07:35:56,136 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10750, loss[loss=0.1219, simple_loss=0.1455, pruned_loss=0.03965, audio_tagging_loss=0.009464, over 15103.00 frames. ], tot_loss[loss=0.08915, simple_loss=0.1075, pruned_loss=0.02491, audio_tagging_loss=0.01046, over 3046904.89 frames. ], batch size: 55, lr: 8.45e-03, grad_scale: 32.0
2023-11-19 07:36:15,338 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=632813.3333333334, ans=0.0
2023-11-19 07:36:20,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=632880.0, ans=0.2
2023-11-19 07:36:21,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=632880.0, ans=0.125
2023-11-19 07:36:35,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=632946.6666666666, ans=0.0
2023-11-19 07:36:42,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=633013.3333333334, ans=0.125
2023-11-19 07:36:47,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=633013.3333333334, ans=0.04949747468305833
2023-11-19 07:36:51,454 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10800, loss[loss=0.09788, simple_loss=0.114, pruned_loss=0.03083, audio_tagging_loss=0.01005, over 16290.00 frames. ], tot_loss[loss=0.09024, simple_loss=0.109, pruned_loss=0.02525, audio_tagging_loss=0.01048, over 3057239.22 frames. ], batch size: 61, lr: 8.44e-03, grad_scale: 32.0
2023-11-19 07:36:54,120 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0
2023-11-19 07:37:02,761 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 07:37:22,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=633213.3333333334, ans=0.125
2023-11-19 07:37:29,645 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.684e+01 9.473e+01 1.039e+02 1.467e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-19 07:37:36,867 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.28 vs. limit=10.0
2023-11-19 07:37:44,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=633346.6666666666, ans=0.2
2023-11-19 07:37:46,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0
2023-11-19 07:37:48,139 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10850, loss[loss=0.1036, simple_loss=0.1259, pruned_loss=0.02808, audio_tagging_loss=0.01253, over 16615.00 frames. ], tot_loss[loss=0.09067, simple_loss=0.1094, pruned_loss=0.02544, audio_tagging_loss=0.01051, over 3063300.21 frames. ], batch size: 63, lr: 8.44e-03, grad_scale: 16.0
2023-11-19 07:37:57,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=633413.3333333334, ans=0.125
2023-11-19 07:37:59,239 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=633480.0, ans=0.015
2023-11-19 07:38:02,638 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=633480.0, ans=0.125
2023-11-19 07:38:10,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=633546.6666666666, ans=0.0
2023-11-19 07:38:18,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=633546.6666666666, ans=0.0
2023-11-19 07:38:21,015 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=633613.3333333334, ans=0.1
2023-11-19 07:38:40,561 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 07:38:43,644 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10900, loss[loss=0.08217, simple_loss=0.09684, pruned_loss=0.02443, audio_tagging_loss=0.009322, over 15315.00 frames. ], tot_loss[loss=0.09112, simple_loss=0.11, pruned_loss=0.02568, audio_tagging_loss=0.01045, over 3060537.34 frames. ], batch size: 59, lr: 8.44e-03, grad_scale: 16.0
2023-11-19 07:38:45,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=22.5
2023-11-19 07:38:50,732 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=633746.6666666666, ans=0.0
2023-11-19 07:39:02,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633813.3333333334, ans=0.1
2023-11-19 07:39:04,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=633880.0, ans=0.0
2023-11-19 07:39:06,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=633880.0, ans=0.025
2023-11-19 07:39:06,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=633880.0, ans=0.0
2023-11-19 07:39:13,638 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.63 vs. limit=15.0
2023-11-19 07:39:21,879 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.390e+01 9.440e+01 1.044e+02 1.572e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-19 07:39:39,404 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 10950, loss[loss=0.07694, simple_loss=0.09372, pruned_loss=0.01562, audio_tagging_loss=0.01446, over 14602.00 frames. ], tot_loss[loss=0.09103, simple_loss=0.1097, pruned_loss=0.02562, audio_tagging_loss=0.01055, over 3048950.39 frames. ], batch size: 57, lr: 8.44e-03, grad_scale: 16.0
2023-11-19 07:39:43,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=634080.0, ans=0.2
2023-11-19 07:39:58,765 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=634146.6666666666, ans=0.1
2023-11-19 07:40:32,749 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=634346.6666666666, ans=0.125
2023-11-19 07:40:34,752 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11000, loss[loss=0.1234, simple_loss=0.1599, pruned_loss=0.03711, audio_tagging_loss=0.006307, over 15701.00 frames. ], tot_loss[loss=0.09067, simple_loss=0.1092, pruned_loss=0.0254, audio_tagging_loss=0.01066, over 3039795.26 frames. ], batch size: 56, lr: 8.44e-03, grad_scale: 16.0
2023-11-19 07:40:44,797 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 07:40:49,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=634480.0, ans=0.1
2023-11-19 07:41:12,768 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.297e+01 9.076e+01 1.002e+02 1.429e+02, threshold=1.815e+02, percent-clipped=0.0
2023-11-19 07:41:15,304 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=22.5
2023-11-19 07:41:31,719 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11050, loss[loss=0.08312, simple_loss=0.1042, pruned_loss=0.0207, audio_tagging_loss=0.01031, over 15493.00 frames. ], tot_loss[loss=0.09107, simple_loss=0.1096, pruned_loss=0.02557, audio_tagging_loss=0.01067, over 3049706.63 frames. ], batch size: 58, lr: 8.43e-03, grad_scale: 16.0
2023-11-19 07:41:42,999 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=634813.3333333334, ans=0.125
2023-11-19 07:41:44,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=634813.3333333334, ans=0.125
2023-11-19 07:42:08,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=634946.6666666666, ans=0.2
2023-11-19 07:42:14,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=634946.6666666666, ans=0.125
2023-11-19 07:42:27,261 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11100, loss[loss=0.0756, simple_loss=0.08535, pruned_loss=0.01954, audio_tagging_loss=0.01338, over 14747.00 frames. ], tot_loss[loss=0.09094, simple_loss=0.1095, pruned_loss=0.02537, audio_tagging_loss=0.01084, over 3053609.78 frames. ], batch size: 58, lr: 8.43e-03, grad_scale: 16.0
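Note that the per-batch loss[...] values swing widely while tot_loss[...] moves slowly and its frame count hovers around three million, which suggests a frame-weighted average over a bounded window of recent batches. A minimal sketch of such a tracker follows; whether the real statistic uses a fixed-size window or periodic resets is an assumption.

```python
# Sketch of a frame-weighted running average like the tot_loss[...] fields.
# The window mechanism (fixed-size deque) is a guess, not the actual code.
from collections import deque

class RunningLoss:
    def __init__(self, max_batches: int = 200):
        self.buf = deque(maxlen=max_batches)  # (loss_sum, num_frames) pairs

    def update(self, loss: float, num_frames: int) -> None:
        self.buf.append((loss * num_frames, num_frames))

    def average(self) -> float:
        total_frames = sum(f for _, f in self.buf)
        return sum(s for s, _ in self.buf) / max(total_frames, 1)
```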
2023-11-19 07:42:32,840 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=635080.0, ans=0.0
2023-11-19 07:42:33,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=635080.0, ans=0.0
2023-11-19 07:42:33,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=635080.0, ans=0.125
2023-11-19 07:42:41,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=635146.6666666666, ans=0.125
2023-11-19 07:42:53,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=635213.3333333334, ans=0.025
2023-11-19 07:43:05,510 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.720e+01 9.689e+01 1.049e+02 1.321e+02, threshold=1.938e+02, percent-clipped=0.0
2023-11-19 07:43:16,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=635346.6666666666, ans=0.125
2023-11-19 07:43:22,345 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11150, loss[loss=0.08793, simple_loss=0.1002, pruned_loss=0.02215, audio_tagging_loss=0.01568, over 15472.00 frames. ], tot_loss[loss=0.08996, simple_loss=0.108, pruned_loss=0.02481, audio_tagging_loss=0.01113, over 3054077.99 frames. ], batch size: 56, lr: 8.43e-03, grad_scale: 16.0
2023-11-19 07:43:31,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=635413.3333333334, ans=0.1
2023-11-19 07:43:39,094 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.90 vs. limit=15.0
2023-11-19 07:43:45,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=635546.6666666666, ans=0.125
2023-11-19 07:43:56,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=635613.3333333334, ans=0.0
2023-11-19 07:44:18,707 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11200, loss[loss=0.1095, simple_loss=0.1406, pruned_loss=0.02795, audio_tagging_loss=0.01123, over 15399.00 frames. ], tot_loss[loss=0.08972, simple_loss=0.1077, pruned_loss=0.02472, audio_tagging_loss=0.01114, over 3046335.24 frames. ], batch size: 53, lr: 8.43e-03, grad_scale: 32.0
2023-11-19 07:44:23,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.06 vs. limit=15.0
2023-11-19 07:44:24,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=635746.6666666666, ans=0.2
2023-11-19 07:44:24,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=635746.6666666666, ans=0.0
2023-11-19 07:44:44,643 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=635880.0, ans=0.125
2023-11-19 07:44:50,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0
2023-11-19 07:44:56,021 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.574e+01 8.472e+01 9.235e+01 9.971e+01 1.338e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-19 07:44:56,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=635946.6666666666, ans=0.0
2023-11-19 07:45:11,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=636013.3333333334, ans=0.09899494936611666
2023-11-19 07:45:14,417 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11250, loss[loss=0.07042, simple_loss=0.08356, pruned_loss=0.01817, audio_tagging_loss=0.01047, over 15131.00 frames. ], tot_loss[loss=0.08958, simple_loss=0.1075, pruned_loss=0.02481, audio_tagging_loss=0.01103, over 3048841.32 frames. ], batch size: 57, lr: 8.42e-03, grad_scale: 32.0
2023-11-19 07:45:14,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=636080.0, ans=0.0
2023-11-19 07:45:26,274 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636146.6666666666, ans=0.1
2023-11-19 07:45:37,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=636213.3333333334, ans=0.0
2023-11-19 07:46:09,198 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11300, loss[loss=0.08364, simple_loss=0.1051, pruned_loss=0.0197, audio_tagging_loss=0.0114, over 14710.00 frames. ], tot_loss[loss=0.09012, simple_loss=0.1084, pruned_loss=0.02503, audio_tagging_loss=0.01087, over 3048251.11 frames. ], batch size: 56, lr: 8.42e-03, grad_scale: 32.0
2023-11-19 07:46:17,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=636413.3333333334, ans=0.125
2023-11-19 07:46:20,520 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.65 vs. limit=10.0
2023-11-19 07:46:31,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=636546.6666666666, ans=0.0
2023-11-19 07:46:47,210 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 8.767e+01 9.641e+01 1.057e+02 1.574e+02, threshold=1.928e+02, percent-clipped=0.0
2023-11-19 07:46:58,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=636680.0, ans=0.0
2023-11-19 07:47:05,213 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11350, loss[loss=0.08364, simple_loss=0.1107, pruned_loss=0.02097, audio_tagging_loss=0.00731, over 14833.00 frames. ], tot_loss[loss=0.08966, simple_loss=0.108, pruned_loss=0.02504, audio_tagging_loss=0.01061, over 3044744.27 frames. ], batch size: 58, lr: 8.42e-03, grad_scale: 32.0
2023-11-19 07:47:11,779 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0
limit=15.0 2023-11-19 07:47:32,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=636880.0, ans=0.1 2023-11-19 07:47:39,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=636946.6666666666, ans=0.0 2023-11-19 07:48:01,158 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11400, loss[loss=0.09867, simple_loss=0.1219, pruned_loss=0.02708, audio_tagging_loss=0.01065, over 17466.00 frames. ], tot_loss[loss=0.08976, simple_loss=0.1086, pruned_loss=0.02502, audio_tagging_loss=0.01046, over 3043291.63 frames. ], batch size: 65, lr: 8.42e-03, grad_scale: 32.0 2023-11-19 07:48:04,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=637080.0, ans=0.0 2023-11-19 07:48:17,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=637146.6666666666, ans=0.0 2023-11-19 07:48:22,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=12.0 2023-11-19 07:48:38,432 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 8.501e+01 9.316e+01 1.025e+02 1.797e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-19 07:48:51,621 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2023-11-19 07:48:56,274 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11450, loss[loss=0.09841, simple_loss=0.1182, pruned_loss=0.02786, audio_tagging_loss=0.01146, over 14831.00 frames. ], tot_loss[loss=0.09006, simple_loss=0.1088, pruned_loss=0.02523, audio_tagging_loss=0.01044, over 3038048.77 frames. ], batch size: 56, lr: 8.42e-03, grad_scale: 32.0 2023-11-19 07:49:06,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=637480.0, ans=0.125 2023-11-19 07:49:12,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=637480.0, ans=0.125 2023-11-19 07:49:14,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=637480.0, ans=0.2 2023-11-19 07:49:30,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=637613.3333333334, ans=0.0 2023-11-19 07:49:31,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=637613.3333333334, ans=0.125 2023-11-19 07:49:42,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=637680.0, ans=0.125 2023-11-19 07:49:53,052 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11500, loss[loss=0.1002, simple_loss=0.1166, pruned_loss=0.03179, audio_tagging_loss=0.01009, over 15400.00 frames. ], tot_loss[loss=0.08991, simple_loss=0.1088, pruned_loss=0.02503, audio_tagging_loss=0.01048, over 3041860.64 frames. 
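The recurring "ScheduledFloat: name=..., batch_count=..., ans=..." lines record scalar hyperparameters (skip rates, balancer probabilities, dropout) whose value ("ans") is a function of the global batch count. A minimal sketch of such a piecewise-linear schedule, assuming sorted (batch_count, value) breakpoints; the function below is illustrative, not the exact icefall ScheduledFloat implementation:

def scheduled_value(batch_count, points):
    # Piecewise-linear interpolation through sorted (batch_count, value) points.
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Example: a skip rate decaying from 0.5 to 0.0 over the first 20k batches would
# be scheduled_value(b, [(0, 0.5), (20000, 0.0)]). By batch_count ~6.35e5 every
# such schedule has reached its endpoint, which is why the "ans" values logged
# here are constants such as 0.0, 0.125, or 0.2.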
], batch size: 57, lr: 8.41e-03, grad_scale: 32.0 2023-11-19 07:50:05,607 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=637813.3333333334, ans=0.1 2023-11-19 07:50:18,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=637880.0, ans=0.0 2023-11-19 07:50:30,710 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.770e+01 8.483e+01 9.238e+01 9.842e+01 1.262e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 07:50:39,483 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.74 vs. limit=15.0 2023-11-19 07:50:44,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=12.0 2023-11-19 07:50:49,223 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11550, loss[loss=0.0874, simple_loss=0.1101, pruned_loss=0.02237, audio_tagging_loss=0.009994, over 14538.00 frames. ], tot_loss[loss=0.08942, simple_loss=0.1081, pruned_loss=0.02488, audio_tagging_loss=0.01048, over 3045371.91 frames. ], batch size: 56, lr: 8.41e-03, grad_scale: 32.0 2023-11-19 07:50:56,854 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=638080.0, ans=0.1 2023-11-19 07:51:00,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=638146.6666666666, ans=0.2 2023-11-19 07:51:12,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=638213.3333333334, ans=0.0 2023-11-19 07:51:14,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=638213.3333333334, ans=0.0 2023-11-19 07:51:21,262 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2023-11-19 07:51:22,833 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:51:35,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=638346.6666666666, ans=0.125 2023-11-19 07:51:38,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=638346.6666666666, ans=0.125 2023-11-19 07:51:43,890 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11600, loss[loss=0.07371, simple_loss=0.09284, pruned_loss=0.01595, audio_tagging_loss=0.01135, over 14491.00 frames. ], tot_loss[loss=0.08929, simple_loss=0.1077, pruned_loss=0.02485, audio_tagging_loss=0.0106, over 3039450.22 frames. 
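The loss fields in these records are consistent with the weights configured at startup (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0): the reported loss equals 0.5*simple_loss + pruned_loss + audio_tagging_loss. A quick check against the "Epoch 8, batch 11550" record above; the helper name is ours:

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Weighted sum matching the run configuration logged at startup.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Epoch 8, batch 11550: loss=0.0874, simple_loss=0.1101,
# pruned_loss=0.02237, audio_tagging_loss=0.009994
assert abs(combined_loss(0.1101, 0.02237, 0.009994) - 0.0874) < 5e-4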
], batch size: 56, lr: 8.41e-03, grad_scale: 32.0 2023-11-19 07:51:55,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=638480.0, ans=0.0 2023-11-19 07:52:17,708 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:52:21,739 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.115e+01 8.734e+01 9.337e+01 1.014e+02 1.440e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-19 07:52:32,759 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=638680.0, ans=0.0 2023-11-19 07:52:39,902 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11650, loss[loss=0.09135, simple_loss=0.109, pruned_loss=0.02821, audio_tagging_loss=0.008663, over 14618.00 frames. ], tot_loss[loss=0.08866, simple_loss=0.1072, pruned_loss=0.02449, audio_tagging_loss=0.01057, over 3043030.72 frames. ], batch size: 56, lr: 8.41e-03, grad_scale: 16.0 2023-11-19 07:52:46,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=638746.6666666666, ans=0.07 2023-11-19 07:52:50,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=638813.3333333334, ans=0.0 2023-11-19 07:52:52,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=638813.3333333334, ans=0.2 2023-11-19 07:52:59,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=638813.3333333334, ans=0.1 2023-11-19 07:53:14,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.21 vs. limit=12.0 2023-11-19 07:53:34,701 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11700, loss[loss=0.05685, simple_loss=0.06194, pruned_loss=0.01313, audio_tagging_loss=0.01275, over 14775.00 frames. ], tot_loss[loss=0.08951, simple_loss=0.1081, pruned_loss=0.02486, audio_tagging_loss=0.0106, over 3045106.98 frames. ], batch size: 59, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:53:50,007 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.79 vs. limit=10.0 2023-11-19 07:53:52,903 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=639146.6666666666, ans=0.0 2023-11-19 07:53:58,538 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.15 vs. 
limit=22.5 2023-11-19 07:54:11,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639280.0, ans=0.1 2023-11-19 07:54:13,831 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.154e+01 8.844e+01 9.528e+01 1.167e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 07:54:15,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=639280.0, ans=10.0 2023-11-19 07:54:18,860 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=639346.6666666666, ans=0.125 2023-11-19 07:54:30,856 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11750, loss[loss=0.08082, simple_loss=0.0993, pruned_loss=0.01884, audio_tagging_loss=0.01233, over 16622.00 frames. ], tot_loss[loss=0.08934, simple_loss=0.108, pruned_loss=0.02477, audio_tagging_loss=0.01055, over 3047913.73 frames. ], batch size: 65, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:54:47,035 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2023-11-19 07:55:06,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=639613.3333333334, ans=0.0 2023-11-19 07:55:14,032 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=639680.0, ans=0.125 2023-11-19 07:55:26,014 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11800, loss[loss=0.09112, simple_loss=0.1072, pruned_loss=0.0246, audio_tagging_loss=0.01294, over 16200.00 frames. ], tot_loss[loss=0.08909, simple_loss=0.1073, pruned_loss=0.02485, audio_tagging_loss=0.01061, over 3045706.52 frames. ], batch size: 60, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:55:28,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=639746.6666666666, ans=0.2 2023-11-19 07:55:38,584 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=639813.3333333334, ans=0.0 2023-11-19 07:55:46,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=639813.3333333334, ans=0.0 2023-11-19 07:55:53,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.58 vs. limit=22.5 2023-11-19 07:55:56,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639880.0, ans=0.1 2023-11-19 07:56:05,550 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.407e+01 8.609e+01 9.500e+01 1.074e+02 1.455e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-19 07:56:24,336 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11850, loss[loss=0.09793, simple_loss=0.1114, pruned_loss=0.03193, audio_tagging_loss=0.01033, over 14954.00 frames. ], tot_loss[loss=0.0895, simple_loss=0.1076, pruned_loss=0.02489, audio_tagging_loss=0.0108, over 3048648.55 frames. 
], batch size: 58, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:56:24,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=640080.0, ans=0.0 2023-11-19 07:56:28,536 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=22.5 2023-11-19 07:56:40,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=640146.6666666666, ans=0.0 2023-11-19 07:56:55,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=640213.3333333334, ans=0.125 2023-11-19 07:56:56,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=640213.3333333334, ans=0.125 2023-11-19 07:56:56,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=640213.3333333334, ans=0.125 2023-11-19 07:57:08,560 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.86 vs. limit=15.0 2023-11-19 07:57:20,089 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11900, loss[loss=0.06938, simple_loss=0.08656, pruned_loss=0.0144, audio_tagging_loss=0.0117, over 15343.00 frames. ], tot_loss[loss=0.08884, simple_loss=0.1068, pruned_loss=0.02461, audio_tagging_loss=0.01084, over 3044134.94 frames. ], batch size: 60, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:57:20,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=640413.3333333334, ans=0.07 2023-11-19 07:57:23,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=640413.3333333334, ans=22.5 2023-11-19 07:57:32,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=640480.0, ans=0.09899494936611666 2023-11-19 07:57:59,326 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.550e+01 8.414e+01 8.951e+01 9.837e+01 1.464e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 07:58:07,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=640680.0, ans=0.0 2023-11-19 07:58:16,216 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 11950, loss[loss=0.08806, simple_loss=0.09139, pruned_loss=0.02647, audio_tagging_loss=0.01589, over 14839.00 frames. ], tot_loss[loss=0.08884, simple_loss=0.1064, pruned_loss=0.02459, audio_tagging_loss=0.01107, over 3041588.28 frames. ], batch size: 57, lr: 8.39e-03, grad_scale: 16.0 2023-11-19 07:58:20,442 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=15.0 2023-11-19 07:58:28,783 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=640813.3333333334, ans=0.125 2023-11-19 07:58:34,303 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.08 vs. 
limit=15.0 2023-11-19 07:59:00,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.58 vs. limit=22.5 2023-11-19 07:59:02,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=641013.3333333334, ans=0.0 2023-11-19 07:59:05,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=641013.3333333334, ans=0.125 2023-11-19 07:59:10,347 INFO [train_asr.py:1115] (1/4) Epoch 8, batch 12000, loss[loss=0.09476, simple_loss=0.1207, pruned_loss=0.02569, audio_tagging_loss=0.008702, over 16051.00 frames. ], tot_loss[loss=0.08888, simple_loss=0.1064, pruned_loss=0.02457, audio_tagging_loss=0.01112, over 3037608.29 frames. ], batch size: 58, lr: 8.39e-03, grad_scale: 32.0 2023-11-19 07:59:10,348 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 07:59:39,716 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6496, 3.4020, 3.7069, 2.9649], device='cuda:1') 2023-11-19 07:59:42,989 INFO [train_asr.py:1147] (1/4) Epoch 8, validation: loss=0.06649, simple_loss=0.05653, pruned_loss=0.006961, audio_tagging_loss=0.03127, over 4681554.00 frames. 2023-11-19 07:59:42,990 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 07:59:54,518 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=15.0 2023-11-19 07:59:58,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=641146.6666666666, ans=0.0 2023-11-19 08:00:44,328 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 0, loss[loss=0.09365, simple_loss=0.09788, pruned_loss=0.01556, audio_tagging_loss=0.02915, over 14172.00 frames. ], tot_loss[loss=0.09365, simple_loss=0.09788, pruned_loss=0.01556, audio_tagging_loss=0.02915, over 14172.00 frames. ], batch size: 56, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:00:44,328 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 08:01:16,088 INFO [train_asr.py:1147] (1/4) Epoch 9, validation: loss=0.06566, simple_loss=0.05652, pruned_loss=0.006966, audio_tagging_loss=0.03043, over 4681554.00 frames. 2023-11-19 08:01:16,089 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 08:01:20,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=641240.0, ans=0.125 2023-11-19 08:01:28,800 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.618e+01 8.783e+01 9.637e+01 1.099e+02 1.400e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-19 08:01:35,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=641306.6666666666, ans=0.05 2023-11-19 08:01:55,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=641440.0, ans=0.2 2023-11-19 08:02:12,368 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 50, loss[loss=0.1147, simple_loss=0.133, pruned_loss=0.03138, audio_tagging_loss=0.01682, over 17990.00 frames. ], tot_loss[loss=0.09842, simple_loss=0.1072, pruned_loss=0.02426, audio_tagging_loss=0.02057, over 691561.44 frames. 
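The attn_weights_entropy diagnostic printed during validation reports one value per attention head and is consistent with the average entropy of each head's attention distribution (lower means sharper attention). A hedged sketch of how such a diagnostic can be computed; this is an assumed formulation, not necessarily the exact zipformer.py code:

import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    # attn_weights: (num_heads, tgt_len, src_len); each row is a softmax output.
    eps = 1e-20
    entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return entropy.mean(dim=-1)  # one averaged entropy value per head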
], batch size: 64, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:02:31,051 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=641640.0, ans=0.125 2023-11-19 08:02:44,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=641706.6666666666, ans=0.125 2023-11-19 08:02:49,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=641773.3333333334, ans=0.2 2023-11-19 08:02:58,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=641840.0, ans=0.0 2023-11-19 08:03:02,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=641840.0, ans=0.05 2023-11-19 08:03:07,984 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 100, loss[loss=0.07739, simple_loss=0.07566, pruned_loss=0.01614, audio_tagging_loss=0.02342, over 14329.00 frames. ], tot_loss[loss=0.09783, simple_loss=0.1073, pruned_loss=0.02449, audio_tagging_loss=0.01967, over 1213604.90 frames. ], batch size: 55, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:03:15,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=641906.6666666666, ans=0.125 2023-11-19 08:03:19,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.444e+01 8.656e+01 9.404e+01 1.019e+02 1.351e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-19 08:03:23,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=641973.3333333334, ans=0.0 2023-11-19 08:03:25,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=641973.3333333334, ans=0.0 2023-11-19 08:03:28,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=641973.3333333334, ans=0.0 2023-11-19 08:03:34,988 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=642040.0, ans=0.0 2023-11-19 08:03:39,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=642040.0, ans=0.0 2023-11-19 08:03:45,158 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=642106.6666666666, ans=0.1 2023-11-19 08:03:54,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=642173.3333333334, ans=0.125 2023-11-19 08:04:03,507 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 150, loss[loss=0.09076, simple_loss=0.1026, pruned_loss=0.02611, audio_tagging_loss=0.01335, over 16646.00 frames. ], tot_loss[loss=0.09799, simple_loss=0.1102, pruned_loss=0.02558, audio_tagging_loss=0.0173, over 1619683.65 frames. ], batch size: 64, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:04:11,130 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.59 vs. 
limit=15.0 2023-11-19 08:04:13,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=642306.6666666666, ans=0.1 2023-11-19 08:04:16,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2023-11-19 08:04:38,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=642440.0, ans=0.125 2023-11-19 08:04:40,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=642440.0, ans=0.125 2023-11-19 08:04:44,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2023-11-19 08:04:51,088 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=642506.6666666666, ans=0.1 2023-11-19 08:04:59,889 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 200, loss[loss=0.06727, simple_loss=0.08158, pruned_loss=0.01506, audio_tagging_loss=0.01141, over 14993.00 frames. ], tot_loss[loss=0.09456, simple_loss=0.1086, pruned_loss=0.02501, audio_tagging_loss=0.01524, over 1930093.39 frames. ], batch size: 55, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:05:00,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=642573.3333333334, ans=0.0 2023-11-19 08:05:11,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=642640.0, ans=0.125 2023-11-19 08:05:13,045 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.507e+01 8.518e+01 9.330e+01 1.026e+02 1.321e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-19 08:05:18,039 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.68 vs. limit=22.5 2023-11-19 08:05:24,062 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:05:49,698 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.56 vs. limit=22.5 2023-11-19 08:05:50,289 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:05:50,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=642840.0, ans=0.1 2023-11-19 08:05:55,838 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 250, loss[loss=0.1027, simple_loss=0.1326, pruned_loss=0.02867, audio_tagging_loss=0.007713, over 16575.00 frames. ], tot_loss[loss=0.09339, simple_loss=0.1087, pruned_loss=0.02517, audio_tagging_loss=0.01385, over 2175141.80 frames. ], batch size: 58, lr: 7.93e-03, grad_scale: 16.0 2023-11-19 08:06:06,046 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.61 vs. 
limit=15.0 2023-11-19 08:06:07,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=642973.3333333334, ans=0.07 2023-11-19 08:06:23,270 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=643040.0, ans=0.125 2023-11-19 08:06:36,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=643106.6666666666, ans=0.035 2023-11-19 08:06:37,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=643106.6666666666, ans=0.125 2023-11-19 08:06:51,157 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 300, loss[loss=0.1176, simple_loss=0.1536, pruned_loss=0.0333, audio_tagging_loss=0.0075, over 15013.00 frames. ], tot_loss[loss=0.09348, simple_loss=0.11, pruned_loss=0.0256, audio_tagging_loss=0.01288, over 2370624.10 frames. ], batch size: 55, lr: 7.93e-03, grad_scale: 16.0 2023-11-19 08:07:04,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2023-11-19 08:07:05,327 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.313e+01 8.625e+01 9.241e+01 1.032e+02 1.343e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 08:07:05,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=643306.6666666666, ans=0.125 2023-11-19 08:07:13,361 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2023-11-19 08:07:14,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=643373.3333333334, ans=0.05 2023-11-19 08:07:15,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=643373.3333333334, ans=0.125 2023-11-19 08:07:21,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=643373.3333333334, ans=0.125 2023-11-19 08:07:21,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=643373.3333333334, ans=0.2 2023-11-19 08:07:22,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=643373.3333333334, ans=15.0 2023-11-19 08:07:22,333 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2023-11-19 08:07:22,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=643373.3333333334, ans=0.0 2023-11-19 08:07:33,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643440.0, ans=0.1 2023-11-19 08:07:45,735 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.549e-01 2023-11-19 08:07:47,520 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 350, loss[loss=0.09212, simple_loss=0.1181, pruned_loss=0.02611, audio_tagging_loss=0.006937, over 15215.00 frames. 
], tot_loss[loss=0.09225, simple_loss=0.1097, pruned_loss=0.02531, audio_tagging_loss=0.0121, over 2524173.90 frames. ], batch size: 56, lr: 7.93e-03, grad_scale: 16.0 2023-11-19 08:07:49,906 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:07:50,981 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=643573.3333333334, ans=0.0 2023-11-19 08:07:58,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=643640.0, ans=0.1 2023-11-19 08:08:19,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643773.3333333334, ans=0.1 2023-11-19 08:08:29,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643773.3333333334, ans=0.1 2023-11-19 08:08:31,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=643840.0, ans=0.0 2023-11-19 08:08:43,342 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 400, loss[loss=0.09815, simple_loss=0.1267, pruned_loss=0.02467, audio_tagging_loss=0.01014, over 15218.00 frames. ], tot_loss[loss=0.09149, simple_loss=0.1094, pruned_loss=0.02515, audio_tagging_loss=0.01165, over 2637628.96 frames. ], batch size: 55, lr: 7.93e-03, grad_scale: 32.0 2023-11-19 08:08:55,931 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.363e+01 9.025e+01 9.871e+01 1.227e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 08:09:05,923 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-11-19 08:09:09,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=644040.0, ans=0.125 2023-11-19 08:09:16,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=644106.6666666666, ans=0.1 2023-11-19 08:09:20,738 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.01 vs. limit=22.5 2023-11-19 08:09:25,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=644106.6666666666, ans=0.125 2023-11-19 08:09:34,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=644173.3333333334, ans=0.0 2023-11-19 08:09:38,846 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 450, loss[loss=0.1037, simple_loss=0.1191, pruned_loss=0.03477, audio_tagging_loss=0.009331, over 15663.00 frames. ], tot_loss[loss=0.09055, simple_loss=0.1084, pruned_loss=0.02495, audio_tagging_loss=0.01138, over 2731493.18 frames. 
], batch size: 60, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:09:40,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=644240.0, ans=0.125 2023-11-19 08:09:41,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=644240.0, ans=0.125 2023-11-19 08:09:41,528 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.16 vs. limit=15.0 2023-11-19 08:09:57,051 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.34 vs. limit=22.5 2023-11-19 08:09:57,816 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:09:58,902 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=644306.6666666666, ans=0.0 2023-11-19 08:10:03,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=644373.3333333334, ans=0.125 2023-11-19 08:10:05,856 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.48 vs. limit=22.5 2023-11-19 08:10:12,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=644440.0, ans=0.0 2023-11-19 08:10:21,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=644440.0, ans=0.125 2023-11-19 08:10:23,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=644506.6666666666, ans=0.125 2023-11-19 08:10:25,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=644506.6666666666, ans=0.0 2023-11-19 08:10:27,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=15.0 2023-11-19 08:10:30,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=644506.6666666666, ans=0.0 2023-11-19 08:10:35,250 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 500, loss[loss=0.1116, simple_loss=0.1353, pruned_loss=0.0367, audio_tagging_loss=0.007282, over 15136.00 frames. ], tot_loss[loss=0.09056, simple_loss=0.1085, pruned_loss=0.02503, audio_tagging_loss=0.01126, over 2802776.10 frames. ], batch size: 54, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:10:48,163 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.71 vs. limit=15.0 2023-11-19 08:10:48,619 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.872e+01 8.533e+01 9.443e+01 1.042e+02 1.372e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-19 08:11:07,832 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.43 vs. limit=15.0 2023-11-19 08:11:10,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. 
limit=6.0 2023-11-19 08:11:31,022 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 550, loss[loss=0.09325, simple_loss=0.1076, pruned_loss=0.02872, audio_tagging_loss=0.01071, over 15797.00 frames. ], tot_loss[loss=0.09049, simple_loss=0.1087, pruned_loss=0.02511, audio_tagging_loss=0.011, over 2850623.94 frames. ], batch size: 60, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:11:42,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=644973.3333333334, ans=0.0 2023-11-19 08:11:57,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=645040.0, ans=0.125 2023-11-19 08:12:13,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=645106.6666666666, ans=0.0 2023-11-19 08:12:24,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2023-11-19 08:12:26,854 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 600, loss[loss=0.08314, simple_loss=0.09963, pruned_loss=0.0229, audio_tagging_loss=0.01042, over 14839.00 frames. ], tot_loss[loss=0.0899, simple_loss=0.108, pruned_loss=0.02497, audio_tagging_loss=0.01093, over 2895642.63 frames. ], batch size: 56, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:12:40,047 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.132e+01 8.342e+01 9.026e+01 9.768e+01 1.504e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 08:12:51,261 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:12:54,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2023-11-19 08:12:55,481 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=7.963e-03 2023-11-19 08:12:56,524 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=645373.3333333334, ans=0.0 2023-11-19 08:13:01,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=645440.0, ans=0.1 2023-11-19 08:13:07,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=645440.0, ans=0.125 2023-11-19 08:13:09,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.07 vs. limit=15.0 2023-11-19 08:13:22,996 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 650, loss[loss=0.0896, simple_loss=0.1037, pruned_loss=0.02404, audio_tagging_loss=0.01371, over 15488.00 frames. ], tot_loss[loss=0.08989, simple_loss=0.1082, pruned_loss=0.02481, audio_tagging_loss=0.01096, over 2933370.99 frames. ], batch size: 57, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:13:39,474 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.38 vs. 
limit=12.0 2023-11-19 08:13:46,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=645706.6666666666, ans=0.0 2023-11-19 08:14:03,140 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.76 vs. limit=15.0 2023-11-19 08:14:13,955 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=645840.0, ans=0.1 2023-11-19 08:14:19,614 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 700, loss[loss=0.07553, simple_loss=0.1002, pruned_loss=0.01593, audio_tagging_loss=0.00952, over 15522.00 frames. ], tot_loss[loss=0.08922, simple_loss=0.1077, pruned_loss=0.02448, audio_tagging_loss=0.01087, over 2955668.43 frames. ], batch size: 58, lr: 7.91e-03, grad_scale: 16.0 2023-11-19 08:14:31,052 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.303e-01 2023-11-19 08:14:32,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=645973.3333333334, ans=0.2 2023-11-19 08:14:33,925 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.588e+01 8.287e+01 8.978e+01 1.006e+02 1.254e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 08:14:38,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=645973.3333333334, ans=0.125 2023-11-19 08:15:10,351 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=646173.3333333334, ans=10.0 2023-11-19 08:15:15,502 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 750, loss[loss=0.1134, simple_loss=0.1429, pruned_loss=0.03287, audio_tagging_loss=0.009093, over 15238.00 frames. ], tot_loss[loss=0.08963, simple_loss=0.1082, pruned_loss=0.02465, audio_tagging_loss=0.01086, over 2979630.68 frames. ], batch size: 55, lr: 7.91e-03, grad_scale: 16.0 2023-11-19 08:15:18,943 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=646240.0, ans=0.2 2023-11-19 08:15:29,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=646306.6666666666, ans=0.025 2023-11-19 08:15:31,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=12.0 2023-11-19 08:15:35,817 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=646306.6666666666, ans=0.125 2023-11-19 08:15:45,288 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0 2023-11-19 08:16:11,322 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 800, loss[loss=0.08423, simple_loss=0.09412, pruned_loss=0.02352, audio_tagging_loss=0.01365, over 14910.00 frames. ], tot_loss[loss=0.08932, simple_loss=0.1082, pruned_loss=0.02444, audio_tagging_loss=0.01079, over 2994789.84 frames. 
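The "Whitening: name=..., metric=X vs. limit=Y" lines track, per module, how far the activations are from being white (decorrelated, equal-variance); the metric is compared against a per-module limit, presumably so that activations exceeding it can be penalized. One way such a metric can be defined, scaled so that a perfectly white covariance gives exactly 1.0; this definition is our assumption, not a copy of scaling.py:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations from one module.
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]  # (C, C) empirical covariance
    c = cov.shape[0]
    # Equals 1.0 iff cov is a multiple of the identity, larger otherwise.
    return (c * (cov ** 2).sum() / (cov.diag().sum() ** 2)).item()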
], batch size: 53, lr: 7.91e-03, grad_scale: 32.0 2023-11-19 08:16:18,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=646573.3333333334, ans=0.07 2023-11-19 08:16:23,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=646640.0, ans=0.0 2023-11-19 08:16:25,538 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.571e+01 9.363e+01 1.050e+02 1.472e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-19 08:16:40,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=12.0 2023-11-19 08:17:07,117 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 850, loss[loss=0.06968, simple_loss=0.07835, pruned_loss=0.01559, audio_tagging_loss=0.01491, over 15298.00 frames. ], tot_loss[loss=0.08914, simple_loss=0.1078, pruned_loss=0.02426, audio_tagging_loss=0.01096, over 3002596.80 frames. ], batch size: 58, lr: 7.91e-03, grad_scale: 32.0 2023-11-19 08:17:25,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=646973.3333333334, ans=0.125 2023-11-19 08:17:30,100 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=647040.0, ans=0.0 2023-11-19 08:17:32,124 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=22.5 2023-11-19 08:17:40,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=647106.6666666666, ans=0.1 2023-11-19 08:17:52,006 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=647173.3333333334, ans=0.0 2023-11-19 08:18:02,487 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 900, loss[loss=0.08304, simple_loss=0.0971, pruned_loss=0.02283, audio_tagging_loss=0.01166, over 17687.00 frames. ], tot_loss[loss=0.08918, simple_loss=0.1077, pruned_loss=0.02432, audio_tagging_loss=0.01099, over 3013669.01 frames. ], batch size: 69, lr: 7.91e-03, grad_scale: 32.0 2023-11-19 08:18:16,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.327e+01 8.077e+01 9.345e+01 1.007e+02 1.276e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-19 08:18:23,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=647306.6666666666, ans=0.1 2023-11-19 08:18:58,290 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 950, loss[loss=0.08253, simple_loss=0.1027, pruned_loss=0.02219, audio_tagging_loss=0.009003, over 15242.00 frames. ], tot_loss[loss=0.08972, simple_loss=0.1089, pruned_loss=0.02454, audio_tagging_loss=0.01071, over 3031873.11 frames. ], batch size: 57, lr: 7.90e-03, grad_scale: 32.0 2023-11-19 08:19:07,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.05 vs. 
limit=6.0 2023-11-19 08:19:08,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=647640.0, ans=0.0 2023-11-19 08:19:12,731 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=647640.0, ans=0.125 2023-11-19 08:19:24,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=647706.6666666666, ans=0.2 2023-11-19 08:19:25,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=647706.6666666666, ans=0.125 2023-11-19 08:19:53,961 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1000, loss[loss=0.06322, simple_loss=0.07521, pruned_loss=0.01323, audio_tagging_loss=0.01239, over 14888.00 frames. ], tot_loss[loss=0.08865, simple_loss=0.1075, pruned_loss=0.02431, audio_tagging_loss=0.01061, over 3029524.71 frames. ], batch size: 58, lr: 7.90e-03, grad_scale: 32.0 2023-11-19 08:20:08,930 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.007e+01 8.931e+01 9.562e+01 1.265e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-19 08:20:17,760 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:20:24,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=648040.0, ans=0.125 2023-11-19 08:20:50,068 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1050, loss[loss=0.08363, simple_loss=0.1064, pruned_loss=0.0215, audio_tagging_loss=0.008925, over 16125.00 frames. ], tot_loss[loss=0.08797, simple_loss=0.1066, pruned_loss=0.02417, audio_tagging_loss=0.01049, over 3038729.92 frames. ], batch size: 60, lr: 7.90e-03, grad_scale: 32.0 2023-11-19 08:21:04,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=648306.6666666666, ans=0.0 2023-11-19 08:21:13,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=648373.3333333334, ans=0.125 2023-11-19 08:21:15,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=648373.3333333334, ans=0.125 2023-11-19 08:21:25,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=648440.0, ans=0.2 2023-11-19 08:21:31,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=648440.0, ans=0.0 2023-11-19 08:21:42,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=648506.6666666666, ans=0.125 2023-11-19 08:21:46,129 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1100, loss[loss=0.05762, simple_loss=0.05804, pruned_loss=0.01506, audio_tagging_loss=0.01355, over 14965.00 frames. 
], tot_loss[loss=0.08786, simple_loss=0.1063, pruned_loss=0.02424, audio_tagging_loss=0.01046, over 3030128.94 frames. ], batch size: 59, lr: 7.90e-03, grad_scale: 32.0 2023-11-19 08:21:48,222 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:21:51,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=648573.3333333334, ans=0.1 2023-11-19 08:21:52,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=648573.3333333334, ans=0.1 2023-11-19 08:22:00,250 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.479e+01 9.483e+01 1.065e+02 1.916e+02, threshold=1.897e+02, percent-clipped=1.0 2023-11-19 08:22:12,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.37 vs. limit=15.0 2023-11-19 08:22:24,746 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.46 vs. limit=15.0 2023-11-19 08:22:39,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=648840.0, ans=0.125 2023-11-19 08:22:41,974 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1150, loss[loss=0.05412, simple_loss=0.06055, pruned_loss=0.01081, audio_tagging_loss=0.01303, over 14487.00 frames. ], tot_loss[loss=0.08784, simple_loss=0.1061, pruned_loss=0.02429, audio_tagging_loss=0.01051, over 3029326.77 frames. ], batch size: 56, lr: 7.90e-03, grad_scale: 32.0 2023-11-19 08:22:57,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=648973.3333333334, ans=0.125 2023-11-19 08:22:58,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=648973.3333333334, ans=0.125 2023-11-19 08:23:07,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=649040.0, ans=0.2 2023-11-19 08:23:14,599 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=649106.6666666666, ans=0.07 2023-11-19 08:23:37,845 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1200, loss[loss=0.08106, simple_loss=0.09961, pruned_loss=0.02018, audio_tagging_loss=0.01107, over 15145.00 frames. ], tot_loss[loss=0.08826, simple_loss=0.1068, pruned_loss=0.02443, audio_tagging_loss=0.01042, over 3035104.10 frames. 
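The WARNING lines exclude 1-second AudioSet placeholder cuts: after the 4x convolutional subsampling, 100 input frames become 23 encoder frames, fewer than the 24 BPE tokens of the dummy transcript, and a transducer loss needs at least one encoder frame per emitted token. A sketch of that filter; the subsampling formula below is an assumption that happens to map 100 -> 23 as in the warnings:

def frames_after_subsampling(num_frames: int) -> int:
    # Assumed Conv2d frontend arithmetic; maps 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Transducer training needs at least as many encoder frames as tokens.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the excluded placeholder cuts above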
], batch size: 56, lr: 7.89e-03, grad_scale: 32.0 2023-11-19 08:23:52,034 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.405e+01 8.142e+01 8.954e+01 1.003e+02 1.503e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-19 08:23:54,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=649306.6666666666, ans=0.0 2023-11-19 08:23:57,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=649306.6666666666, ans=0.125 2023-11-19 08:24:02,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649373.3333333334, ans=0.1 2023-11-19 08:24:13,557 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.17 vs. limit=15.0 2023-11-19 08:24:21,609 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=649506.6666666666, ans=0.0 2023-11-19 08:24:33,461 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1250, loss[loss=0.09784, simple_loss=0.1236, pruned_loss=0.02849, audio_tagging_loss=0.007542, over 17220.00 frames. ], tot_loss[loss=0.08822, simple_loss=0.1066, pruned_loss=0.02447, audio_tagging_loss=0.01045, over 3033018.67 frames. ], batch size: 64, lr: 7.89e-03, grad_scale: 32.0 2023-11-19 08:24:47,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=649640.0, ans=0.125 2023-11-19 08:25:18,673 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=649840.0, ans=0.125 2023-11-19 08:25:19,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=649840.0, ans=0.125 2023-11-19 08:25:25,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=649840.0, ans=0.125 2023-11-19 08:25:29,557 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1300, loss[loss=0.04662, simple_loss=0.04527, pruned_loss=0.01098, audio_tagging_loss=0.013, over 14833.00 frames. ], tot_loss[loss=0.08754, simple_loss=0.1056, pruned_loss=0.02421, audio_tagging_loss=0.01052, over 3034192.73 frames. ], batch size: 56, lr: 7.89e-03, grad_scale: 16.0 2023-11-19 08:25:38,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=649906.6666666666, ans=0.0 2023-11-19 08:25:38,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2023-11-19 08:25:44,821 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.781e+01 8.314e+01 8.988e+01 1.010e+02 1.259e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 08:25:46,056 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=649973.3333333334, ans=0.125 2023-11-19 08:25:56,265 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.58 vs. 
limit=15.0 2023-11-19 08:26:09,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=650106.6666666666, ans=0.125 2023-11-19 08:26:13,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=650173.3333333334, ans=0.125 2023-11-19 08:26:18,971 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.17 vs. limit=22.5 2023-11-19 08:26:25,338 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1350, loss[loss=0.08248, simple_loss=0.09876, pruned_loss=0.02088, audio_tagging_loss=0.01222, over 15317.00 frames. ], tot_loss[loss=0.0877, simple_loss=0.1057, pruned_loss=0.02434, audio_tagging_loss=0.0105, over 3037087.89 frames. ], batch size: 57, lr: 7.89e-03, grad_scale: 16.0 2023-11-19 08:26:46,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650373.3333333334, ans=0.1 2023-11-19 08:26:46,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=650373.3333333334, ans=0.125 2023-11-19 08:26:52,571 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=650373.3333333334, ans=0.035 2023-11-19 08:26:55,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650373.3333333334, ans=0.1 2023-11-19 08:27:02,409 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0 2023-11-19 08:27:05,207 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:27:09,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=650506.6666666666, ans=0.0 2023-11-19 08:27:12,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=650506.6666666666, ans=0.0 2023-11-19 08:27:15,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=650506.6666666666, ans=0.125 2023-11-19 08:27:20,252 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1400, loss[loss=0.06064, simple_loss=0.07097, pruned_loss=0.01105, audio_tagging_loss=0.0141, over 15867.00 frames. ], tot_loss[loss=0.08811, simple_loss=0.1062, pruned_loss=0.0245, audio_tagging_loss=0.01052, over 3038303.44 frames. ], batch size: 61, lr: 7.89e-03, grad_scale: 16.0 2023-11-19 08:27:27,924 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.54 vs. 
limit=15.0 2023-11-19 08:27:28,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=650573.3333333334, ans=0.0 2023-11-19 08:27:34,441 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=650640.0, ans=0.125 2023-11-19 08:27:36,580 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 8.323e+01 8.984e+01 9.924e+01 1.651e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-19 08:27:45,226 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=650706.6666666666, ans=0.0 2023-11-19 08:28:17,050 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1450, loss[loss=0.08893, simple_loss=0.1059, pruned_loss=0.02167, audio_tagging_loss=0.01431, over 15072.00 frames. ], tot_loss[loss=0.08764, simple_loss=0.1052, pruned_loss=0.02439, audio_tagging_loss=0.01067, over 3033516.90 frames. ], batch size: 57, lr: 7.88e-03, grad_scale: 16.0 2023-11-19 08:28:36,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=650973.3333333334, ans=0.2 2023-11-19 08:28:39,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=651040.0, ans=0.2 2023-11-19 08:28:40,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=651040.0, ans=0.0 2023-11-19 08:28:56,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=651106.6666666666, ans=0.125 2023-11-19 08:29:12,202 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=15.0 2023-11-19 08:29:12,417 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1500, loss[loss=0.09306, simple_loss=0.1175, pruned_loss=0.0263, audio_tagging_loss=0.008011, over 13971.00 frames. ], tot_loss[loss=0.08842, simple_loss=0.1062, pruned_loss=0.02467, audio_tagging_loss=0.01066, over 3040357.50 frames. ], batch size: 54, lr: 7.88e-03, grad_scale: 16.0 2023-11-19 08:29:24,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=651306.6666666666, ans=10.0 2023-11-19 08:29:27,756 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.092e+01 8.592e+01 9.437e+01 1.054e+02 1.547e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-19 08:29:28,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=651306.6666666666, ans=0.125 2023-11-19 08:29:33,867 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=651373.3333333334, ans=0.125 2023-11-19 08:29:37,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=651373.3333333334, ans=0.0 2023-11-19 08:30:08,175 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1550, loss[loss=0.09326, simple_loss=0.1148, pruned_loss=0.0249, audio_tagging_loss=0.01095, over 15182.00 frames. ], tot_loss[loss=0.08793, simple_loss=0.1057, pruned_loss=0.02432, audio_tagging_loss=0.01077, over 3042981.99 frames. 
], batch size: 56, lr: 7.88e-03, grad_scale: 16.0 2023-11-19 08:30:08,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=651573.3333333334, ans=0.0 2023-11-19 08:30:09,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=651573.3333333334, ans=0.2 2023-11-19 08:30:13,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=651573.3333333334, ans=0.0 2023-11-19 08:30:15,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=651573.3333333334, ans=0.125 2023-11-19 08:30:24,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=651640.0, ans=0.1 2023-11-19 08:30:28,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0 2023-11-19 08:30:43,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=651773.3333333334, ans=0.125 2023-11-19 08:30:49,605 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=651773.3333333334, ans=0.0 2023-11-19 08:30:50,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=651773.3333333334, ans=0.125 2023-11-19 08:31:04,997 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1600, loss[loss=0.08991, simple_loss=0.1136, pruned_loss=0.02372, audio_tagging_loss=0.009382, over 15609.00 frames. ], tot_loss[loss=0.08845, simple_loss=0.1066, pruned_loss=0.02443, audio_tagging_loss=0.01071, over 3050860.82 frames. ], batch size: 57, lr: 7.88e-03, grad_scale: 32.0 2023-11-19 08:31:07,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.90 vs. limit=22.5 2023-11-19 08:31:08,547 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=651906.6666666666, ans=0.125 2023-11-19 08:31:20,267 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.774e+01 8.197e+01 8.883e+01 9.741e+01 1.380e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-19 08:31:20,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=651973.3333333334, ans=0.0 2023-11-19 08:31:21,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=651973.3333333334, ans=0.0 2023-11-19 08:32:00,804 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1650, loss[loss=0.09891, simple_loss=0.1237, pruned_loss=0.02764, audio_tagging_loss=0.009399, over 14633.00 frames. ], tot_loss[loss=0.08869, simple_loss=0.1069, pruned_loss=0.02443, audio_tagging_loss=0.01079, over 3047450.18 frames. ], batch size: 55, lr: 7.88e-03, grad_scale: 32.0 2023-11-19 08:32:16,576 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.62 vs. 
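limit=10.0

The ScheduledFloat entries record hyperparameters (dropout_p, skip rates, balancer probabilities, bypass scales) whose value "ans" is evaluated from the global batch_count on a piecewise-linear schedule; by this point in training most skip rates have decayed to their final values (ans=0.0). Below is a toy sketch of such a schedule; the breakpoints are invented for illustration and are not this run's settings:

    from bisect import bisect_right

    class PiecewiseLinear:
        """Toy stand-in for a ScheduledFloat-style schedule: the value is a
        piecewise-linear function of batch_count, clamped at both ends."""
        def __init__(self, *points):          # points: sorted (batch_count, value)
            self.x, self.y = zip(*points)
        def __call__(self, batch_count):
            if batch_count <= self.x[0]:
                return self.y[0]
            if batch_count >= self.x[-1]:
                return self.y[-1]
            i = bisect_right(self.x, batch_count)
            x0, x1 = self.x[i - 1], self.x[i]
            y0, y1 = self.y[i - 1], self.y[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # A skip rate that warms down and then stays at 0.0, as the entries above show.
    conv_skip_rate = PiecewiseLinear((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
    assert conv_skip_rate(652506.0) == 0.0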
2023-11-19 08:32:53,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=652506.6666666666, ans=0.0
2023-11-19 08:32:53,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0
2023-11-19 08:32:56,431 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1700, loss[loss=0.09872, simple_loss=0.1128, pruned_loss=0.02891, audio_tagging_loss=0.01343, over 15089.00 frames. ], tot_loss[loss=0.08898, simple_loss=0.1072, pruned_loss=0.0245, audio_tagging_loss=0.01087, over 3052330.59 frames. ], batch size: 58, lr: 7.87e-03, grad_scale: 16.0
2023-11-19 08:32:57,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=652573.3333333334, ans=0.125
2023-11-19 08:33:03,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=652573.3333333334, ans=0.0
2023-11-19 08:33:13,286 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.601e+01 9.234e+01 1.036e+02 1.381e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-19 08:33:18,614 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.71 vs. limit=22.5
2023-11-19 08:33:24,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=652706.6666666666, ans=0.125
2023-11-19 08:33:29,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=652773.3333333334, ans=0.1
2023-11-19 08:33:30,678 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=652773.3333333334, ans=0.125
2023-11-19 08:33:41,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=652840.0, ans=0.125
2023-11-19 08:33:52,663 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1750, loss[loss=0.08018, simple_loss=0.08897, pruned_loss=0.02308, audio_tagging_loss=0.01261, over 15943.00 frames. ], tot_loss[loss=0.08827, simple_loss=0.1067, pruned_loss=0.02418, audio_tagging_loss=0.01075, over 3048794.94 frames. ], batch size: 64, lr: 7.87e-03, grad_scale: 16.0
2023-11-19 08:34:00,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=652906.6666666666, ans=0.2
2023-11-19 08:34:06,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=652973.3333333334, ans=0.0
2023-11-19 08:34:23,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=653040.0, ans=0.125
2023-11-19 08:34:36,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=653173.3333333334, ans=0.0
2023-11-19 08:34:36,623 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=653173.3333333334, ans=0.125
2023-11-19 08:34:38,250 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.29 vs.
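limit=15.0

Each Whitening entry compares a per-module statistic against a limit; the statistic is smallest when the module's output covariance is close to a multiple of the identity (i.e. "white") and grows as the features concentrate in a few directions, and entries that overshoot their limit are the ones the module acts on. The sketch below shows one such eigenvalue-dispersion statistic purely for illustration; it is not necessarily the exact formula behind the metric printed above:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """x: float tensor of shape (num_frames, num_channels). Returns a
        dispersion ratio that is 1.0 for perfectly white features and grows
        as the channel covariance becomes imbalanced."""
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]         # channel covariance
        eigs = torch.linalg.eigvalsh(cov)      # real eigenvalue spectrum
        return float((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))

As a summary rather than a quote of the code: when the logged metric stays under its limit (e.g. metric=18.71 vs. limit=22.5 above), the module is inert, and exceeding the limit triggers a small corrective gradient that nudges the activations back toward whiteness.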
2023-11-19 08:34:47,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=653240.0, ans=0.125
2023-11-19 08:34:48,506 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1800, loss[loss=0.09347, simple_loss=0.1064, pruned_loss=0.02997, audio_tagging_loss=0.01032, over 16555.00 frames. ], tot_loss[loss=0.08853, simple_loss=0.1071, pruned_loss=0.02431, audio_tagging_loss=0.01065, over 3049455.74 frames. ], batch size: 64, lr: 7.87e-03, grad_scale: 16.0
2023-11-19 08:34:54,640 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=653240.0, ans=0.125
2023-11-19 08:35:05,315 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.385e+01 9.235e+01 9.785e+01 1.398e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-19 08:35:18,342 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=653373.3333333334, ans=0.0
2023-11-19 08:35:44,909 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1850, loss[loss=0.1105, simple_loss=0.143, pruned_loss=0.0316, audio_tagging_loss=0.007416, over 15676.00 frames. ], tot_loss[loss=0.08826, simple_loss=0.1069, pruned_loss=0.02417, audio_tagging_loss=0.01066, over 3049338.00 frames. ], batch size: 54, lr: 7.87e-03, grad_scale: 16.0
2023-11-19 08:35:50,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=653573.3333333334, ans=0.125
2023-11-19 08:36:13,828 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=15.0
2023-11-19 08:36:17,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=653773.3333333334, ans=0.125
2023-11-19 08:36:23,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=653773.3333333334, ans=0.0
2023-11-19 08:36:31,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=653840.0, ans=0.0
2023-11-19 08:36:37,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=653840.0, ans=0.125
2023-11-19 08:36:40,895 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1900, loss[loss=0.08946, simple_loss=0.1, pruned_loss=0.02727, audio_tagging_loss=0.01217, over 14216.00 frames. ], tot_loss[loss=0.08814, simple_loss=0.1069, pruned_loss=0.02406, audio_tagging_loss=0.01062, over 3050025.92 frames. ], batch size: 54, lr: 7.87e-03, grad_scale: 16.0
2023-11-19 08:36:52,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.53 vs. limit=10.0
2023-11-19 08:36:57,970 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.285e+01 9.091e+01 1.015e+02 1.507e+02, threshold=1.818e+02, percent-clipped=0.0
2023-11-19 08:36:58,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=653973.3333333334, ans=0.125
2023-11-19 08:37:11,650 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.47 vs.
limit=15.0 2023-11-19 08:37:31,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.78 vs. limit=15.0 2023-11-19 08:37:32,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=654173.3333333334, ans=0.125 2023-11-19 08:37:36,682 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 1950, loss[loss=0.09562, simple_loss=0.1168, pruned_loss=0.02854, audio_tagging_loss=0.00868, over 14841.00 frames. ], tot_loss[loss=0.08841, simple_loss=0.1072, pruned_loss=0.02427, audio_tagging_loss=0.01053, over 3049259.67 frames. ], batch size: 56, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:37:40,101 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.20 vs. limit=15.0 2023-11-19 08:38:10,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=654440.0, ans=0.2 2023-11-19 08:38:23,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=654506.6666666666, ans=0.125 2023-11-19 08:38:30,879 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=654506.6666666666, ans=0.0 2023-11-19 08:38:31,219 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.28 vs. limit=15.0 2023-11-19 08:38:32,851 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2000, loss[loss=0.08428, simple_loss=0.1018, pruned_loss=0.02485, audio_tagging_loss=0.008546, over 14855.00 frames. ], tot_loss[loss=0.08751, simple_loss=0.106, pruned_loss=0.02395, audio_tagging_loss=0.01057, over 3042816.40 frames. ], batch size: 57, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:38:45,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=654640.0, ans=0.1 2023-11-19 08:38:51,176 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.460e+01 8.516e+01 9.233e+01 1.009e+02 1.531e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 08:39:12,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=654773.3333333334, ans=0.125 2023-11-19 08:39:13,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=654773.3333333334, ans=0.0 2023-11-19 08:39:14,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=654773.3333333334, ans=0.125 2023-11-19 08:39:20,108 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=654840.0, ans=0.1 2023-11-19 08:39:28,857 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2050, loss[loss=0.1019, simple_loss=0.1247, pruned_loss=0.02758, audio_tagging_loss=0.012, over 15333.00 frames. ], tot_loss[loss=0.08775, simple_loss=0.1063, pruned_loss=0.02415, audio_tagging_loss=0.01047, over 3044053.80 frames. ], batch size: 59, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:40:25,108 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2100, loss[loss=0.06884, simple_loss=0.07991, pruned_loss=0.01541, audio_tagging_loss=0.01348, over 14538.00 frames. 
], tot_loss[loss=0.08781, simple_loss=0.106, pruned_loss=0.02435, audio_tagging_loss=0.01046, over 3046016.70 frames. ], batch size: 57, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:40:42,515 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.322e+01 9.112e+01 9.913e+01 1.952e+02, threshold=1.822e+02, percent-clipped=1.0 2023-11-19 08:41:02,050 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=22.5 2023-11-19 08:41:09,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=655506.6666666666, ans=0.125 2023-11-19 08:41:17,308 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=655506.6666666666, ans=0.2 2023-11-19 08:41:20,754 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2150, loss[loss=0.08663, simple_loss=0.1135, pruned_loss=0.02089, audio_tagging_loss=0.009007, over 15481.00 frames. ], tot_loss[loss=0.08794, simple_loss=0.1059, pruned_loss=0.02453, audio_tagging_loss=0.01047, over 3040766.88 frames. ], batch size: 55, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:41:40,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=655640.0, ans=0.1 2023-11-19 08:41:42,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=655706.6666666666, ans=0.0 2023-11-19 08:41:52,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=655773.3333333334, ans=0.125 2023-11-19 08:41:52,994 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=655773.3333333334, ans=0.0 2023-11-19 08:41:53,797 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:41:55,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=655773.3333333334, ans=0.125 2023-11-19 08:41:55,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=655773.3333333334, ans=0.02 2023-11-19 08:42:06,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=655840.0, ans=0.5 2023-11-19 08:42:07,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=655840.0, ans=0.0 2023-11-19 08:42:14,244 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=655840.0, ans=0.0 2023-11-19 08:42:16,762 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2200, loss[loss=0.1037, simple_loss=0.1268, pruned_loss=0.03118, audio_tagging_loss=0.009151, over 15692.00 frames. ], tot_loss[loss=0.08865, simple_loss=0.107, pruned_loss=0.02468, audio_tagging_loss=0.01046, over 3041660.11 frames. 
], batch size: 62, lr: 7.85e-03, grad_scale: 16.0 2023-11-19 08:42:29,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=655973.3333333334, ans=0.0 2023-11-19 08:42:32,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=655973.3333333334, ans=0.125 2023-11-19 08:42:33,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=655973.3333333334, ans=0.1 2023-11-19 08:42:34,502 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.518e+01 8.336e+01 8.931e+01 9.747e+01 1.930e+02, threshold=1.786e+02, percent-clipped=1.0 2023-11-19 08:42:41,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=656040.0, ans=0.0 2023-11-19 08:42:49,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=656106.6666666666, ans=0.125 2023-11-19 08:42:59,685 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0 2023-11-19 08:43:12,630 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2250, loss[loss=0.08379, simple_loss=0.105, pruned_loss=0.02373, audio_tagging_loss=0.007537, over 16312.00 frames. ], tot_loss[loss=0.09012, simple_loss=0.1088, pruned_loss=0.0252, audio_tagging_loss=0.0105, over 3041192.25 frames. ], batch size: 60, lr: 7.85e-03, grad_scale: 16.0 2023-11-19 08:43:19,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=656240.0, ans=0.0 2023-11-19 08:43:53,352 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=656440.0, ans=0.125 2023-11-19 08:43:58,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=656506.6666666666, ans=0.125 2023-11-19 08:43:59,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=656506.6666666666, ans=0.125 2023-11-19 08:44:08,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.51 vs. limit=22.5 2023-11-19 08:44:08,701 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2300, loss[loss=0.07309, simple_loss=0.09682, pruned_loss=0.01565, audio_tagging_loss=0.009027, over 16140.00 frames. ], tot_loss[loss=0.08995, simple_loss=0.1089, pruned_loss=0.02502, audio_tagging_loss=0.01048, over 3042085.25 frames. ], batch size: 62, lr: 7.85e-03, grad_scale: 16.0 2023-11-19 08:44:26,703 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.756e+01 8.387e+01 9.079e+01 9.954e+01 1.397e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-19 08:44:36,249 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:44:57,979 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:45:04,357 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2350, loss[loss=0.09249, simple_loss=0.1082, pruned_loss=0.02941, audio_tagging_loss=0.009001, over 14357.00 frames. ], tot_loss[loss=0.08901, simple_loss=0.1077, pruned_loss=0.02461, audio_tagging_loss=0.01056, over 3037243.54 frames. ], batch size: 57, lr: 7.85e-03, grad_scale: 16.0 2023-11-19 08:45:15,221 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=656973.3333333334, ans=0.0 2023-11-19 08:45:24,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=656973.3333333334, ans=0.0 2023-11-19 08:45:26,868 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=657040.0, ans=0.2 2023-11-19 08:45:38,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=657106.6666666666, ans=0.125 2023-11-19 08:45:38,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=657106.6666666666, ans=0.2 2023-11-19 08:45:38,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=657106.6666666666, ans=0.125 2023-11-19 08:45:46,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=657106.6666666666, ans=0.125 2023-11-19 08:46:00,325 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2400, loss[loss=0.1029, simple_loss=0.1325, pruned_loss=0.03038, audio_tagging_loss=0.006299, over 16434.00 frames. ], tot_loss[loss=0.08973, simple_loss=0.1087, pruned_loss=0.02475, audio_tagging_loss=0.01064, over 3045171.80 frames. ], batch size: 58, lr: 7.85e-03, grad_scale: 32.0 2023-11-19 08:46:01,506 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=657240.0, ans=0.2 2023-11-19 08:46:01,628 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=657240.0, ans=0.0 2023-11-19 08:46:02,558 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=657240.0, ans=0.1 2023-11-19 08:46:07,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=657240.0, ans=0.5 2023-11-19 08:46:08,963 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=657240.0, ans=0.125 2023-11-19 08:46:09,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.08 vs. 
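limit=15.0

The WARNING lines in this epoch all drop the same kind of AudioSet placeholder cut: each has 100 input frames, which the convolutional frontend reduces to 23, one fewer than the 24 BPE tokens of the dummy transcript, so a transducer alignment is impossible. The 100 -> 23 arithmetic matches two rounds of subsampling, ((100 - 7) // 2 + 1) // 2 = 23; the helper below is a hedged reconstruction of that filter inferred from the logged numbers, not a quote of train_asr.py:

    def frames_after_subsampling(num_frames: int) -> int:
        # Inferred from the logged pair (before: 100, after: 23).
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Exclude cuts whose encoder output cannot cover the token sequence.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)    # the dummy placeholder cuts get excluded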
2023-11-19 08:46:17,891 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.346e+01 9.299e+01 9.977e+01 1.391e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-19 08:46:19,231 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=657306.6666666666, ans=0.125
2023-11-19 08:46:20,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=657306.6666666666, ans=0.025
2023-11-19 08:46:26,563 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0
2023-11-19 08:46:33,122 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=657440.0, ans=0.0
2023-11-19 08:46:33,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=657440.0, ans=0.125
2023-11-19 08:46:47,438 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=657506.6666666666, ans=0.125
2023-11-19 08:46:56,228 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2450, loss[loss=0.07247, simple_loss=0.09371, pruned_loss=0.01616, audio_tagging_loss=0.009455, over 14946.00 frames. ], tot_loss[loss=0.09013, simple_loss=0.1089, pruned_loss=0.02494, audio_tagging_loss=0.01074, over 3045632.90 frames. ], batch size: 57, lr: 7.84e-03, grad_scale: 32.0
2023-11-19 08:47:03,008 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.45 vs. limit=15.0
2023-11-19 08:47:21,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=657706.6666666666, ans=0.125
2023-11-19 08:47:25,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=657706.6666666666, ans=0.1
2023-11-19 08:47:31,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=657773.3333333334, ans=0.2
2023-11-19 08:47:43,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.70 vs. limit=15.0
2023-11-19 08:47:50,410 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=657906.6666666666, ans=0.125
2023-11-19 08:47:51,715 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2500, loss[loss=0.09213, simple_loss=0.1092, pruned_loss=0.02647, audio_tagging_loss=0.01107, over 15620.00 frames. ], tot_loss[loss=0.0898, simple_loss=0.1086, pruned_loss=0.02486, audio_tagging_loss=0.01065, over 3040710.28 frames.
], batch size: 57, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:47:52,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=657906.6666666666, ans=0.125 2023-11-19 08:47:57,752 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=657906.6666666666, ans=0.125 2023-11-19 08:48:09,790 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.559e+01 8.238e+01 9.130e+01 9.958e+01 1.355e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-19 08:48:17,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=658040.0, ans=0.125 2023-11-19 08:48:42,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=658173.3333333334, ans=0.2 2023-11-19 08:48:48,242 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2550, loss[loss=0.1222, simple_loss=0.1552, pruned_loss=0.03785, audio_tagging_loss=0.006742, over 15628.00 frames. ], tot_loss[loss=0.08978, simple_loss=0.1088, pruned_loss=0.0249, audio_tagging_loss=0.0105, over 3043983.50 frames. ], batch size: 58, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:48:50,591 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:48:51,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=658240.0, ans=0.125 2023-11-19 08:49:00,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=658306.6666666666, ans=0.2 2023-11-19 08:49:14,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=658373.3333333334, ans=0.125 2023-11-19 08:49:23,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=658440.0, ans=0.125 2023-11-19 08:49:33,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=658506.6666666666, ans=0.125 2023-11-19 08:49:34,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=658506.6666666666, ans=0.05 2023-11-19 08:49:39,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=22.5 2023-11-19 08:49:41,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=658506.6666666666, ans=0.125 2023-11-19 08:49:42,416 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2023-11-19 08:49:43,987 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2600, loss[loss=0.09808, simple_loss=0.1228, pruned_loss=0.02962, audio_tagging_loss=0.007039, over 15488.00 frames. ], tot_loss[loss=0.08981, simple_loss=0.1091, pruned_loss=0.02495, audio_tagging_loss=0.01029, over 3044064.67 frames. ], batch size: 56, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:49:44,543 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.17 vs. 
limit=15.0 2023-11-19 08:50:01,805 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.704e+01 9.558e+01 1.082e+02 2.151e+02, threshold=1.912e+02, percent-clipped=1.0 2023-11-19 08:50:22,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=658773.3333333334, ans=0.1 2023-11-19 08:50:30,801 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=658840.0, ans=0.125 2023-11-19 08:50:36,045 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=658840.0, ans=0.0 2023-11-19 08:50:40,080 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2650, loss[loss=0.08319, simple_loss=0.09717, pruned_loss=0.01972, audio_tagging_loss=0.01489, over 15238.00 frames. ], tot_loss[loss=0.08969, simple_loss=0.1088, pruned_loss=0.02492, audio_tagging_loss=0.01036, over 3048730.94 frames. ], batch size: 56, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:50:41,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=658906.6666666666, ans=0.125 2023-11-19 08:50:47,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=15.0 2023-11-19 08:51:30,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=659173.3333333334, ans=0.125 2023-11-19 08:51:36,936 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2700, loss[loss=0.06881, simple_loss=0.08211, pruned_loss=0.0159, audio_tagging_loss=0.01185, over 14643.00 frames. ], tot_loss[loss=0.08887, simple_loss=0.1078, pruned_loss=0.02462, audio_tagging_loss=0.01036, over 3043519.06 frames. ], batch size: 56, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:51:41,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=659240.0, ans=0.125 2023-11-19 08:51:54,344 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.151e+01 8.620e+01 9.365e+01 1.021e+02 1.535e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-19 08:51:54,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=659306.6666666666, ans=0.05 2023-11-19 08:52:07,219 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=659373.3333333334, ans=0.09899494936611666 2023-11-19 08:52:20,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=659506.6666666666, ans=0.125 2023-11-19 08:52:31,919 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2750, loss[loss=0.08852, simple_loss=0.111, pruned_loss=0.02423, audio_tagging_loss=0.008806, over 15687.00 frames. ], tot_loss[loss=0.0884, simple_loss=0.1067, pruned_loss=0.02465, audio_tagging_loss=0.01041, over 3048454.19 frames. 
], batch size: 58, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:52:36,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=659573.3333333334, ans=0.125 2023-11-19 08:52:45,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2023-11-19 08:53:03,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=659706.6666666666, ans=0.0 2023-11-19 08:53:16,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=659840.0, ans=0.125 2023-11-19 08:53:18,436 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:53:20,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=15.0 2023-11-19 08:53:21,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=659840.0, ans=0.035 2023-11-19 08:53:26,881 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2800, loss[loss=0.09897, simple_loss=0.1241, pruned_loss=0.02972, audio_tagging_loss=0.007179, over 15265.00 frames. ], tot_loss[loss=0.08776, simple_loss=0.1058, pruned_loss=0.02437, audio_tagging_loss=0.0105, over 3039363.26 frames. ], batch size: 54, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:53:30,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=659906.6666666666, ans=0.125 2023-11-19 08:53:30,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=659906.6666666666, ans=0.0 2023-11-19 08:53:30,906 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.52 vs. limit=15.0 2023-11-19 08:53:45,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.607e+01 8.203e+01 8.899e+01 9.500e+01 1.297e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-19 08:53:50,003 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. 
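limit=15.0

The optim.py lines report the 0/25/50/75/100th percentiles of recently observed gradient norms, and the clipping threshold is Clipping_scale times the median: in the entry above, 2.0 * 8.899e+01 = 1.780e+02, exactly the logged threshold, while percent-clipped tracks how often a batch actually exceeded it. A self-contained sketch of that scheme follows; the history length is a guess, and in icefall this logic lives inside the optimizer rather than in a separate class:

    from collections import deque
    import torch

    class MedianGradClipper:
        """Clip the global grad norm at clipping_scale * median(recent norms)."""
        def __init__(self, clipping_scale=2.0, history=400):
            self.scale = clipping_scale
            self.norms = deque(maxlen=history)    # recent global grad norms
        def clip_(self, params):
            # Assumes at least one parameter with a gradient.
            params = [p for p in params if p.grad is not None]
            total = torch.sqrt(sum(p.grad.norm() ** 2 for p in params))
            self.norms.append(float(total))
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.scale * median       # e.g. 2.0 * 8.899e+01 = 1.780e+02
            if float(total) > threshold:          # a "percent-clipped" event
                for p in params:
                    p.grad.mul_(threshold / total)
            return float(total)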
2023-11-19 08:54:00,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=660106.6666666666, ans=0.2
2023-11-19 08:54:06,038 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=660106.6666666666, ans=0.0
2023-11-19 08:54:19,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=660173.3333333334, ans=0.0
2023-11-19 08:54:21,477 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=660240.0, ans=0.5
2023-11-19 08:54:22,320 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2850, loss[loss=0.1099, simple_loss=0.1324, pruned_loss=0.03413, audio_tagging_loss=0.009567, over 16550.00 frames. ], tot_loss[loss=0.08807, simple_loss=0.1063, pruned_loss=0.0245, audio_tagging_loss=0.01044, over 3037794.77 frames. ], batch size: 64, lr: 7.83e-03, grad_scale: 32.0
2023-11-19 08:54:43,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=660306.6666666666, ans=0.2
2023-11-19 08:54:51,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0
2023-11-19 08:55:01,349 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.49 vs. limit=15.0
2023-11-19 08:55:18,650 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2900, loss[loss=0.09064, simple_loss=0.1144, pruned_loss=0.02676, audio_tagging_loss=0.006704, over 14893.00 frames. ], tot_loss[loss=0.08785, simple_loss=0.106, pruned_loss=0.0244, audio_tagging_loss=0.01043, over 3031247.33 frames. ], batch size: 59, lr: 7.83e-03, grad_scale: 32.0
2023-11-19 08:55:29,654 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.83 vs. limit=15.0
2023-11-19 08:55:36,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.534e+01 9.333e+01 1.020e+02 1.874e+02, threshold=1.867e+02, percent-clipped=1.0
2023-11-19 08:56:03,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=660840.0, ans=0.0
2023-11-19 08:56:13,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=660906.6666666666, ans=0.125
2023-11-19 08:56:14,645 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 2950, loss[loss=0.0873, simple_loss=0.1184, pruned_loss=0.02095, audio_tagging_loss=0.007159, over 14544.00 frames. ], tot_loss[loss=0.08843, simple_loss=0.1069, pruned_loss=0.02449, audio_tagging_loss=0.01051, over 3037065.25 frames. ], batch size: 52, lr: 7.82e-03, grad_scale: 32.0
2023-11-19 08:56:16,470 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.05 vs.
limit=15.0 2023-11-19 08:56:20,223 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=660906.6666666666, ans=0.0 2023-11-19 08:56:24,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=660906.6666666666, ans=0.0 2023-11-19 08:56:27,476 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5 2023-11-19 08:56:40,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=661040.0, ans=0.2 2023-11-19 08:56:57,659 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=661106.6666666666, ans=0.0 2023-11-19 08:57:10,812 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3000, loss[loss=0.08535, simple_loss=0.1056, pruned_loss=0.02104, audio_tagging_loss=0.01151, over 14636.00 frames. ], tot_loss[loss=0.08853, simple_loss=0.1068, pruned_loss=0.02456, audio_tagging_loss=0.01056, over 3041051.59 frames. ], batch size: 56, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 08:57:10,812 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 08:57:44,068 INFO [train_asr.py:1147] (1/4) Epoch 9, validation: loss=0.06604, simple_loss=0.05618, pruned_loss=0.006775, audio_tagging_loss=0.03117, over 4681554.00 frames. 2023-11-19 08:57:44,068 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 08:57:53,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=661306.6666666666, ans=0.125 2023-11-19 08:57:54,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=661306.6666666666, ans=0.1 2023-11-19 08:57:55,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=15.0 2023-11-19 08:58:01,249 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 8.862e+01 9.645e+01 1.062e+02 1.575e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-19 08:58:10,044 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=661373.3333333334, ans=0.2 2023-11-19 08:58:28,068 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=661506.6666666666, ans=0.5 2023-11-19 08:58:28,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=661506.6666666666, ans=0.025 2023-11-19 08:58:32,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=661506.6666666666, ans=0.0 2023-11-19 08:58:39,499 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3050, loss[loss=0.09324, simple_loss=0.1231, pruned_loss=0.02262, audio_tagging_loss=0.009086, over 15045.00 frames. ], tot_loss[loss=0.0887, simple_loss=0.1071, pruned_loss=0.02466, audio_tagging_loss=0.01051, over 3043611.92 frames. 
], batch size: 57, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 08:58:49,960 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=661640.0, ans=0.0 2023-11-19 08:58:50,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=661640.0, ans=12.0 2023-11-19 08:58:57,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=661640.0, ans=0.125 2023-11-19 08:58:59,506 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.02 vs. limit=22.5 2023-11-19 08:59:02,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=661706.6666666666, ans=0.0 2023-11-19 08:59:03,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=661706.6666666666, ans=0.125 2023-11-19 08:59:12,953 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:59:17,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=661773.3333333334, ans=0.2 2023-11-19 08:59:17,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.25 vs. limit=15.0 2023-11-19 08:59:18,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=661773.3333333334, ans=0.125 2023-11-19 08:59:20,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=661773.3333333334, ans=0.125 2023-11-19 08:59:23,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=661773.3333333334, ans=0.0 2023-11-19 08:59:24,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=661840.0, ans=0.2 2023-11-19 08:59:36,838 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3100, loss[loss=0.08844, simple_loss=0.1088, pruned_loss=0.02742, audio_tagging_loss=0.00661, over 14957.00 frames. ], tot_loss[loss=0.08943, simple_loss=0.1081, pruned_loss=0.02483, audio_tagging_loss=0.01055, over 3045350.57 frames. ], batch size: 56, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 08:59:46,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. 
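limit=15.0

The lr column decays very slowly across this span (7.89e-03 at batch 1250 down to 7.82e-03 by batch 3100) because both schedule factors are already deep into their power-law tails. The formula below is the Eden schedule as commonly written for Zipformer recipes, shown with the lr_batches=7500 and lr_epochs=3.5 settings this run was launched with; it is quoted from memory, so treat the exact form as an assumption rather than this run's definitive implementation:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Power-law decay in both the step count and the (possibly fractional) epoch.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # With base_lr=0.045 this gives ~7.89e-03 at epoch 9 if roughly 88k optimizer
    # steps have run; that step count is our inference, not a logged quantity.
    print(round(eden_lr(0.045, 88_000, 9), 5))    # ~0.00789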
2023-11-19 08:59:49,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=661973.3333333334, ans=0.125
2023-11-19 08:59:54,977 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.514e+01 9.258e+01 1.059e+02 1.772e+02, threshold=1.852e+02, percent-clipped=0.0
2023-11-19 09:00:01,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=662040.0, ans=0.0
2023-11-19 09:00:19,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=662106.6666666666, ans=0.0
2023-11-19 09:00:29,297 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662173.3333333334, ans=0.1
2023-11-19 09:00:32,265 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3150, loss[loss=0.06901, simple_loss=0.08546, pruned_loss=0.01751, audio_tagging_loss=0.008772, over 14959.00 frames. ], tot_loss[loss=0.089, simple_loss=0.1071, pruned_loss=0.02472, audio_tagging_loss=0.01072, over 3040975.02 frames. ], batch size: 56, lr: 7.82e-03, grad_scale: 32.0
2023-11-19 09:00:59,756 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0
2023-11-19 09:01:02,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662373.3333333334, ans=0.1
2023-11-19 09:01:28,539 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3200, loss[loss=0.09667, simple_loss=0.1103, pruned_loss=0.02972, audio_tagging_loss=0.01179, over 15066.00 frames. ], tot_loss[loss=0.08941, simple_loss=0.1077, pruned_loss=0.02475, audio_tagging_loss=0.01084, over 3042951.19 frames. ], batch size: 57, lr: 7.82e-03, grad_scale: 32.0
2023-11-19 09:01:28,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=662573.3333333334, ans=0.125
2023-11-19 09:01:46,780 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.390e+01 8.354e+01 9.192e+01 1.025e+02 1.348e+02, threshold=1.838e+02, percent-clipped=0.0
2023-11-19 09:01:50,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=662706.6666666666, ans=0.0
2023-11-19 09:01:55,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=662706.6666666666, ans=0.0
2023-11-19 09:01:56,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=662706.6666666666, ans=0.125
2023-11-19 09:02:07,130 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=662773.3333333334, ans=0.035
2023-11-19 09:02:13,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=662840.0, ans=0.125
2023-11-19 09:02:24,476 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3250, loss[loss=0.08165, simple_loss=0.1039, pruned_loss=0.019, audio_tagging_loss=0.01071, over 15663.00 frames. ], tot_loss[loss=0.0893, simple_loss=0.1078, pruned_loss=0.0245, audio_tagging_loss=0.01091, over 3052606.32 frames.
], batch size: 59, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:02:31,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=662906.6666666666, ans=0.125 2023-11-19 09:02:37,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=662973.3333333334, ans=0.0 2023-11-19 09:03:07,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=663106.6666666666, ans=0.2 2023-11-19 09:03:20,475 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3300, loss[loss=0.09559, simple_loss=0.1132, pruned_loss=0.02754, audio_tagging_loss=0.01146, over 15380.00 frames. ], tot_loss[loss=0.08905, simple_loss=0.1074, pruned_loss=0.02443, audio_tagging_loss=0.01091, over 3046675.36 frames. ], batch size: 56, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:03:26,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=663240.0, ans=10.0 2023-11-19 09:03:35,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=663306.6666666666, ans=0.125 2023-11-19 09:03:38,111 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=10.0 2023-11-19 09:03:38,458 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.893e+01 9.711e+01 1.106e+02 1.510e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-19 09:03:39,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.38 vs. limit=10.0 2023-11-19 09:03:47,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=663373.3333333334, ans=0.125 2023-11-19 09:03:47,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=663373.3333333334, ans=0.125 2023-11-19 09:03:47,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2023-11-19 09:03:48,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=663373.3333333334, ans=0.0 2023-11-19 09:03:56,815 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:04:10,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=663506.6666666666, ans=0.1 2023-11-19 09:04:11,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=663506.6666666666, ans=0.0 2023-11-19 09:04:16,819 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3350, loss[loss=0.08899, simple_loss=0.1035, pruned_loss=0.02171, audio_tagging_loss=0.01554, over 15924.00 frames. ], tot_loss[loss=0.08892, simple_loss=0.1074, pruned_loss=0.02442, audio_tagging_loss=0.0108, over 3049317.33 frames. 
], batch size: 60, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:04:39,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=663706.6666666666, ans=0.0 2023-11-19 09:04:54,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=663773.3333333334, ans=0.2 2023-11-19 09:05:04,383 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2023-11-19 09:05:08,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=663840.0, ans=0.125 2023-11-19 09:05:11,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=15.0 2023-11-19 09:05:12,839 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3400, loss[loss=0.06567, simple_loss=0.08124, pruned_loss=0.01428, audio_tagging_loss=0.01078, over 15196.00 frames. ], tot_loss[loss=0.08891, simple_loss=0.1077, pruned_loss=0.02442, audio_tagging_loss=0.01065, over 3049923.66 frames. ], batch size: 58, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:05:30,638 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.652e+01 9.509e+01 1.042e+02 1.757e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 09:05:34,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=664040.0, ans=0.125 2023-11-19 09:05:54,941 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.82 vs. limit=22.5 2023-11-19 09:06:08,674 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3450, loss[loss=0.07257, simple_loss=0.09614, pruned_loss=0.01408, audio_tagging_loss=0.01042, over 14875.00 frames. ], tot_loss[loss=0.08878, simple_loss=0.1081, pruned_loss=0.02421, audio_tagging_loss=0.01054, over 3050706.82 frames. ], batch size: 55, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:06:08,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=664240.0, ans=0.05 2023-11-19 09:06:19,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=664306.6666666666, ans=0.2 2023-11-19 09:06:22,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=664306.6666666666, ans=0.1 2023-11-19 09:06:22,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2023-11-19 09:06:26,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.47 vs. limit=15.0 2023-11-19 09:06:27,166 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=664306.6666666666, ans=0.125 2023-11-19 09:07:04,549 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3500, loss[loss=0.09637, simple_loss=0.1158, pruned_loss=0.02885, audio_tagging_loss=0.009635, over 15175.00 frames. ], tot_loss[loss=0.08902, simple_loss=0.1083, pruned_loss=0.02452, audio_tagging_loss=0.01036, over 3039190.04 frames. 
], batch size: 56, lr: 7.80e-03, grad_scale: 32.0 2023-11-19 09:07:15,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=664640.0, ans=0.0 2023-11-19 09:07:22,601 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.336e+01 8.969e+01 9.703e+01 1.247e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-19 09:07:25,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=664640.0, ans=0.1 2023-11-19 09:07:32,797 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:07:48,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=664840.0, ans=0.0 2023-11-19 09:07:48,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=664840.0, ans=0.125 2023-11-19 09:07:48,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=664840.0, ans=0.125 2023-11-19 09:07:49,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=664840.0, ans=0.1 2023-11-19 09:07:52,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=664840.0, ans=0.125 2023-11-19 09:08:00,887 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3550, loss[loss=0.07963, simple_loss=0.09099, pruned_loss=0.02187, audio_tagging_loss=0.01227, over 14529.00 frames. ], tot_loss[loss=0.08867, simple_loss=0.1076, pruned_loss=0.0245, audio_tagging_loss=0.01037, over 3042589.39 frames. ], batch size: 56, lr: 7.80e-03, grad_scale: 32.0 2023-11-19 09:08:05,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=664906.6666666666, ans=0.0 2023-11-19 09:08:11,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=664973.3333333334, ans=0.125 2023-11-19 09:08:12,544 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2023-11-19 09:08:23,207 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:08:56,465 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3600, loss[loss=0.07792, simple_loss=0.08786, pruned_loss=0.02354, audio_tagging_loss=0.01045, over 15379.00 frames. ], tot_loss[loss=0.08747, simple_loss=0.106, pruned_loss=0.02406, audio_tagging_loss=0.01042, over 3038992.17 frames. 
], batch size: 58, lr: 7.80e-03, grad_scale: 32.0 2023-11-19 09:09:08,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=665306.6666666666, ans=0.125 2023-11-19 09:09:15,142 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.230e+01 8.845e+01 9.597e+01 1.384e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 09:09:25,220 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=665373.3333333334, ans=0.0 2023-11-19 09:09:52,794 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3650, loss[loss=0.05298, simple_loss=0.05525, pruned_loss=0.01424, audio_tagging_loss=0.01112, over 14695.00 frames. ], tot_loss[loss=0.08771, simple_loss=0.1059, pruned_loss=0.02433, audio_tagging_loss=0.01043, over 3038939.78 frames. ], batch size: 58, lr: 7.80e-03, grad_scale: 16.0 2023-11-19 09:10:19,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=665706.6666666666, ans=0.1 2023-11-19 09:10:28,727 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.61 vs. limit=22.5 2023-11-19 09:10:35,390 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=665773.3333333334, ans=0.125 2023-11-19 09:10:39,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=665840.0, ans=0.125 2023-11-19 09:10:41,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=665840.0, ans=0.0 2023-11-19 09:10:43,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=665840.0, ans=0.125 2023-11-19 09:10:48,458 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3700, loss[loss=0.08558, simple_loss=0.1004, pruned_loss=0.02695, audio_tagging_loss=0.008441, over 14700.00 frames. ], tot_loss[loss=0.0874, simple_loss=0.1058, pruned_loss=0.024, audio_tagging_loss=0.01049, over 3045468.93 frames. ], batch size: 54, lr: 7.80e-03, grad_scale: 16.0 2023-11-19 09:10:52,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=665906.6666666666, ans=0.04949747468305833 2023-11-19 09:10:59,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=665973.3333333334, ans=0.0 2023-11-19 09:11:06,729 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.367e+01 9.183e+01 1.016e+02 1.508e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-19 09:11:29,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.28 vs. limit=15.0 2023-11-19 09:11:31,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.86 vs. limit=22.5 2023-11-19 09:11:43,405 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3750, loss[loss=0.09015, simple_loss=0.1056, pruned_loss=0.02663, audio_tagging_loss=0.01074, over 15633.00 frames. ], tot_loss[loss=0.08747, simple_loss=0.1062, pruned_loss=0.02392, audio_tagging_loss=0.01043, over 3039869.60 frames. 
], batch size: 58, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:11:50,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.59 vs. limit=15.0 2023-11-19 09:12:09,669 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=666373.3333333334, ans=0.125 2023-11-19 09:12:13,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=666373.3333333334, ans=0.2 2023-11-19 09:12:20,551 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:12:28,479 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.90 vs. limit=15.0 2023-11-19 09:12:32,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=666506.6666666666, ans=0.0 2023-11-19 09:12:39,253 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3800, loss[loss=0.07319, simple_loss=0.09752, pruned_loss=0.01345, audio_tagging_loss=0.01099, over 15682.00 frames. ], tot_loss[loss=0.08871, simple_loss=0.1077, pruned_loss=0.02438, audio_tagging_loss=0.01047, over 3048633.38 frames. ], batch size: 57, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:12:45,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=666573.3333333334, ans=0.125 2023-11-19 09:13:00,319 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=666640.0, ans=0.125 2023-11-19 09:13:01,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.623e+01 8.849e+01 9.395e+01 1.021e+02 1.360e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 09:13:11,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=666706.6666666666, ans=0.0 2023-11-19 09:13:11,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.68 vs. limit=22.5 2023-11-19 09:13:22,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=666773.3333333334, ans=0.1 2023-11-19 09:13:26,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=666840.0, ans=0.125 2023-11-19 09:13:37,597 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3850, loss[loss=0.108, simple_loss=0.1413, pruned_loss=0.02779, audio_tagging_loss=0.009576, over 15722.00 frames. ], tot_loss[loss=0.08824, simple_loss=0.1069, pruned_loss=0.02412, audio_tagging_loss=0.01064, over 3037857.01 frames. 
], batch size: 56, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:13:38,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=666906.6666666666, ans=0.125 2023-11-19 09:14:05,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=667040.0, ans=0.125 2023-11-19 09:14:09,379 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=667040.0, ans=0.125 2023-11-19 09:14:24,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=667173.3333333334, ans=0.0 2023-11-19 09:14:29,093 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=667173.3333333334, ans=0.1 2023-11-19 09:14:30,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=667173.3333333334, ans=0.0 2023-11-19 09:14:34,206 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3900, loss[loss=0.08708, simple_loss=0.1138, pruned_loss=0.02253, audio_tagging_loss=0.007633, over 15715.00 frames. ], tot_loss[loss=0.08801, simple_loss=0.1063, pruned_loss=0.02418, audio_tagging_loss=0.01069, over 3038465.64 frames. ], batch size: 57, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:14:44,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=667306.6666666666, ans=0.125 2023-11-19 09:14:44,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=667306.6666666666, ans=0.125 2023-11-19 09:14:48,712 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=667306.6666666666, ans=0.125 2023-11-19 09:14:52,706 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 8.271e+01 8.958e+01 9.768e+01 1.292e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-19 09:14:55,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=667373.3333333334, ans=0.125 2023-11-19 09:15:05,290 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=667373.3333333334, ans=0.0 2023-11-19 09:15:18,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=22.5 2023-11-19 09:15:19,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=667506.6666666666, ans=0.2 2023-11-19 09:15:26,175 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2023-11-19 09:15:30,405 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 3950, loss[loss=0.07887, simple_loss=0.09483, pruned_loss=0.02188, audio_tagging_loss=0.009569, over 14916.00 frames. ], tot_loss[loss=0.08798, simple_loss=0.1062, pruned_loss=0.02411, audio_tagging_loss=0.01079, over 3042418.86 frames. 
], batch size: 58, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:15:35,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=667573.3333333334, ans=0.0 2023-11-19 09:15:35,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=667573.3333333334, ans=0.1 2023-11-19 09:15:40,206 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=667640.0, ans=0.1 2023-11-19 09:15:56,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=667706.6666666666, ans=0.125 2023-11-19 09:15:57,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=667706.6666666666, ans=0.125 2023-11-19 09:15:58,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=667706.6666666666, ans=0.125 2023-11-19 09:16:01,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=667706.6666666666, ans=0.2 2023-11-19 09:16:01,809 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.35 vs. limit=6.0 2023-11-19 09:16:06,967 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2023-11-19 09:16:19,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=667840.0, ans=0.0 2023-11-19 09:16:25,232 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4000, loss[loss=0.05342, simple_loss=0.05946, pruned_loss=0.01509, audio_tagging_loss=0.008595, over 14465.00 frames. ], tot_loss[loss=0.08866, simple_loss=0.1069, pruned_loss=0.02442, audio_tagging_loss=0.01078, over 3041685.56 frames. ], batch size: 54, lr: 7.78e-03, grad_scale: 32.0 2023-11-19 09:16:26,786 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2023-11-19 09:16:29,183 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=667906.6666666666, ans=0.125 2023-11-19 09:16:37,852 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=667973.3333333334, ans=0.125 2023-11-19 09:16:45,238 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.595e+01 9.425e+01 1.030e+02 1.465e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-19 09:16:52,368 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.75 vs. 
limit=15.0 2023-11-19 09:17:06,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668106.6666666666, ans=0.1 2023-11-19 09:17:09,389 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=668173.3333333334, ans=0.0 2023-11-19 09:17:10,849 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=668173.3333333334, ans=0.0 2023-11-19 09:17:11,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=668173.3333333334, ans=0.1 2023-11-19 09:17:22,424 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4050, loss[loss=0.074, simple_loss=0.08123, pruned_loss=0.02059, audio_tagging_loss=0.0128, over 14378.00 frames. ], tot_loss[loss=0.08903, simple_loss=0.1075, pruned_loss=0.02442, audio_tagging_loss=0.01089, over 3038030.13 frames. ], batch size: 57, lr: 7.78e-03, grad_scale: 32.0 2023-11-19 09:17:23,524 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:17:28,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=668240.0, ans=0.0 2023-11-19 09:17:51,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=668373.3333333334, ans=0.125 2023-11-19 09:18:02,501 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=668440.0, ans=0.0 2023-11-19 09:18:17,769 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4100, loss[loss=0.063, simple_loss=0.07783, pruned_loss=0.01319, audio_tagging_loss=0.0109, over 15033.00 frames. ], tot_loss[loss=0.08876, simple_loss=0.1072, pruned_loss=0.02434, audio_tagging_loss=0.01081, over 3037722.16 frames. ], batch size: 57, lr: 7.78e-03, grad_scale: 16.0 2023-11-19 09:18:37,942 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.527e+01 9.124e+01 9.950e+01 1.525e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 09:18:53,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=668773.3333333334, ans=0.2 2023-11-19 09:19:02,701 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.55 vs. limit=15.0 2023-11-19 09:19:11,164 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.10 vs. limit=15.0 2023-11-19 09:19:13,572 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4150, loss[loss=0.07993, simple_loss=0.09477, pruned_loss=0.0231, audio_tagging_loss=0.009452, over 14856.00 frames. ], tot_loss[loss=0.08856, simple_loss=0.1069, pruned_loss=0.02428, audio_tagging_loss=0.01082, over 3042530.99 frames. 
], batch size: 56, lr: 7.78e-03, grad_scale: 16.0 2023-11-19 09:19:23,928 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=668973.3333333334, ans=0.125 2023-11-19 09:19:23,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668973.3333333334, ans=0.1 2023-11-19 09:19:23,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=668973.3333333334, ans=0.125 2023-11-19 09:19:24,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=668973.3333333334, ans=0.125 2023-11-19 09:19:45,982 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=669040.0, ans=0.125 2023-11-19 09:19:48,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=669106.6666666666, ans=0.125 2023-11-19 09:19:53,153 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:19:53,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=669106.6666666666, ans=0.125 2023-11-19 09:20:03,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=669173.3333333334, ans=0.125 2023-11-19 09:20:10,264 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4200, loss[loss=0.09479, simple_loss=0.1175, pruned_loss=0.02758, audio_tagging_loss=0.00849, over 15631.00 frames. ], tot_loss[loss=0.08901, simple_loss=0.1078, pruned_loss=0.02457, audio_tagging_loss=0.01052, over 3044578.39 frames. ], batch size: 57, lr: 7.78e-03, grad_scale: 16.0 2023-11-19 09:20:14,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=669240.0, ans=0.0 2023-11-19 09:20:16,396 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.94 vs. limit=22.5 2023-11-19 09:20:18,593 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=669240.0, ans=0.125 2023-11-19 09:20:30,746 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.637e+01 8.694e+01 9.811e+01 1.116e+02 1.412e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-19 09:20:47,811 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.00 vs. 
limit=15.0 2023-11-19 09:20:58,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=669506.6666666666, ans=0.1 2023-11-19 09:21:06,490 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4250, loss[loss=0.06533, simple_loss=0.08033, pruned_loss=0.01689, audio_tagging_loss=0.00828, over 14694.00 frames. ], tot_loss[loss=0.08882, simple_loss=0.1079, pruned_loss=0.02443, audio_tagging_loss=0.01042, over 3042891.53 frames. ], batch size: 56, lr: 7.77e-03, grad_scale: 16.0 2023-11-19 09:21:06,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=669573.3333333334, ans=0.2 2023-11-19 09:21:08,876 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=669573.3333333334, ans=0.2 2023-11-19 09:21:24,300 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=669640.0, ans=0.025 2023-11-19 09:21:29,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=669706.6666666666, ans=0.125 2023-11-19 09:21:32,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=669706.6666666666, ans=0.95 2023-11-19 09:21:38,825 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.25 vs. limit=10.0 2023-11-19 09:21:42,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=669773.3333333334, ans=0.07 2023-11-19 09:21:49,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=669773.3333333334, ans=0.125 2023-11-19 09:22:02,677 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4300, loss[loss=0.1096, simple_loss=0.1338, pruned_loss=0.03371, audio_tagging_loss=0.00894, over 14605.00 frames. ], tot_loss[loss=0.08873, simple_loss=0.1082, pruned_loss=0.02439, audio_tagging_loss=0.01026, over 3036672.66 frames. 
], batch size: 56, lr: 7.77e-03, grad_scale: 16.0 2023-11-19 09:22:11,834 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=669906.6666666666, ans=0.1 2023-11-19 09:22:16,167 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=669973.3333333334, ans=0.125 2023-11-19 09:22:19,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=669973.3333333334, ans=0.125 2023-11-19 09:22:21,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=669973.3333333334, ans=0.0 2023-11-19 09:22:22,813 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.641e+01 9.552e+01 1.070e+02 1.517e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-19 09:22:26,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=670040.0, ans=0.05 2023-11-19 09:22:45,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=670106.6666666666, ans=0.125 2023-11-19 09:22:51,220 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.40 vs. limit=5.0 2023-11-19 09:22:58,317 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4350, loss[loss=0.09183, simple_loss=0.1041, pruned_loss=0.02801, audio_tagging_loss=0.01178, over 15307.00 frames. ], tot_loss[loss=0.08867, simple_loss=0.1078, pruned_loss=0.02438, audio_tagging_loss=0.01039, over 3041219.31 frames. ], batch size: 57, lr: 7.77e-03, grad_scale: 16.0 2023-11-19 09:23:10,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=670306.6666666666, ans=0.2 2023-11-19 09:23:20,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.07 vs. limit=22.5 2023-11-19 09:23:40,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.83 vs. limit=22.5 2023-11-19 09:23:42,743 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:23:50,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=670506.6666666666, ans=0.04949747468305833 2023-11-19 09:23:51,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=15.0 2023-11-19 09:23:54,176 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4400, loss[loss=0.08139, simple_loss=0.09952, pruned_loss=0.02109, audio_tagging_loss=0.01055, over 15529.00 frames. ], tot_loss[loss=0.08816, simple_loss=0.1072, pruned_loss=0.02417, audio_tagging_loss=0.01039, over 3034467.76 frames. 
], batch size: 58, lr: 7.77e-03, grad_scale: 32.0 2023-11-19 09:24:01,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=670573.3333333334, ans=0.125 2023-11-19 09:24:11,740 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5 2023-11-19 09:24:14,410 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.076e+01 8.724e+01 9.307e+01 1.083e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-19 09:24:20,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=670706.6666666666, ans=0.125 2023-11-19 09:24:28,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=670773.3333333334, ans=0.125 2023-11-19 09:24:30,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=670773.3333333334, ans=0.125 2023-11-19 09:24:31,505 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:24:49,635 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4450, loss[loss=0.09811, simple_loss=0.1222, pruned_loss=0.02657, audio_tagging_loss=0.01046, over 16450.00 frames. ], tot_loss[loss=0.08828, simple_loss=0.1073, pruned_loss=0.02432, audio_tagging_loss=0.01033, over 3034652.40 frames. ], batch size: 61, lr: 7.77e-03, grad_scale: 32.0 2023-11-19 09:25:07,750 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=670973.3333333334, ans=0.125 2023-11-19 09:25:45,406 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4500, loss[loss=0.1195, simple_loss=0.1456, pruned_loss=0.0393, audio_tagging_loss=0.0074, over 14667.00 frames. ], tot_loss[loss=0.08805, simple_loss=0.1071, pruned_loss=0.02418, audio_tagging_loss=0.01033, over 3040878.13 frames. ], batch size: 55, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:25:48,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=671240.0, ans=0.125 2023-11-19 09:25:59,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.55 vs. limit=22.5 2023-11-19 09:26:04,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=671306.6666666666, ans=0.2 2023-11-19 09:26:06,066 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.307e+01 9.155e+01 9.901e+01 1.565e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 09:26:07,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=671373.3333333334, ans=0.125 2023-11-19 09:26:40,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=671573.3333333334, ans=0.125 2023-11-19 09:26:41,059 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4550, loss[loss=0.07459, simple_loss=0.09239, pruned_loss=0.01361, audio_tagging_loss=0.01478, over 15825.00 frames. 
], tot_loss[loss=0.08795, simple_loss=0.1066, pruned_loss=0.02417, audio_tagging_loss=0.01047, over 3043100.67 frames. ], batch size: 59, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:26:41,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=15.0 2023-11-19 09:26:45,619 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=671573.3333333334, ans=0.125 2023-11-19 09:26:52,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=671640.0, ans=0.125 2023-11-19 09:26:56,618 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.32 vs. limit=6.0 2023-11-19 09:27:22,363 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:27:27,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=671840.0, ans=0.125 2023-11-19 09:27:28,081 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.04 vs. limit=10.0 2023-11-19 09:27:36,509 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4600, loss[loss=0.06823, simple_loss=0.08797, pruned_loss=0.01543, audio_tagging_loss=0.008821, over 14964.00 frames. ], tot_loss[loss=0.08767, simple_loss=0.106, pruned_loss=0.02403, audio_tagging_loss=0.01065, over 3036889.63 frames. ], batch size: 57, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:27:41,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=671906.6666666666, ans=0.0 2023-11-19 09:27:47,187 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=671973.3333333334, ans=0.125 2023-11-19 09:27:58,267 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.562e+01 9.559e+01 1.086e+02 1.814e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-19 09:28:01,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=672040.0, ans=0.05 2023-11-19 09:28:07,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=672040.0, ans=0.0 2023-11-19 09:28:22,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=672173.3333333334, ans=0.125 2023-11-19 09:28:24,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=672173.3333333334, ans=0.125 2023-11-19 09:28:32,582 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4650, loss[loss=0.08771, simple_loss=0.1005, pruned_loss=0.02452, audio_tagging_loss=0.01292, over 14576.00 frames. 
], tot_loss[loss=0.08775, simple_loss=0.1061, pruned_loss=0.02402, audio_tagging_loss=0.01068, over 3041949.28 frames. ], batch size: 55, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:28:45,055 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=672306.6666666666, ans=0.125 2023-11-19 09:28:49,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=672306.6666666666, ans=0.05 2023-11-19 09:29:16,433 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=672506.6666666666, ans=0.125 2023-11-19 09:29:28,493 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4700, loss[loss=0.07939, simple_loss=0.08698, pruned_loss=0.02283, audio_tagging_loss=0.01307, over 14662.00 frames. ], tot_loss[loss=0.08798, simple_loss=0.1061, pruned_loss=0.02413, audio_tagging_loss=0.01082, over 3037909.86 frames. ], batch size: 58, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:29:32,929 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=672573.3333333334, ans=0.0 2023-11-19 09:29:38,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=672640.0, ans=0.0 2023-11-19 09:29:40,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2023-11-19 09:29:40,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=672640.0, ans=0.125 2023-11-19 09:29:48,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=672640.0, ans=0.125 2023-11-19 09:29:49,806 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.341e+01 9.226e+01 1.015e+02 1.641e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 09:30:01,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=672773.3333333334, ans=10.0 2023-11-19 09:30:07,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=672773.3333333334, ans=0.125 2023-11-19 09:30:24,318 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4750, loss[loss=0.104, simple_loss=0.1222, pruned_loss=0.03115, audio_tagging_loss=0.01171, over 14318.00 frames. ], tot_loss[loss=0.08848, simple_loss=0.107, pruned_loss=0.02424, audio_tagging_loss=0.01076, over 3034106.08 frames. ], batch size: 56, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:30:29,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=672906.6666666666, ans=0.0 2023-11-19 09:30:41,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=672973.3333333334, ans=0.2 2023-11-19 09:30:45,744 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.09 vs. 
limit=22.5 2023-11-19 09:30:48,283 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=673040.0, ans=0.04949747468305833 2023-11-19 09:30:57,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=673106.6666666666, ans=0.0 2023-11-19 09:31:12,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=673173.3333333334, ans=0.125 2023-11-19 09:31:14,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=673173.3333333334, ans=0.125 2023-11-19 09:31:20,475 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4800, loss[loss=0.0965, simple_loss=0.1134, pruned_loss=0.02997, audio_tagging_loss=0.00984, over 15144.00 frames. ], tot_loss[loss=0.08846, simple_loss=0.1071, pruned_loss=0.02415, audio_tagging_loss=0.01077, over 3038341.43 frames. ], batch size: 59, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:31:23,947 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=673240.0, ans=0.125 2023-11-19 09:31:31,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.79 vs. limit=22.5 2023-11-19 09:31:41,642 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.288e+01 8.950e+01 9.768e+01 1.286e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 09:31:41,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=673373.3333333334, ans=0.125 2023-11-19 09:31:44,329 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=15.0 2023-11-19 09:31:56,553 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2023-11-19 09:31:59,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=673440.0, ans=0.125 2023-11-19 09:32:16,666 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4850, loss[loss=0.1046, simple_loss=0.1348, pruned_loss=0.02583, audio_tagging_loss=0.01134, over 15680.00 frames. ], tot_loss[loss=0.08827, simple_loss=0.1067, pruned_loss=0.02397, audio_tagging_loss=0.01094, over 3039267.40 frames. ], batch size: 58, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:32:26,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=673640.0, ans=0.0 2023-11-19 09:32:47,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=673706.6666666666, ans=0.125 2023-11-19 09:32:50,598 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2023-11-19 09:33:12,513 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4900, loss[loss=0.09369, simple_loss=0.1274, pruned_loss=0.02101, audio_tagging_loss=0.00899, over 15570.00 frames. ], tot_loss[loss=0.08873, simple_loss=0.1076, pruned_loss=0.02419, audio_tagging_loss=0.01072, over 3043644.23 frames. 
], batch size: 56, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:33:13,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=673906.6666666666, ans=0.125 2023-11-19 09:33:23,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.40 vs. limit=22.5 2023-11-19 09:33:33,596 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.373e+01 9.037e+01 1.012e+02 1.386e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 09:33:38,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2023-11-19 09:33:48,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=674106.6666666666, ans=0.125 2023-11-19 09:33:56,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=674173.3333333334, ans=0.125 2023-11-19 09:33:56,335 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.54 vs. limit=12.0 2023-11-19 09:34:03,396 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=674173.3333333334, ans=0.125 2023-11-19 09:34:04,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=674173.3333333334, ans=0.125 2023-11-19 09:34:07,881 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 4950, loss[loss=0.08019, simple_loss=0.09983, pruned_loss=0.02244, audio_tagging_loss=0.007835, over 15942.00 frames. ], tot_loss[loss=0.08809, simple_loss=0.107, pruned_loss=0.024, audio_tagging_loss=0.01058, over 3044971.78 frames. ], batch size: 62, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:34:09,597 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0 2023-11-19 09:34:26,020 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=674306.6666666666, ans=0.125 2023-11-19 09:34:38,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=12.0 2023-11-19 09:34:46,742 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=674440.0, ans=10.0 2023-11-19 09:34:48,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=674440.0, ans=0.125 2023-11-19 09:35:04,108 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5000, loss[loss=0.1075, simple_loss=0.1331, pruned_loss=0.03204, audio_tagging_loss=0.008913, over 14437.00 frames. ], tot_loss[loss=0.08812, simple_loss=0.1071, pruned_loss=0.02412, audio_tagging_loss=0.01048, over 3036359.25 frames. 
], batch size: 55, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:35:15,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=674640.0, ans=0.125 2023-11-19 09:35:25,266 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.803e+01 8.355e+01 9.053e+01 1.007e+02 1.287e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 09:35:33,717 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.42 vs. limit=15.0 2023-11-19 09:35:34,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=674706.6666666666, ans=0.0 2023-11-19 09:35:59,633 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5050, loss[loss=0.098, simple_loss=0.1243, pruned_loss=0.0239, audio_tagging_loss=0.01193, over 16351.00 frames. ], tot_loss[loss=0.0885, simple_loss=0.1077, pruned_loss=0.02431, audio_tagging_loss=0.01036, over 3035247.84 frames. ], batch size: 58, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:36:08,049 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0 2023-11-19 09:36:22,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=675040.0, ans=0.125 2023-11-19 09:36:26,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=675040.0, ans=0.0 2023-11-19 09:36:45,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=675173.3333333334, ans=0.125 2023-11-19 09:36:53,436 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.77 vs. limit=22.5 2023-11-19 09:36:55,048 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5100, loss[loss=0.08056, simple_loss=0.1009, pruned_loss=0.02055, audio_tagging_loss=0.009578, over 15168.00 frames. ], tot_loss[loss=0.0875, simple_loss=0.1063, pruned_loss=0.02392, audio_tagging_loss=0.01044, over 3037399.68 frames. 
], batch size: 57, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:37:00,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=675240.0, ans=0.125 2023-11-19 09:37:00,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=675240.0, ans=0.125 2023-11-19 09:37:12,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675306.6666666666, ans=0.1 2023-11-19 09:37:14,272 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675306.6666666666, ans=0.1 2023-11-19 09:37:16,148 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.361e+01 9.263e+01 1.052e+02 1.984e+02, threshold=1.853e+02, percent-clipped=1.0 2023-11-19 09:37:31,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=675440.0, ans=0.125 2023-11-19 09:37:32,920 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=675440.0, ans=0.05 2023-11-19 09:37:35,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.82 vs. limit=22.5 2023-11-19 09:37:41,084 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2023-11-19 09:37:51,134 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5150, loss[loss=0.09985, simple_loss=0.1063, pruned_loss=0.03074, audio_tagging_loss=0.01598, over 14955.00 frames. ], tot_loss[loss=0.08696, simple_loss=0.1055, pruned_loss=0.02378, audio_tagging_loss=0.01044, over 3031680.96 frames. ], batch size: 56, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:37:58,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=675573.3333333334, ans=0.0 2023-11-19 09:38:17,509 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2023-11-19 09:38:18,249 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=675706.6666666666, ans=0.2 2023-11-19 09:38:18,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=675706.6666666666, ans=0.0 2023-11-19 09:38:20,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=675706.6666666666, ans=0.0 2023-11-19 09:38:35,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=675840.0, ans=0.0 2023-11-19 09:38:35,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=675840.0, ans=0.125 2023-11-19 09:38:46,046 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5200, loss[loss=0.07424, simple_loss=0.09893, pruned_loss=0.01889, audio_tagging_loss=0.005889, over 15733.00 frames. ], tot_loss[loss=0.08708, simple_loss=0.1058, pruned_loss=0.02374, audio_tagging_loss=0.01043, over 3038993.52 frames. 
], batch size: 58, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:39:07,241 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.810e+01 8.521e+01 9.161e+01 1.002e+02 1.521e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 09:39:32,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=676173.3333333334, ans=0.125 2023-11-19 09:39:35,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=676173.3333333334, ans=0.125 2023-11-19 09:39:40,170 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2023-11-19 09:39:41,615 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5250, loss[loss=0.07467, simple_loss=0.0872, pruned_loss=0.01912, audio_tagging_loss=0.01195, over 15891.00 frames. ], tot_loss[loss=0.08726, simple_loss=0.1062, pruned_loss=0.02387, audio_tagging_loss=0.01028, over 3049459.76 frames. ], batch size: 61, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:39:47,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=676240.0, ans=0.09899494936611666 2023-11-19 09:39:48,366 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=8.13 vs. limit=8.0 2023-11-19 09:39:49,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=676240.0, ans=0.125 2023-11-19 09:39:54,612 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=676306.6666666666, ans=10.0 2023-11-19 09:40:04,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=676373.3333333334, ans=0.2 2023-11-19 09:40:12,418 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.70 vs. limit=15.0 2023-11-19 09:40:37,335 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5300, loss[loss=0.09126, simple_loss=0.1117, pruned_loss=0.02434, audio_tagging_loss=0.01108, over 15735.00 frames. ], tot_loss[loss=0.08757, simple_loss=0.1068, pruned_loss=0.02391, audio_tagging_loss=0.01026, over 3046135.42 frames. 
], batch size: 58, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:40:40,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=676573.3333333334, ans=0.1 2023-11-19 09:40:42,755 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=676573.3333333334, ans=0.125 2023-11-19 09:40:42,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=676573.3333333334, ans=0.125 2023-11-19 09:40:50,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=676640.0, ans=0.2 2023-11-19 09:40:51,882 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=676640.0, ans=0.125 2023-11-19 09:40:56,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=676640.0, ans=0.1 2023-11-19 09:40:58,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.778e+01 8.435e+01 9.072e+01 1.015e+02 1.516e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-19 09:41:14,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=676773.3333333334, ans=0.1 2023-11-19 09:41:16,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=676773.3333333334, ans=0.125 2023-11-19 09:41:21,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=676840.0, ans=0.1 2023-11-19 09:41:24,505 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0 2023-11-19 09:41:32,748 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5350, loss[loss=0.07752, simple_loss=0.09564, pruned_loss=0.01726, audio_tagging_loss=0.01244, over 14903.00 frames. ], tot_loss[loss=0.08761, simple_loss=0.1067, pruned_loss=0.02395, audio_tagging_loss=0.01032, over 3047373.67 frames. ], batch size: 59, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:41:56,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=677040.0, ans=0.0 2023-11-19 09:41:58,565 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0 2023-11-19 09:42:10,471 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=677106.6666666666, ans=0.125 2023-11-19 09:42:11,984 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0 2023-11-19 09:42:17,533 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. 
limit=15.0 2023-11-19 09:42:22,211 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=677173.3333333334, ans=0.125 2023-11-19 09:42:27,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=677240.0, ans=0.125 2023-11-19 09:42:28,299 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5400, loss[loss=0.05233, simple_loss=0.05437, pruned_loss=0.009338, audio_tagging_loss=0.01581, over 14438.00 frames. ], tot_loss[loss=0.088, simple_loss=0.107, pruned_loss=0.02409, audio_tagging_loss=0.01041, over 3051529.86 frames. ], batch size: 56, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:42:32,710 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=677240.0, ans=0.0 2023-11-19 09:42:34,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.35 vs. limit=15.0 2023-11-19 09:42:49,671 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.356e+01 9.040e+01 1.006e+02 1.272e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 09:42:57,109 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.84 vs. limit=10.0 2023-11-19 09:43:23,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.08 vs. limit=22.5 2023-11-19 09:43:24,107 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5450, loss[loss=0.09081, simple_loss=0.112, pruned_loss=0.02307, audio_tagging_loss=0.01174, over 15250.00 frames. ], tot_loss[loss=0.08939, simple_loss=0.1088, pruned_loss=0.02467, audio_tagging_loss=0.01035, over 3045683.61 frames. ], batch size: 55, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:43:35,787 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=677640.0, ans=22.5 2023-11-19 09:43:38,112 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.30 vs. limit=15.0 2023-11-19 09:43:54,696 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=677706.6666666666, ans=0.0 2023-11-19 09:44:16,020 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.016e-01 2023-11-19 09:44:20,045 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5500, loss[loss=0.08098, simple_loss=0.1043, pruned_loss=0.01893, audio_tagging_loss=0.009884, over 16032.00 frames. ], tot_loss[loss=0.08912, simple_loss=0.1084, pruned_loss=0.0245, audio_tagging_loss=0.01043, over 3047245.56 frames. 
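[editor's note] The dense `ScheduledFloat: name=..., batch_count=..., ans=...` entries come from zipformer's scaling.py: dropout rates, balancer probabilities, and skip rates are not constants but piecewise-linear functions of the training batch count, and `ans` is the value currently in effect. A simplified sketch of that idea follows; the constructor and example breakpoints are ours, not the real scaling.ScheduledFloat API.

```python
class ScheduledFloat:
    """Piecewise-linear schedule over batch_count (simplified sketch)."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs defining the schedule
        self.points = sorted(points)
        self.batch_count = 0.0  # updated by the training loop

    def __float__(self):
        pts = self.points
        if self.batch_count <= pts[0][0]:
            return float(pts[0][1])
        if self.batch_count >= pts[-1][0]:
            return float(pts[-1][1])
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= self.batch_count <= x1:
                t = (self.batch_count - x0) / (x1 - x0)
                return float(y0 + t * (y1 - y0))

# e.g. a rate that anneals from 0.3 to 0.1 over the first 20k batches;
# by batch_count ~676k it has long since settled at its final value.
drop = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
drop.batch_count = 676240.0
print(float(drop))  # -> 0.1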
], batch size: 62, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:44:30,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=677906.6666666666, ans=0.125 2023-11-19 09:44:41,365 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.921e+01 8.594e+01 9.259e+01 1.001e+02 1.326e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-19 09:44:52,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=678040.0, ans=0.125 2023-11-19 09:45:16,532 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5550, loss[loss=0.08962, simple_loss=0.117, pruned_loss=0.02271, audio_tagging_loss=0.008414, over 15590.00 frames. ], tot_loss[loss=0.08907, simple_loss=0.1085, pruned_loss=0.02428, audio_tagging_loss=0.01055, over 3052097.02 frames. ], batch size: 58, lr: 7.72e-03, grad_scale: 32.0 2023-11-19 09:45:47,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=678373.3333333334, ans=0.125 2023-11-19 09:45:49,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=678440.0, ans=0.0 2023-11-19 09:46:12,186 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5600, loss[loss=0.08873, simple_loss=0.1009, pruned_loss=0.02503, audio_tagging_loss=0.01328, over 15301.00 frames. ], tot_loss[loss=0.0887, simple_loss=0.1075, pruned_loss=0.02418, audio_tagging_loss=0.01077, over 3048995.07 frames. ], batch size: 58, lr: 7.72e-03, grad_scale: 32.0 2023-11-19 09:46:34,641 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 8.355e+01 9.090e+01 1.021e+02 1.619e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-19 09:46:39,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=678706.6666666666, ans=0.125 2023-11-19 09:46:43,259 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.49 vs. limit=15.0 2023-11-19 09:46:52,570 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:47:06,120 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678840.0, ans=0.1 2023-11-19 09:47:07,976 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5650, loss[loss=0.07612, simple_loss=0.0848, pruned_loss=0.02102, audio_tagging_loss=0.0127, over 14000.00 frames. ], tot_loss[loss=0.08863, simple_loss=0.1073, pruned_loss=0.02413, audio_tagging_loss=0.01086, over 3045359.00 frames. 
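[editor's note] The `Exclude cut with ID unbalanced/...` WARNING above shows the filter at work: AudioSet clips carry the dummy transcript, which tokenizes to 24 BPE pieces, but a 1-second clip yields only 23 encoder frames after the 4x subsampling front-end, and a transducer cannot align more tokens than frames. A hedged sketch of such a check is below; `keep_cut` and the exact front-end frame arithmetic are assumptions for illustration (lhotse `cut` and a sentencepiece processor `sp` are the real objects involved).

```python
def keep_cut(cut, sp, subsampling_factor=4):
    """Drop cuts a transducer cannot align: the encoder must emit at
    least as many frames as there are BPE tokens. Illustrative version
    of the check behind the 'Exclude cut' warnings.
    """
    num_frames = cut.num_frames  # before subsampling, e.g. 100 for 1 s
    # assumed front-end arithmetic; reproduces 100 -> 23 from the log
    num_frames_sub = (num_frames - 7) // subsampling_factor
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    return num_frames_sub >= len(tokens)  # 23 >= 24 is False -> excluded
```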
], batch size: 57, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:47:11,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=678906.6666666666, ans=0.05 2023-11-19 09:47:41,184 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=679106.6666666666, ans=0.125 2023-11-19 09:47:51,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.62 vs. limit=22.5 2023-11-19 09:48:04,376 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5700, loss[loss=0.08634, simple_loss=0.1002, pruned_loss=0.02565, audio_tagging_loss=0.01056, over 13832.00 frames. ], tot_loss[loss=0.08844, simple_loss=0.1067, pruned_loss=0.02428, audio_tagging_loss=0.01083, over 3046585.36 frames. ], batch size: 53, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:48:12,426 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.89 vs. limit=15.0 2023-11-19 09:48:26,548 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.673e+01 9.391e+01 1.015e+02 1.366e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-19 09:48:35,180 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:48:43,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=679440.0, ans=0.125 2023-11-19 09:48:45,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=679440.0, ans=0.0 2023-11-19 09:48:51,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=679506.6666666666, ans=0.125 2023-11-19 09:48:59,041 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=679573.3333333334, ans=0.2 2023-11-19 09:48:59,868 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5750, loss[loss=0.1001, simple_loss=0.1154, pruned_loss=0.02728, audio_tagging_loss=0.01508, over 15040.00 frames. ], tot_loss[loss=0.08779, simple_loss=0.1059, pruned_loss=0.0241, audio_tagging_loss=0.01073, over 3046995.97 frames. ], batch size: 57, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:49:06,449 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2023-11-19 09:49:08,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=679573.3333333334, ans=0.125 2023-11-19 09:49:12,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=679640.0, ans=0.0 2023-11-19 09:49:35,364 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=679773.3333333334, ans=0.125 2023-11-19 09:49:55,318 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5800, loss[loss=0.08979, simple_loss=0.1045, pruned_loss=0.02762, audio_tagging_loss=0.009922, over 15111.00 frames. ], tot_loss[loss=0.08736, simple_loss=0.1054, pruned_loss=0.02399, audio_tagging_loss=0.01066, over 3041140.14 frames. 
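[editor's note] Note that `grad_scale` drops from 32.0 to 16.0 at batch 5650 above and oscillates back to 32.0 in later batches. With fp16 enabled, this is the standard dynamic loss-scaling pattern: halve the scale when gradients overflow, grow it again after a run of clean steps. A minimal sketch with PyTorch's stock GradScaler follows; icefall manages the scale in its own training loop, and the `init_scale`/`growth_interval` values here are assumptions.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    growth_factor=2.0,    # double after a run of overflow-free steps
    backoff_factor=0.5,   # halve on inf/nan grads, e.g. 32.0 -> 16.0
    growth_interval=2000, # assumed; the real setting may differ
)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skips the update if grads overflowed
    scaler.update()          # adjusts the scale as above
    return loss.detach(), scaler.get_scale()  # the logged grad_scale
```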
], batch size: 55, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:49:55,981 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=12.0 2023-11-19 09:50:10,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=679973.3333333334, ans=0.125 2023-11-19 09:50:17,689 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.685e+01 8.360e+01 9.012e+01 9.906e+01 1.267e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 09:50:23,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=680040.0, ans=0.125 2023-11-19 09:50:34,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680106.6666666666, ans=0.1 2023-11-19 09:50:41,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=680173.3333333334, ans=0.2 2023-11-19 09:50:50,895 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5850, loss[loss=0.09329, simple_loss=0.1172, pruned_loss=0.02396, audio_tagging_loss=0.01072, over 15751.00 frames. ], tot_loss[loss=0.087, simple_loss=0.1056, pruned_loss=0.02375, audio_tagging_loss=0.01044, over 3044593.09 frames. ], batch size: 60, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:51:04,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=680306.6666666666, ans=0.125 2023-11-19 09:51:07,311 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.23 vs. limit=10.0 2023-11-19 09:51:09,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=680306.6666666666, ans=0.125 2023-11-19 09:51:11,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=680306.6666666666, ans=0.2 2023-11-19 09:51:47,068 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5900, loss[loss=0.09066, simple_loss=0.1138, pruned_loss=0.02491, audio_tagging_loss=0.008836, over 15072.00 frames. ], tot_loss[loss=0.08691, simple_loss=0.1056, pruned_loss=0.02368, audio_tagging_loss=0.0104, over 3045792.28 frames. ], batch size: 55, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:51:47,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=680573.3333333334, ans=0.04949747468305833 2023-11-19 09:51:55,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=680573.3333333334, ans=0.1 2023-11-19 09:52:08,766 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.892e+01 8.198e+01 8.843e+01 9.810e+01 1.400e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 09:52:18,899 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=680773.3333333334, ans=0.015 2023-11-19 09:52:21,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=680773.3333333334, ans=0.2 2023-11-19 09:52:31,190 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.30 vs. 
limit=15.0 2023-11-19 09:52:42,553 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 5950, loss[loss=0.1303, simple_loss=0.1535, pruned_loss=0.04585, audio_tagging_loss=0.007643, over 15628.00 frames. ], tot_loss[loss=0.0873, simple_loss=0.1061, pruned_loss=0.0238, audio_tagging_loss=0.01044, over 3049763.52 frames. ], batch size: 57, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:53:11,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=681040.0, ans=0.125 2023-11-19 09:53:14,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2023-11-19 09:53:37,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0 2023-11-19 09:53:38,045 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6000, loss[loss=0.06583, simple_loss=0.074, pruned_loss=0.02016, audio_tagging_loss=0.008664, over 15332.00 frames. ], tot_loss[loss=0.08798, simple_loss=0.107, pruned_loss=0.02409, audio_tagging_loss=0.01038, over 3051903.04 frames. ], batch size: 60, lr: 7.71e-03, grad_scale: 32.0 2023-11-19 09:53:38,045 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 09:54:10,897 INFO [train_asr.py:1147] (1/4) Epoch 9, validation: loss=0.06636, simple_loss=0.05607, pruned_loss=0.006778, audio_tagging_loss=0.03155, over 4681554.00 frames. 2023-11-19 09:54:10,898 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 09:54:20,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=681240.0, ans=0.025 2023-11-19 09:54:25,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=681306.6666666666, ans=0.125 2023-11-19 09:54:33,947 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.283e+01 9.118e+01 1.003e+02 1.340e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 09:54:51,003 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:54:53,295 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.07 vs. limit=22.5 2023-11-19 09:55:06,820 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6050, loss[loss=0.09546, simple_loss=0.103, pruned_loss=0.03162, audio_tagging_loss=0.01233, over 15515.00 frames. ], tot_loss[loss=0.08821, simple_loss=0.1074, pruned_loss=0.02414, audio_tagging_loss=0.01035, over 3051902.97 frames. ], batch size: 60, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:55:23,643 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=12.0 2023-11-19 09:55:29,037 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. 
limit=15.0 2023-11-19 09:55:34,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-19 09:55:47,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=681773.3333333334, ans=0.1 2023-11-19 09:55:53,780 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0 2023-11-19 09:55:56,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=681840.0, ans=0.1 2023-11-19 09:55:57,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.13 vs. limit=15.0 2023-11-19 09:56:02,352 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6100, loss[loss=0.08225, simple_loss=0.09593, pruned_loss=0.02402, audio_tagging_loss=0.01026, over 14597.00 frames. ], tot_loss[loss=0.08847, simple_loss=0.1077, pruned_loss=0.02423, audio_tagging_loss=0.01039, over 3051811.53 frames. ], batch size: 57, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:56:07,973 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.46 vs. limit=10.0 2023-11-19 09:56:11,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.06 vs. limit=15.0 2023-11-19 09:56:26,131 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.696e+01 8.498e+01 9.519e+01 1.052e+02 1.737e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 09:56:34,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=682106.6666666666, ans=0.125 2023-11-19 09:56:39,503 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.56 vs. limit=12.0 2023-11-19 09:56:57,823 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6150, loss[loss=0.09941, simple_loss=0.1245, pruned_loss=0.02698, audio_tagging_loss=0.01016, over 15489.00 frames. ], tot_loss[loss=0.0887, simple_loss=0.1079, pruned_loss=0.02436, audio_tagging_loss=0.01039, over 3058944.11 frames. 
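[editor's note] The `Whitening: name=..., metric=... vs. limit=...` entries measure how far a module's output covariance is from a multiple of the identity against a scheduled limit; a corrective gradient is applied only when the metric exceeds the limit, which is why these lines surface sporadically. The sketch below assumes the metric is the ratio mean(eig^2)/mean(eig)^2 of the channel covariance eigenvalues (1.0 for perfectly white features), computed via traces; the true scaling.py definition may differ in detail.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Whiteness of channel covariance (assumed definition: the ratio of
    the mean squared eigenvalue to the squared mean eigenvalue, via
    tr(C^2)/d and tr(C)/d so no eigendecomposition is needed).
    """
    x = x.reshape(-1, x.shape[-1])             # (frames, channels)
    x = x.reshape(x.shape[0], num_groups, -1)  # split channels into groups
    x = x - x.mean(dim=0, keepdim=True)
    cov = torch.einsum("ngc,ngd->gcd", x, x) / x.shape[0]
    d = cov.shape[-1]
    mean_eig = cov.diagonal(dim1=-2, dim2=-1).sum(-1) / d   # tr(C)/d
    mean_eig_sq = (cov * cov).sum(dim=(-2, -1)) / d         # tr(C^2)/d
    return (mean_eig_sq / mean_eig.clamp(min=1e-20) ** 2).mean()
```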
], batch size: 56, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:57:05,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=682240.0, ans=0.0 2023-11-19 09:57:14,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=682306.6666666666, ans=0.125 2023-11-19 09:57:15,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682306.6666666666, ans=0.1 2023-11-19 09:57:41,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=682506.6666666666, ans=0.1 2023-11-19 09:57:42,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=682506.6666666666, ans=15.0 2023-11-19 09:57:53,347 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6200, loss[loss=0.1003, simple_loss=0.1164, pruned_loss=0.02783, audio_tagging_loss=0.01427, over 15131.00 frames. ], tot_loss[loss=0.08779, simple_loss=0.1062, pruned_loss=0.02401, audio_tagging_loss=0.01068, over 3060478.68 frames. ], batch size: 56, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:57:53,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=682573.3333333334, ans=0.125 2023-11-19 09:57:54,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.39 vs. limit=5.0 2023-11-19 09:58:16,315 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.684e+01 8.555e+01 9.157e+01 9.904e+01 1.201e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 09:58:20,461 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2023-11-19 09:58:49,116 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6250, loss[loss=0.07638, simple_loss=0.08999, pruned_loss=0.01898, audio_tagging_loss=0.0124, over 15072.00 frames. ], tot_loss[loss=0.08844, simple_loss=0.1069, pruned_loss=0.02419, audio_tagging_loss=0.01078, over 3058876.72 frames. ], batch size: 58, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:58:56,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2023-11-19 09:59:00,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.35 vs. 
limit=22.5 2023-11-19 09:59:06,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=682973.3333333334, ans=0.0 2023-11-19 09:59:23,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=683106.6666666666, ans=0.0 2023-11-19 09:59:30,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=683106.6666666666, ans=0.05 2023-11-19 09:59:31,617 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=683106.6666666666, ans=0.0 2023-11-19 09:59:34,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=683173.3333333334, ans=0.0 2023-11-19 09:59:36,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=683173.3333333334, ans=0.5 2023-11-19 09:59:38,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=683173.3333333334, ans=0.125 2023-11-19 09:59:44,719 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6300, loss[loss=0.08258, simple_loss=0.1007, pruned_loss=0.02078, audio_tagging_loss=0.01147, over 14249.00 frames. ], tot_loss[loss=0.0884, simple_loss=0.1067, pruned_loss=0.02417, audio_tagging_loss=0.0109, over 3055133.70 frames. ], batch size: 53, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:59:44,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=683240.0, ans=0.125 2023-11-19 10:00:05,195 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=15.0 2023-11-19 10:00:07,021 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=683373.3333333334, ans=0.125 2023-11-19 10:00:07,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.970e+01 8.511e+01 9.206e+01 1.011e+02 2.353e+02, threshold=1.841e+02, percent-clipped=1.0 2023-11-19 10:00:17,663 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=683440.0, ans=0.0 2023-11-19 10:00:19,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=683440.0, ans=0.125 2023-11-19 10:00:21,871 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=683440.0, ans=0.0 2023-11-19 10:00:40,557 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6350, loss[loss=0.08133, simple_loss=0.1075, pruned_loss=0.0192, audio_tagging_loss=0.008384, over 14867.00 frames. ], tot_loss[loss=0.08753, simple_loss=0.1055, pruned_loss=0.02382, audio_tagging_loss=0.01096, over 3053714.84 frames. 
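[editor's note] The logged loss components recombine consistently throughout this stretch: with simple_loss_scale 0.5 and audio_tagging_loss_scale 1.0 from the run configuration, loss = 0.5·simple_loss + pruned_loss + audio_tagging_loss. For the batch 6350 tot_loss just above: 0.5·0.1055 + 0.02382 + 0.01096 = 0.08753, exactly as logged. A small check function (early in training the warm-up schedule also rescales the transducer terms, not shown here):

```python
def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                   simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    """Recombine the logged components into the headline loss."""
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# batch 6350: 0.5*0.1055 + 0.02382 + 0.01096 = 0.08753, matching the log
print(combine_losses(0.1055, 0.02382, 0.01096))
```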
], batch size: 55, lr: 7.69e-03, grad_scale: 16.0 2023-11-19 10:01:05,528 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=683706.6666666666, ans=0.125 2023-11-19 10:01:34,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=683906.6666666666, ans=0.125 2023-11-19 10:01:35,461 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6400, loss[loss=0.06202, simple_loss=0.06747, pruned_loss=0.01339, audio_tagging_loss=0.01489, over 14268.00 frames. ], tot_loss[loss=0.08766, simple_loss=0.1056, pruned_loss=0.02388, audio_tagging_loss=0.01099, over 3046500.76 frames. ], batch size: 54, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:01:37,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=683906.6666666666, ans=0.125 2023-11-19 10:01:48,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=683973.3333333334, ans=0.125 2023-11-19 10:01:59,223 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.664e+01 8.378e+01 8.903e+01 9.717e+01 1.251e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-19 10:02:20,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=684173.3333333334, ans=0.2 2023-11-19 10:02:30,940 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6450, loss[loss=0.08746, simple_loss=0.1104, pruned_loss=0.02257, audio_tagging_loss=0.0097, over 16068.00 frames. ], tot_loss[loss=0.08778, simple_loss=0.1058, pruned_loss=0.0239, audio_tagging_loss=0.011, over 3049894.80 frames. ], batch size: 61, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:02:56,393 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0 2023-11-19 10:02:59,046 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=684373.3333333334, ans=0.2 2023-11-19 10:03:01,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=684373.3333333334, ans=0.125 2023-11-19 10:03:19,355 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=684506.6666666666, ans=0.125 2023-11-19 10:03:20,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=684506.6666666666, ans=0.1 2023-11-19 10:03:27,131 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6500, loss[loss=0.09156, simple_loss=0.1178, pruned_loss=0.02379, audio_tagging_loss=0.008852, over 14449.00 frames. ], tot_loss[loss=0.08737, simple_loss=0.1055, pruned_loss=0.02369, audio_tagging_loss=0.01094, over 3042624.78 frames. 
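[editor's note] In each training line, `loss[...]` is the current batch while `tot_loss[...]` is a frame-weighted running average. The steady-state frame count (~3.05M, about 200x a typical ~15k-frame batch) is consistent with exponential smoothing using decay 1 - 1/reset_interval with reset_interval 200 from the run configuration. A sketch under that assumption; icefall's MetricsTracker does the actual bookkeeping.

```python
class RunningLoss:
    """Frame-weighted running average like the tot_loss[...] entries.

    Assumes decay = 1 - 1/reset_interval; with ~15k frames per batch the
    effective window settles near 15k / (1 - decay) ~= 3M frames, matching
    the 'over 3,0xx,xxx frames' figures in the log.
    """

    def __init__(self, reset_interval=200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames
        return self.loss_sum / self.frames, self.frames
```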
], batch size: 57, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:03:40,461 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=684640.0, ans=0.0 2023-11-19 10:03:50,207 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.426e+01 9.031e+01 9.982e+01 1.610e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-19 10:03:53,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=684706.6666666666, ans=0.0 2023-11-19 10:04:05,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=684773.3333333334, ans=0.125 2023-11-19 10:04:14,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=684840.0, ans=0.125 2023-11-19 10:04:20,849 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=12.0 2023-11-19 10:04:22,322 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6550, loss[loss=0.08938, simple_loss=0.1066, pruned_loss=0.02553, audio_tagging_loss=0.01054, over 16553.00 frames. ], tot_loss[loss=0.08768, simple_loss=0.1063, pruned_loss=0.02383, audio_tagging_loss=0.01069, over 3043155.89 frames. ], batch size: 61, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:04:43,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=12.0 2023-11-19 10:04:54,129 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.06 vs. limit=22.5 2023-11-19 10:05:01,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=685106.6666666666, ans=0.125 2023-11-19 10:05:13,027 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=685173.3333333334, ans=0.125 2023-11-19 10:05:18,101 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6600, loss[loss=0.06664, simple_loss=0.07806, pruned_loss=0.01669, audio_tagging_loss=0.01092, over 14445.00 frames. ], tot_loss[loss=0.088, simple_loss=0.107, pruned_loss=0.02397, audio_tagging_loss=0.01055, over 3041309.37 frames. ], batch size: 57, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:05:41,609 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.207e+01 8.810e+01 9.589e+01 1.176e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-19 10:05:56,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685440.0, ans=0.1 2023-11-19 10:06:14,467 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6650, loss[loss=0.1135, simple_loss=0.1413, pruned_loss=0.03346, audio_tagging_loss=0.00941, over 15298.00 frames. ], tot_loss[loss=0.08816, simple_loss=0.1075, pruned_loss=0.02406, audio_tagging_loss=0.01036, over 3046633.31 frames. ], batch size: 54, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:06:17,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=685573.3333333334, ans=0.125 2023-11-19 10:06:23,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.00 vs. 
limit=22.5 2023-11-19 10:06:30,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=685640.0, ans=0.125 2023-11-19 10:06:32,204 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.73 vs. limit=15.0 2023-11-19 10:07:05,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=685840.0, ans=0.125 2023-11-19 10:07:09,323 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6700, loss[loss=0.0916, simple_loss=0.1185, pruned_loss=0.02452, audio_tagging_loss=0.007837, over 15274.00 frames. ], tot_loss[loss=0.08765, simple_loss=0.1071, pruned_loss=0.02379, audio_tagging_loss=0.0103, over 3049556.67 frames. ], batch size: 56, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:07:20,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=685973.3333333334, ans=0.0 2023-11-19 10:07:23,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.57 vs. limit=15.0 2023-11-19 10:07:30,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=685973.3333333334, ans=0.0 2023-11-19 10:07:30,534 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.36 vs. limit=15.0 2023-11-19 10:07:33,113 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.482e+01 9.410e+01 1.023e+02 1.409e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-19 10:07:38,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=686040.0, ans=0.2 2023-11-19 10:07:42,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=686106.6666666666, ans=0.125 2023-11-19 10:07:44,733 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=686106.6666666666, ans=0.0 2023-11-19 10:07:47,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=686106.6666666666, ans=0.2 2023-11-19 10:07:48,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=686106.6666666666, ans=0.125 2023-11-19 10:08:05,674 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6750, loss[loss=0.07612, simple_loss=0.08575, pruned_loss=0.01868, audio_tagging_loss=0.01456, over 15003.00 frames. ], tot_loss[loss=0.08745, simple_loss=0.1065, pruned_loss=0.02386, audio_tagging_loss=0.01037, over 3044366.64 frames. ], batch size: 56, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:08:05,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=686240.0, ans=0.125 2023-11-19 10:08:11,733 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.87 vs. 
limit=15.0 2023-11-19 10:08:12,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=686240.0, ans=0.2 2023-11-19 10:08:19,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=686306.6666666666, ans=0.125 2023-11-19 10:08:43,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=686440.0, ans=0.125 2023-11-19 10:08:53,213 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=686506.6666666666, ans=0.125 2023-11-19 10:08:55,917 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=686506.6666666666, ans=0.125 2023-11-19 10:09:01,562 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6800, loss[loss=0.08859, simple_loss=0.1033, pruned_loss=0.02718, audio_tagging_loss=0.009743, over 16246.00 frames. ], tot_loss[loss=0.08765, simple_loss=0.1067, pruned_loss=0.02398, audio_tagging_loss=0.01033, over 3051752.89 frames. ], batch size: 61, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:09:01,862 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=686573.3333333334, ans=0.2 2023-11-19 10:09:12,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=686640.0, ans=0.0 2023-11-19 10:09:24,302 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.110e+01 8.200e+01 8.866e+01 9.843e+01 1.346e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-19 10:09:30,755 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=6.0 2023-11-19 10:09:31,414 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=686706.6666666666, ans=0.2 2023-11-19 10:09:46,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=686840.0, ans=0.0 2023-11-19 10:09:48,339 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=686840.0, ans=0.02 2023-11-19 10:09:53,773 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.56 vs. limit=15.0 2023-11-19 10:09:56,519 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6850, loss[loss=0.08096, simple_loss=0.1073, pruned_loss=0.01937, audio_tagging_loss=0.007946, over 14544.00 frames. ], tot_loss[loss=0.08634, simple_loss=0.105, pruned_loss=0.02335, audio_tagging_loss=0.01048, over 3040155.09 frames. ], batch size: 56, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:10:00,924 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=686906.6666666666, ans=0.125 2023-11-19 10:10:04,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. 
limit=6.0 2023-11-19 10:10:16,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=686973.3333333334, ans=0.125 2023-11-19 10:10:22,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=687040.0, ans=0.0 2023-11-19 10:10:32,799 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=687106.6666666666, ans=0.2 2023-11-19 10:10:45,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=687173.3333333334, ans=0.125 2023-11-19 10:10:47,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=687173.3333333334, ans=0.0 2023-11-19 10:10:47,233 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2023-11-19 10:10:52,109 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6900, loss[loss=0.07577, simple_loss=0.09187, pruned_loss=0.01985, audio_tagging_loss=0.009986, over 15305.00 frames. ], tot_loss[loss=0.08652, simple_loss=0.1052, pruned_loss=0.02342, audio_tagging_loss=0.01048, over 3042554.27 frames. ], batch size: 58, lr: 7.67e-03, grad_scale: 32.0 2023-11-19 10:10:56,959 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=687240.0, ans=0.1 2023-11-19 10:11:00,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=687240.0, ans=0.0 2023-11-19 10:11:04,192 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-11-19 10:11:09,586 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=687306.6666666666, ans=0.0 2023-11-19 10:11:13,836 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=687373.3333333334, ans=0.125 2023-11-19 10:11:15,750 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.905e+01 8.720e+01 9.438e+01 1.043e+02 1.545e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-19 10:11:24,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=687440.0, ans=0.125 2023-11-19 10:11:34,264 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:11:45,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=687506.6666666666, ans=0.125 2023-11-19 10:11:47,997 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 6950, loss[loss=0.1039, simple_loss=0.1336, pruned_loss=0.03004, audio_tagging_loss=0.007089, over 15543.00 frames. 
], tot_loss[loss=0.0873, simple_loss=0.1061, pruned_loss=0.02381, audio_tagging_loss=0.01045, over 3047098.48 frames. ], batch size: 56, lr: 7.67e-03, grad_scale: 32.0 2023-11-19 10:11:59,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.49 vs. limit=12.0 2023-11-19 10:12:03,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=687640.0, ans=0.125 2023-11-19 10:12:04,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=687640.0, ans=0.125 2023-11-19 10:12:09,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=687706.6666666666, ans=0.125 2023-11-19 10:12:31,173 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-11-19 10:12:38,555 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.08 vs. limit=12.0 2023-11-19 10:12:42,533 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:12:43,296 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7000, loss[loss=0.0886, simple_loss=0.1016, pruned_loss=0.02859, audio_tagging_loss=0.009197, over 14413.00 frames. ], tot_loss[loss=0.08713, simple_loss=0.1058, pruned_loss=0.02368, audio_tagging_loss=0.01055, over 3044698.65 frames. ], batch size: 59, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:13:08,454 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.313e+01 8.438e+01 9.310e+01 1.017e+02 1.231e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-19 10:13:22,253 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.18 vs. limit=6.0 2023-11-19 10:13:32,458 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=688173.3333333334, ans=0.1 2023-11-19 10:13:36,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=688173.3333333334, ans=0.2 2023-11-19 10:13:39,126 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7050, loss[loss=0.09332, simple_loss=0.1064, pruned_loss=0.02512, audio_tagging_loss=0.01502, over 14261.00 frames. ], tot_loss[loss=0.08741, simple_loss=0.1059, pruned_loss=0.02381, audio_tagging_loss=0.01063, over 3044794.43 frames. ], batch size: 54, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:13:41,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=688240.0, ans=0.125 2023-11-19 10:14:19,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=688440.0, ans=0.07 2023-11-19 10:14:20,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=688440.0, ans=0.05 2023-11-19 10:14:29,497 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.14 vs. 
limit=10.0 2023-11-19 10:14:35,331 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7100, loss[loss=0.1344, simple_loss=0.1645, pruned_loss=0.04063, audio_tagging_loss=0.01154, over 16182.00 frames. ], tot_loss[loss=0.08751, simple_loss=0.1058, pruned_loss=0.02382, audio_tagging_loss=0.01076, over 3045556.14 frames. ], batch size: 56, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:14:41,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=688573.3333333334, ans=0.125 2023-11-19 10:14:48,916 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.99 vs. limit=22.5 2023-11-19 10:14:59,035 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.651e+01 8.454e+01 9.144e+01 1.007e+02 1.381e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-19 10:15:15,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=688773.3333333334, ans=0.125 2023-11-19 10:15:30,843 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7150, loss[loss=0.08552, simple_loss=0.09897, pruned_loss=0.02342, audio_tagging_loss=0.01262, over 13517.00 frames. ], tot_loss[loss=0.08744, simple_loss=0.1057, pruned_loss=0.0238, audio_tagging_loss=0.01079, over 3050697.66 frames. ], batch size: 53, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:15:38,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=688906.6666666666, ans=0.0 2023-11-19 10:16:26,359 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7200, loss[loss=0.1037, simple_loss=0.1223, pruned_loss=0.03236, audio_tagging_loss=0.01017, over 16665.00 frames. ], tot_loss[loss=0.0877, simple_loss=0.106, pruned_loss=0.02389, audio_tagging_loss=0.01082, over 3051039.84 frames. ], batch size: 61, lr: 7.66e-03, grad_scale: 32.0 2023-11-19 10:16:27,761 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=689240.0, ans=0.1 2023-11-19 10:16:28,657 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=689240.0, ans=0.125 2023-11-19 10:16:34,252 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2023-11-19 10:16:50,998 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.339e+01 8.352e+01 9.080e+01 1.000e+02 1.342e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-19 10:17:01,866 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=689440.0, ans=0.1 2023-11-19 10:17:02,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=689440.0, ans=15.0 2023-11-19 10:17:08,744 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=689440.0, ans=0.0 2023-11-19 10:17:21,595 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7250, loss[loss=0.0665, simple_loss=0.08517, pruned_loss=0.01417, audio_tagging_loss=0.009744, over 14305.00 frames. ], tot_loss[loss=0.08724, simple_loss=0.1057, pruned_loss=0.02356, audio_tagging_loss=0.01083, over 3044605.57 frames. 
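[editor's note] The learning rate drifts from 7.74e-03 down to 7.66e-03 across this span and depends on both the step and the epoch. This matches icefall's Eden scheduler, which multiplies two power-law decay factors; a sketch with this run's settings (base_lr 0.045, lr_batches 7500, lr_epochs 3.5) is below — check optim.py for the exact form.

```python
def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
    """Eden schedule (sketch): power-law decay in steps and in epochs."""
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# with base_lr=0.045, around epoch 9 the two factors combine to bring the
# rate down to the ~7.7e-03 values seen in these lines
```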
], batch size: 56, lr: 7.66e-03, grad_scale: 32.0 2023-11-19 10:17:26,562 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=689573.3333333334, ans=0.125 2023-11-19 10:17:41,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=689640.0, ans=0.125 2023-11-19 10:17:42,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=689706.6666666666, ans=0.125 2023-11-19 10:17:44,131 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=689706.6666666666, ans=0.0 2023-11-19 10:18:03,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=689773.3333333334, ans=0.2 2023-11-19 10:18:08,850 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=689840.0, ans=0.125 2023-11-19 10:18:17,660 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7300, loss[loss=0.08758, simple_loss=0.105, pruned_loss=0.02655, audio_tagging_loss=0.008527, over 14108.00 frames. ], tot_loss[loss=0.08776, simple_loss=0.1065, pruned_loss=0.02388, audio_tagging_loss=0.01064, over 3044230.77 frames. ], batch size: 56, lr: 7.66e-03, grad_scale: 16.0 2023-11-19 10:18:27,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=689973.3333333334, ans=0.0 2023-11-19 10:18:28,695 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.32 vs. limit=22.5 2023-11-19 10:18:42,972 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.905e+01 8.177e+01 8.479e+01 9.252e+01 1.452e+02, threshold=1.696e+02, percent-clipped=0.0 2023-11-19 10:19:04,348 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=690173.3333333334, ans=0.125 2023-11-19 10:19:12,530 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7350, loss[loss=0.07194, simple_loss=0.08729, pruned_loss=0.01807, audio_tagging_loss=0.01022, over 14072.00 frames. ], tot_loss[loss=0.087, simple_loss=0.1056, pruned_loss=0.0237, audio_tagging_loss=0.01052, over 3039489.48 frames. ], batch size: 53, lr: 7.66e-03, grad_scale: 16.0 2023-11-19 10:19:29,809 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=690306.6666666666, ans=0.125 2023-11-19 10:19:33,459 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=690306.6666666666, ans=0.2 2023-11-19 10:19:46,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=690440.0, ans=0.1 2023-11-19 10:19:58,105 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2023-11-19 10:20:08,407 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7400, loss[loss=0.1024, simple_loss=0.1308, pruned_loss=0.02907, audio_tagging_loss=0.007957, over 15637.00 frames. ], tot_loss[loss=0.08725, simple_loss=0.106, pruned_loss=0.02378, audio_tagging_loss=0.01045, over 3043049.98 frames. 
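[editor's note] Many ScheduledFloat names above are balancer parameters (`min_positive`, `max_abs`, `min_abs`, `prob`): the Balancer nudges per-channel statistics (fraction of positive activations, mean absolute value) back into range, and `prob` is the scheduled probability that the correction fires on a given batch. A very loose sketch of the prob-gated penalty follows; the real scaling.py Balancer applies the correction through a custom backward pass rather than an explicit loss, and the soft positive-fraction proxy here is our own.

```python
import random
import torch

def balancer_penalty(x, min_positive=0.05, max_abs=10.0, prob=0.125):
    """With scheduled probability `prob`, penalize activations whose
    statistics leave the configured range. Approximation only.
    """
    if random.random() > prob:
        return x.new_zeros(())  # inactive this batch
    # differentiable stand-in for "fraction of positive activations"
    frac_positive = torch.sigmoid(x * 10.0).mean()
    mean_abs = x.abs().mean()
    return (torch.relu(min_positive - frac_positive)  # too few positives
            + torch.relu(mean_abs - max_abs))         # activations too large
```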
], batch size: 56, lr: 7.66e-03, grad_scale: 16.0 2023-11-19 10:20:30,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=690706.6666666666, ans=0.0 2023-11-19 10:20:34,035 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.592e+01 8.284e+01 9.235e+01 1.033e+02 1.364e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 10:20:37,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=690706.6666666666, ans=0.125 2023-11-19 10:20:42,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2023-11-19 10:21:04,105 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7450, loss[loss=0.07294, simple_loss=0.08687, pruned_loss=0.01834, audio_tagging_loss=0.01116, over 15054.00 frames. ], tot_loss[loss=0.08775, simple_loss=0.1066, pruned_loss=0.02411, audio_tagging_loss=0.01034, over 3046695.30 frames. ], batch size: 57, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:21:18,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=690973.3333333334, ans=0.05 2023-11-19 10:21:21,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=690973.3333333334, ans=0.125 2023-11-19 10:21:24,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=690973.3333333334, ans=0.1 2023-11-19 10:21:28,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.47 vs. limit=5.0 2023-11-19 10:21:30,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=691040.0, ans=0.0 2023-11-19 10:21:45,606 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=12.0 2023-11-19 10:21:54,500 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=691173.3333333334, ans=0.0 2023-11-19 10:21:56,398 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=691173.3333333334, ans=0.125 2023-11-19 10:21:56,539 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=691173.3333333334, ans=0.0 2023-11-19 10:21:59,374 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7500, loss[loss=0.08849, simple_loss=0.1013, pruned_loss=0.02501, audio_tagging_loss=0.01285, over 14855.00 frames. ], tot_loss[loss=0.08658, simple_loss=0.1053, pruned_loss=0.0236, audio_tagging_loss=0.01032, over 3049189.84 frames. 
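[editor's note] The `*_skip_rate` entries (`conv_skip_rate`, `ff2_skip_rate`, `attention_skip_rate`, `bypass.skip_rate`, ...) are stochastic-depth style gates: during training, the named sub-module is bypassed with the scheduled probability. A minimal sketch of the gate is below; Zipformer's actual bypass additionally learns a per-channel mixing scale (the `bypass.scale_min` entries), which is omitted here.

```python
import random
import torch

def maybe_skip(module: torch.nn.Module, x: torch.Tensor,
               skip_rate: float = 0.0, training: bool = True) -> torch.Tensor:
    """Bypass a sub-module with probability skip_rate during training
    (stochastic depth). Sketch of the *_skip_rate gates in the log.
    """
    if training and random.random() < skip_rate:
        return x          # skip: identity path only
    return module(x)      # normal path
```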
], batch size: 57, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:22:21,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=691373.3333333334, ans=0.1 2023-11-19 10:22:25,171 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.501e+01 8.537e+01 9.196e+01 9.974e+01 1.563e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-19 10:22:36,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=691440.0, ans=0.125 2023-11-19 10:22:48,611 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=691506.6666666666, ans=0.125 2023-11-19 10:22:48,723 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=691506.6666666666, ans=0.125 2023-11-19 10:22:54,768 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7550, loss[loss=0.0839, simple_loss=0.0985, pruned_loss=0.0221, audio_tagging_loss=0.01255, over 15493.00 frames. ], tot_loss[loss=0.08684, simple_loss=0.1052, pruned_loss=0.02383, audio_tagging_loss=0.01041, over 3048626.56 frames. ], batch size: 57, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:23:22,995 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=691706.6666666666, ans=0.2 2023-11-19 10:23:37,273 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=691773.3333333334, ans=0.0 2023-11-19 10:23:50,749 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7600, loss[loss=0.08096, simple_loss=0.0998, pruned_loss=0.02268, audio_tagging_loss=0.008386, over 15199.00 frames. ], tot_loss[loss=0.08664, simple_loss=0.105, pruned_loss=0.02376, audio_tagging_loss=0.01038, over 3044097.28 frames. ], batch size: 56, lr: 7.65e-03, grad_scale: 32.0 2023-11-19 10:23:51,990 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=691906.6666666666, ans=0.125 2023-11-19 10:24:13,359 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:24:15,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=692040.0, ans=0.0 2023-11-19 10:24:16,209 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.367e+01 9.110e+01 1.007e+02 1.295e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-19 10:24:23,651 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2023-11-19 10:24:27,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=692106.6666666666, ans=22.5 2023-11-19 10:24:38,646 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=692173.3333333334, ans=0.04949747468305833 2023-11-19 10:24:41,092 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.84 vs. 
limit=15.0 2023-11-19 10:24:46,401 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7650, loss[loss=0.07477, simple_loss=0.08535, pruned_loss=0.02116, audio_tagging_loss=0.01093, over 14956.00 frames. ], tot_loss[loss=0.0869, simple_loss=0.1054, pruned_loss=0.02373, audio_tagging_loss=0.01045, over 3050860.95 frames. ], batch size: 57, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:25:13,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=692373.3333333334, ans=0.1 2023-11-19 10:25:30,398 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.22 vs. limit=22.5 2023-11-19 10:25:42,036 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7700, loss[loss=0.07192, simple_loss=0.08041, pruned_loss=0.0197, audio_tagging_loss=0.01202, over 15109.00 frames. ], tot_loss[loss=0.08753, simple_loss=0.1063, pruned_loss=0.024, audio_tagging_loss=0.01036, over 3047279.74 frames. ], batch size: 57, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:25:57,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=692640.0, ans=0.0 2023-11-19 10:26:08,253 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.466e+01 9.076e+01 9.722e+01 1.155e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-19 10:26:19,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=692773.3333333334, ans=0.2 2023-11-19 10:26:32,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=692840.0, ans=0.0 2023-11-19 10:26:35,267 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=692840.0, ans=0.0 2023-11-19 10:26:38,247 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7750, loss[loss=0.07571, simple_loss=0.08356, pruned_loss=0.02062, audio_tagging_loss=0.01331, over 14670.00 frames. ], tot_loss[loss=0.08774, simple_loss=0.1065, pruned_loss=0.024, audio_tagging_loss=0.0105, over 3043867.16 frames. ], batch size: 57, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:27:32,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=693240.0, ans=0.125 2023-11-19 10:27:33,224 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7800, loss[loss=0.07433, simple_loss=0.0834, pruned_loss=0.01808, audio_tagging_loss=0.01454, over 15121.00 frames. ], tot_loss[loss=0.08752, simple_loss=0.1061, pruned_loss=0.02393, audio_tagging_loss=0.01053, over 3040908.27 frames. 
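Each optim.py:476 line reports Clipping_scale, five order statistics of recently observed gradient norms, the clipping threshold, and the fraction of clipped batches. The threshold tracks twice the logged median (e.g. 2.0 x 9.076e+01 = 1.815e+02 just above), so a median-based clipper is sketched below; the real ScaledAdam logic in icefall's optim.py has considerably more machinery, and this is only a hedged reconstruction of the relationship the log exposes.

```python
# Hedged sketch of median-based gradient clipping: the threshold equals
# clipping_scale times the median of recent gradient norms, matching the
# logged "threshold = 2.0 * median quartile" pattern. Illustration only.
from collections import deque
import torch

class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 1024):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent global grad norms

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(
            torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:  # such batches count toward "percent-clipped"
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm
```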
], batch size: 57, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:27:42,911 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=693240.0, ans=0.125 2023-11-19 10:28:02,856 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.622e+01 9.449e+01 1.060e+02 1.939e+02, threshold=1.890e+02, percent-clipped=1.0 2023-11-19 10:28:09,534 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=693440.0, ans=0.125 2023-11-19 10:28:26,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=693506.6666666666, ans=0.09899494936611666 2023-11-19 10:28:26,395 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=693506.6666666666, ans=0.0 2023-11-19 10:28:31,426 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7850, loss[loss=0.1045, simple_loss=0.1325, pruned_loss=0.02731, audio_tagging_loss=0.01097, over 15576.00 frames. ], tot_loss[loss=0.08792, simple_loss=0.1068, pruned_loss=0.02392, audio_tagging_loss=0.01059, over 3037748.99 frames. ], batch size: 55, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:28:44,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=693640.0, ans=0.125 2023-11-19 10:29:01,774 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2023-11-19 10:29:13,389 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.99 vs. limit=10.0 2023-11-19 10:29:21,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=693840.0, ans=0.0 2023-11-19 10:29:27,564 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7900, loss[loss=0.07897, simple_loss=0.09626, pruned_loss=0.02002, audio_tagging_loss=0.01082, over 15441.00 frames. ], tot_loss[loss=0.08813, simple_loss=0.1068, pruned_loss=0.02402, audio_tagging_loss=0.0107, over 3040430.23 frames. ], batch size: 59, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:29:37,602 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-11-19 10:29:46,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=693973.3333333334, ans=0.0 2023-11-19 10:29:52,169 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2023-11-19 10:29:52,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=694040.0, ans=0.0 2023-11-19 10:29:53,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.703e+01 8.460e+01 9.050e+01 1.000e+02 1.219e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 10:30:05,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=694106.6666666666, ans=0.0 2023-11-19 10:30:05,626 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.74 vs. 
limit=22.5 2023-11-19 10:30:06,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=694106.6666666666, ans=0.125 2023-11-19 10:30:07,890 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:30:10,090 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=694106.6666666666, ans=0.125 2023-11-19 10:30:22,403 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 7950, loss[loss=0.08005, simple_loss=0.09335, pruned_loss=0.02203, audio_tagging_loss=0.01135, over 14552.00 frames. ], tot_loss[loss=0.08796, simple_loss=0.1064, pruned_loss=0.02394, audio_tagging_loss=0.01085, over 3045077.10 frames. ], batch size: 56, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:30:36,306 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:31:02,318 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.99 vs. limit=22.5 2023-11-19 10:31:13,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=694506.6666666666, ans=0.0 2023-11-19 10:31:18,668 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8000, loss[loss=0.126, simple_loss=0.1585, pruned_loss=0.04032, audio_tagging_loss=0.006448, over 15372.00 frames. ], tot_loss[loss=0.08791, simple_loss=0.1066, pruned_loss=0.02375, audio_tagging_loss=0.01089, over 3042786.74 frames. ], batch size: 56, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:31:39,318 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=694640.0, ans=0.125 2023-11-19 10:31:45,476 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.170e+01 9.028e+01 9.822e+01 2.160e+02, threshold=1.806e+02, percent-clipped=1.0 2023-11-19 10:32:05,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=694840.0, ans=0.0 2023-11-19 10:32:14,653 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8050, loss[loss=0.09635, simple_loss=0.1119, pruned_loss=0.02853, audio_tagging_loss=0.01189, over 16204.00 frames. ], tot_loss[loss=0.08827, simple_loss=0.107, pruned_loss=0.02389, audio_tagging_loss=0.01088, over 3042451.53 frames. ], batch size: 59, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:32:17,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=694906.6666666666, ans=0.025 2023-11-19 10:32:22,096 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.74 vs. 
limit=15.0 2023-11-19 10:32:39,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=695040.0, ans=0.09899494936611666 2023-11-19 10:32:41,031 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=695040.0, ans=0.5 2023-11-19 10:32:42,138 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:32:51,238 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=695106.6666666666, ans=0.1 2023-11-19 10:33:05,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=695173.3333333334, ans=0.0 2023-11-19 10:33:05,305 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.00 vs. limit=10.0 2023-11-19 10:33:10,030 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8100, loss[loss=0.07812, simple_loss=0.08888, pruned_loss=0.02253, audio_tagging_loss=0.01115, over 15183.00 frames. ], tot_loss[loss=0.08778, simple_loss=0.1064, pruned_loss=0.02374, audio_tagging_loss=0.01086, over 3042252.72 frames. ], batch size: 58, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:33:16,468 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=695240.0, ans=0.125 2023-11-19 10:33:19,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2023-11-19 10:33:26,651 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=695306.6666666666, ans=0.125 2023-11-19 10:33:27,577 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=695306.6666666666, ans=0.125 2023-11-19 10:33:37,037 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.784e+01 8.319e+01 9.007e+01 9.983e+01 1.168e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-19 10:33:50,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=695440.0, ans=0.125 2023-11-19 10:33:56,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=695506.6666666666, ans=0.07 2023-11-19 10:34:05,392 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8150, loss[loss=0.0945, simple_loss=0.1179, pruned_loss=0.02348, audio_tagging_loss=0.01206, over 15457.00 frames. ], tot_loss[loss=0.08763, simple_loss=0.1064, pruned_loss=0.02374, audio_tagging_loss=0.01071, over 3044448.82 frames. ], batch size: 55, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:34:25,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=695640.0, ans=0.0 2023-11-19 10:34:29,841 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.83 vs. 
limit=12.0 2023-11-19 10:34:46,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=695773.3333333334, ans=0.0 2023-11-19 10:34:50,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695840.0, ans=0.1 2023-11-19 10:34:53,711 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=695840.0, ans=0.125 2023-11-19 10:35:01,175 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8200, loss[loss=0.1114, simple_loss=0.1305, pruned_loss=0.03511, audio_tagging_loss=0.01104, over 15321.00 frames. ], tot_loss[loss=0.08726, simple_loss=0.1059, pruned_loss=0.02368, audio_tagging_loss=0.01065, over 3048407.68 frames. ], batch size: 56, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:35:02,258 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:35:07,113 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:35:27,077 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.672e+01 8.400e+01 8.844e+01 9.876e+01 1.152e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 10:35:40,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=696106.6666666666, ans=0.2 2023-11-19 10:35:48,447 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=696173.3333333334, ans=0.125 2023-11-19 10:35:56,651 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8250, loss[loss=0.06422, simple_loss=0.08086, pruned_loss=0.01603, audio_tagging_loss=0.007761, over 15878.00 frames. ], tot_loss[loss=0.0872, simple_loss=0.106, pruned_loss=0.02368, audio_tagging_loss=0.0105, over 3051866.82 frames. ], batch size: 62, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:36:00,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=696240.0, ans=0.2 2023-11-19 10:36:00,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=696240.0, ans=0.1 2023-11-19 10:36:42,235 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=696506.6666666666, ans=0.0 2023-11-19 10:36:52,154 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8300, loss[loss=0.05772, simple_loss=0.06803, pruned_loss=0.01239, audio_tagging_loss=0.01132, over 13877.00 frames. ], tot_loss[loss=0.08699, simple_loss=0.106, pruned_loss=0.02361, audio_tagging_loss=0.01039, over 3050771.85 frames. 
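The WARNING entries drop AudioSet cuts whose placeholder transcript has more BPE tokens (24) than the encoder produces frames after subsampling (23 from 100 input frames), since the pruned transducer loss needs at least one encoder frame per token. Below is a hedged sketch of that filter, assuming the usual ((T - 7) // 2) // 2 convolutional subsampling, which reproduces the 100 -> 23 arithmetic in the warnings; the exact expression lives in the recipe code.

```python
# Hedged sketch of the filter behind the "Exclude cut with ID ..." warnings:
# a cut is dropped when it yields fewer encoder frames after subsampling
# than it has BPE tokens, as the transducer loss then has no valid
# alignment. The subsampling formula below is an assumption that happens
# to reproduce 100 -> 23 for this setup.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2) // 2  # conv frontend, overall factor ~4

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

# The warnings above: 100 frames -> 23 < 24 tokens, so the cut is excluded.
assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)
```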
], batch size: 55, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:36:52,444 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=696573.3333333334, ans=0.2 2023-11-19 10:37:01,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=696640.0, ans=0.125 2023-11-19 10:37:03,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0 2023-11-19 10:37:05,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=696640.0, ans=0.125 2023-11-19 10:37:18,735 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.396e+01 9.218e+01 1.018e+02 1.275e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 10:37:21,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=696706.6666666666, ans=0.0 2023-11-19 10:37:34,324 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=696773.3333333334, ans=0.95 2023-11-19 10:37:43,729 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:37:44,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=696840.0, ans=0.125 2023-11-19 10:37:44,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=696840.0, ans=0.125 2023-11-19 10:37:47,186 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8350, loss[loss=0.0903, simple_loss=0.1065, pruned_loss=0.02337, audio_tagging_loss=0.01369, over 13564.00 frames. ], tot_loss[loss=0.08661, simple_loss=0.1055, pruned_loss=0.02342, audio_tagging_loss=0.01043, over 3045732.16 frames. ], batch size: 52, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:38:01,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=696973.3333333334, ans=0.1 2023-11-19 10:38:10,594 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=697040.0, ans=0.125 2023-11-19 10:38:24,895 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:38:40,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=697173.3333333334, ans=0.1 2023-11-19 10:38:43,137 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8400, loss[loss=0.09629, simple_loss=0.1144, pruned_loss=0.02751, audio_tagging_loss=0.01156, over 14501.00 frames. ], tot_loss[loss=0.08683, simple_loss=0.106, pruned_loss=0.02343, audio_tagging_loss=0.01039, over 3045233.43 frames. 
], batch size: 56, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:39:09,204 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.657e+01 8.184e+01 9.115e+01 9.863e+01 1.459e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-19 10:39:31,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=697506.6666666666, ans=0.0 2023-11-19 10:39:37,755 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8450, loss[loss=0.05877, simple_loss=0.06813, pruned_loss=0.0135, audio_tagging_loss=0.0112, over 14561.00 frames. ], tot_loss[loss=0.08683, simple_loss=0.1059, pruned_loss=0.02342, audio_tagging_loss=0.01048, over 3048514.43 frames. ], batch size: 56, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:39:46,821 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=697573.3333333334, ans=0.1 2023-11-19 10:40:07,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=697706.6666666666, ans=0.125 2023-11-19 10:40:22,703 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=697840.0, ans=0.2 2023-11-19 10:40:31,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2023-11-19 10:40:33,449 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8500, loss[loss=0.08439, simple_loss=0.1019, pruned_loss=0.02486, audio_tagging_loss=0.008581, over 14963.00 frames. ], tot_loss[loss=0.08697, simple_loss=0.106, pruned_loss=0.0235, audio_tagging_loss=0.01047, over 3040183.29 frames. ], batch size: 56, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:40:49,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=697973.3333333334, ans=0.0 2023-11-19 10:40:59,992 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.736e+01 1.015e+02 1.178e+02 2.396e+02, threshold=2.030e+02, percent-clipped=2.0 2023-11-19 10:41:29,470 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8550, loss[loss=0.07887, simple_loss=0.1025, pruned_loss=0.01872, audio_tagging_loss=0.008884, over 15646.00 frames. ], tot_loss[loss=0.08791, simple_loss=0.107, pruned_loss=0.02394, audio_tagging_loss=0.01047, over 3039994.68 frames. ], batch size: 59, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:41:33,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=698240.0, ans=0.2 2023-11-19 10:41:49,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=698306.6666666666, ans=0.2 2023-11-19 10:42:10,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=698440.0, ans=0.0 2023-11-19 10:42:17,824 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=698506.6666666666, ans=0.0 2023-11-19 10:42:20,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2023-11-19 10:42:23,873 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8600, loss[loss=0.08742, simple_loss=0.1102, pruned_loss=0.02467, audio_tagging_loss=0.007635, over 15072.00 frames. 
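The scaling.py:213 lines print module hyper-parameters (skip rates, dropout probabilities, balancer probs) whose current value "ans" is a function of batch_count. A minimal model of such a ScheduledFloat is piecewise-linear interpolation between (batch_count, value) breakpoints, which is why values like conv_skip_rate have decayed to 0.0 by batch_count near 6.9e5. The breakpoints below are invented for illustration; scaling.py's real class carries extra machinery beyond this sketch.

```python
# Hedged sketch of a ScheduledFloat-style value: a float that depends on
# batch_count via piecewise-linear interpolation between breakpoints.
# Illustrates why the log prints e.g. "conv_skip_rate,
# batch_count=691040.0, ans=0.0"; the breakpoints here are hypothetical.
from bisect import bisect_right

class ScheduledFloat:
    def __init__(self, *points):  # points: (batch_count, value) pairs
        self.points = sorted(points)
        self.batch_count = 0.0

    def __float__(self) -> float:
        xs = [x for x, _ in self.points]
        i = bisect_right(xs, self.batch_count)
        if i == 0:
            return self.points[0][1]
        if i == len(self.points):
            return self.points[-1][1]
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        t = (self.batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
skip_rate.batch_count = 691040.0
print(float(skip_rate))  # 0.0 -- the schedule has fully decayed, as logged
```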
], tot_loss[loss=0.08735, simple_loss=0.106, pruned_loss=0.02381, audio_tagging_loss=0.01054, over 3034484.78 frames. ], batch size: 56, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:42:34,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=698640.0, ans=0.1 2023-11-19 10:42:50,814 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.955e+01 8.414e+01 9.085e+01 1.004e+02 1.428e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 10:42:52,176 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=698706.6666666666, ans=0.5 2023-11-19 10:42:56,877 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=698773.3333333334, ans=0.125 2023-11-19 10:43:12,282 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=698840.0, ans=0.2 2023-11-19 10:43:13,349 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=698840.0, ans=0.2 2023-11-19 10:43:19,390 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8650, loss[loss=0.08696, simple_loss=0.1101, pruned_loss=0.02097, audio_tagging_loss=0.01092, over 15250.00 frames. ], tot_loss[loss=0.08764, simple_loss=0.1065, pruned_loss=0.02383, audio_tagging_loss=0.01057, over 3034979.49 frames. ], batch size: 57, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:43:36,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=698973.3333333334, ans=0.0 2023-11-19 10:43:36,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=15.0 2023-11-19 10:43:43,316 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=699040.0, ans=10.0 2023-11-19 10:43:58,741 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2023-11-19 10:43:59,392 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=699106.6666666666, ans=0.125 2023-11-19 10:44:15,097 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8700, loss[loss=0.1142, simple_loss=0.1315, pruned_loss=0.03482, audio_tagging_loss=0.01365, over 14557.00 frames. ], tot_loss[loss=0.08755, simple_loss=0.1063, pruned_loss=0.02373, audio_tagging_loss=0.01069, over 3039316.55 frames. 
], batch size: 54, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:44:23,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=699240.0, ans=0.125 2023-11-19 10:44:35,436 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=699306.6666666666, ans=0.2 2023-11-19 10:44:41,442 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.962e+01 8.409e+01 9.264e+01 1.013e+02 1.808e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-19 10:44:45,276 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=699373.3333333334, ans=0.125 2023-11-19 10:44:48,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=699440.0, ans=0.0 2023-11-19 10:44:51,858 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=699440.0, ans=0.05 2023-11-19 10:45:10,502 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8750, loss[loss=0.1158, simple_loss=0.1375, pruned_loss=0.03813, audio_tagging_loss=0.008931, over 15705.00 frames. ], tot_loss[loss=0.08797, simple_loss=0.1065, pruned_loss=0.02399, audio_tagging_loss=0.01072, over 3043732.48 frames. ], batch size: 58, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:45:10,775 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=699573.3333333334, ans=0.1 2023-11-19 10:45:25,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=699640.0, ans=0.125 2023-11-19 10:45:26,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2023-11-19 10:45:51,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=699773.3333333334, ans=0.125 2023-11-19 10:46:04,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=699906.6666666666, ans=0.0 2023-11-19 10:46:05,641 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8800, loss[loss=0.1148, simple_loss=0.1495, pruned_loss=0.03208, audio_tagging_loss=0.007948, over 15774.00 frames. ], tot_loss[loss=0.08869, simple_loss=0.1076, pruned_loss=0.02421, audio_tagging_loss=0.0107, over 3048463.30 frames. ], batch size: 57, lr: 7.60e-03, grad_scale: 32.0 2023-11-19 10:46:11,239 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2023-11-19 10:46:11,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=699906.6666666666, ans=0.0 2023-11-19 10:46:33,880 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.476e+01 9.103e+01 9.978e+01 1.212e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-19 10:47:01,767 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8850, loss[loss=0.06528, simple_loss=0.07838, pruned_loss=0.0118, audio_tagging_loss=0.01429, over 16780.00 frames. ], tot_loss[loss=0.08842, simple_loss=0.1074, pruned_loss=0.02398, audio_tagging_loss=0.01071, over 3057706.34 frames. 
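The "Whitening: ... metric=M vs. limit=L" lines compare a whitening statistic of a module's activations against a scheduled limit; a corrective gradient is applied only when M exceeds L. The numbers are consistent with the standard metric d * tr(C^2) / tr(C)^2 of the d x d feature covariance C, which equals 1.0 for perfectly whitened features and grows as variance concentrates in few directions, though it is an assumption that scaling.py uses exactly this form.

```python
# Hedged sketch of the whitening metric behind "metric=... vs. limit=...":
# for the covariance C (d x d) of one whitening group, define
# metric = d * tr(C @ C) / tr(C)**2. It is 1.0 when C is proportional to
# the identity and grows as the spectrum becomes uneven. Whether
# scaling.py computes exactly this is an assumption.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels), one whitening group
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    return (d * (cov @ cov).trace() / cov.trace() ** 2).item()

white = torch.randn(10000, 256)                       # ~isotropic: metric ~ 1
skewed = white * torch.linspace(0.1, 3.0, 256)        # uneven spectrum: larger
print(whitening_metric(white), whitening_metric(skewed))
```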
], batch size: 65, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:47:04,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=700240.0, ans=0.0 2023-11-19 10:47:10,488 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=700240.0, ans=0.0 2023-11-19 10:47:12,252 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:47:20,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=700306.6666666666, ans=0.0 2023-11-19 10:47:22,218 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2023-11-19 10:47:56,185 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8900, loss[loss=0.08581, simple_loss=0.1076, pruned_loss=0.02191, audio_tagging_loss=0.01009, over 14852.00 frames. ], tot_loss[loss=0.08814, simple_loss=0.107, pruned_loss=0.02394, audio_tagging_loss=0.01069, over 3054963.20 frames. ], batch size: 55, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:48:00,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.56 vs. limit=15.0 2023-11-19 10:48:02,302 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=700573.3333333334, ans=0.0 2023-11-19 10:48:25,771 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.967e+01 8.384e+01 9.220e+01 1.025e+02 1.340e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 10:48:32,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=700773.3333333334, ans=0.0 2023-11-19 10:48:52,100 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 8950, loss[loss=0.07293, simple_loss=0.08575, pruned_loss=0.01808, audio_tagging_loss=0.01197, over 16112.00 frames. ], tot_loss[loss=0.0877, simple_loss=0.1062, pruned_loss=0.02399, audio_tagging_loss=0.01063, over 3056076.67 frames. ], batch size: 63, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:49:07,141 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=700973.3333333334, ans=0.1 2023-11-19 10:49:19,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=701040.0, ans=0.2 2023-11-19 10:49:36,857 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-11-19 10:49:37,592 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=701173.3333333334, ans=0.05 2023-11-19 10:49:47,835 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9000, loss[loss=0.1004, simple_loss=0.1273, pruned_loss=0.02772, audio_tagging_loss=0.009024, over 16233.00 frames. 
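The learning rate decays very slowly across this span (7.66e-03 down to 7.56e-03). With this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, that is what an Eden-style schedule predicts. The sketch below reproduces the logged value; the cumulative step count (roughly 95k batches by epoch 9) is an estimate, and the formula is a hedged rendering of icefall's Eden scheduler rather than a copy of it.

```python
# Hedged sketch of the Eden learning-rate schedule used by icefall recipes,
# with this run's config (base_lr=0.045, lr_batches=7500, lr_epochs=3.5).
# The global step of ~95k batches by epoch 9 is an assumption; the point
# is that the formula reproduces the logged lr of about 7.6e-03.
def eden_lr(step: float, epoch: float,
            base_lr: float = 0.045,
            lr_batches: float = 7500.0,
            lr_epochs: float = 3.5) -> float:
    batch_factor = ((step**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(step=95_000, epoch=9))  # ~7.6e-03, matching "lr: 7.60e-03"
```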
], tot_loss[loss=0.08779, simple_loss=0.1064, pruned_loss=0.02404, audio_tagging_loss=0.01055, over 3048959.49 frames. ], batch size: 63, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:49:47,836 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 10:50:13,122 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9692, 3.1471, 2.9480, 3.0422, 3.3972, 2.6053, 3.1903, 2.7408], device='cuda:1') 2023-11-19 10:50:13,902 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4100, 3.7542, 2.1924, 3.6167], device='cuda:1') 2023-11-19 10:50:20,507 INFO [train_asr.py:1147] (1/4) Epoch 9, validation: loss=0.06655, simple_loss=0.05588, pruned_loss=0.006694, audio_tagging_loss=0.03192, over 4681554.00 frames. 2023-11-19 10:50:20,508 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 10:50:22,374 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.92 vs. limit=22.5 2023-11-19 10:50:24,511 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=701240.0, ans=0.0 2023-11-19 10:50:50,124 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.217e+01 9.187e+01 1.011e+02 1.342e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-19 10:50:51,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-11-19 10:51:16,390 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9050, loss[loss=0.1008, simple_loss=0.1219, pruned_loss=0.02559, audio_tagging_loss=0.01422, over 15001.00 frames. ], tot_loss[loss=0.08828, simple_loss=0.1076, pruned_loss=0.02414, audio_tagging_loss=0.01036, over 3046730.33 frames. ], batch size: 56, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:51:22,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=701573.3333333334, ans=0.04949747468305833 2023-11-19 10:51:51,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=701773.3333333334, ans=0.125 2023-11-19 10:51:55,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=701773.3333333334, ans=0.0 2023-11-19 10:52:10,452 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=701840.0, ans=0.1 2023-11-19 10:52:11,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=701906.6666666666, ans=0.125 2023-11-19 10:52:12,363 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9100, loss[loss=0.06511, simple_loss=0.08097, pruned_loss=0.01459, audio_tagging_loss=0.01004, over 14514.00 frames. ], tot_loss[loss=0.08815, simple_loss=0.1077, pruned_loss=0.02399, audio_tagging_loss=0.01031, over 3055096.36 frames. 
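During validation the log also dumps attn_weights_entropy tensors for selected self-attention modules: one entropy value per head, a cheap diagnostic for heads that collapse onto a single position (entropy near 0) or stay diffuse (entropy near the log of the key length). How zipformer.py reduces the attention weights to these numbers is not shown in the log, so the sketch below is a plausible reading, not the exact computation.

```python
# Hedged sketch of a per-head attention-entropy diagnostic like the
# "attn_weights_entropy" tensors printed at validation time. The exact
# reduction zipformer.py applies is an assumption.
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len), each row sums to 1
    p = attn.clamp(min=1e-20)
    return -(p * p.log()).sum(dim=-1).mean(dim=-1)  # one value per head

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # high values (near log 50) = diffuse heads
```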
], batch size: 57, lr: 7.59e-03, grad_scale: 8.0 2023-11-19 10:52:14,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=701906.6666666666, ans=0.1 2023-11-19 10:52:14,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=701906.6666666666, ans=0.125 2023-11-19 10:52:41,734 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.498e+01 9.044e+01 9.975e+01 1.337e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-19 10:52:51,214 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.80 vs. limit=15.0 2023-11-19 10:52:58,036 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=702173.3333333334, ans=0.0 2023-11-19 10:53:07,270 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9150, loss[loss=0.07667, simple_loss=0.09224, pruned_loss=0.01899, audio_tagging_loss=0.01156, over 14905.00 frames. ], tot_loss[loss=0.08867, simple_loss=0.1087, pruned_loss=0.02408, audio_tagging_loss=0.01023, over 3056776.44 frames. ], batch size: 57, lr: 7.59e-03, grad_scale: 8.0 2023-11-19 10:53:07,571 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:53:15,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=702240.0, ans=0.125 2023-11-19 10:53:19,056 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=22.5 2023-11-19 10:53:19,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=702306.6666666666, ans=0.125 2023-11-19 10:53:22,672 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2023-11-19 10:53:31,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=702373.3333333334, ans=0.125 2023-11-19 10:54:01,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=702573.3333333334, ans=0.125 2023-11-19 10:54:02,854 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9200, loss[loss=0.08542, simple_loss=0.1028, pruned_loss=0.02297, audio_tagging_loss=0.01107, over 14544.00 frames. ], tot_loss[loss=0.08829, simple_loss=0.1078, pruned_loss=0.02415, audio_tagging_loss=0.01024, over 3044944.09 frames. ], batch size: 56, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:54:13,680 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=702640.0, ans=0.0 2023-11-19 10:54:21,058 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=702640.0, ans=0.125 2023-11-19 10:54:25,675 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.04 vs. 
limit=15.0 2023-11-19 10:54:26,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=702706.6666666666, ans=0.125 2023-11-19 10:54:33,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.309e+01 9.173e+01 1.001e+02 1.258e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-19 10:54:54,225 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=702840.0, ans=0.125 2023-11-19 10:54:59,953 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9250, loss[loss=0.09516, simple_loss=0.1171, pruned_loss=0.02626, audio_tagging_loss=0.01035, over 15469.00 frames. ], tot_loss[loss=0.08792, simple_loss=0.1073, pruned_loss=0.02407, audio_tagging_loss=0.01021, over 3040808.18 frames. ], batch size: 57, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:55:27,513 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.54 vs. limit=15.0 2023-11-19 10:55:44,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0 2023-11-19 10:55:54,233 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9300, loss[loss=0.09112, simple_loss=0.1109, pruned_loss=0.02587, audio_tagging_loss=0.009796, over 14169.00 frames. ], tot_loss[loss=0.0879, simple_loss=0.1073, pruned_loss=0.02407, audio_tagging_loss=0.01019, over 3042938.49 frames. ], batch size: 53, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:55:59,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=703240.0, ans=0.0 2023-11-19 10:55:59,393 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=703240.0, ans=0.2 2023-11-19 10:56:04,610 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:56:07,952 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=703306.6666666666, ans=0.1 2023-11-19 10:56:25,077 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.507e+01 9.203e+01 9.999e+01 1.405e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 10:56:35,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=703440.0, ans=0.1 2023-11-19 10:56:39,421 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0 2023-11-19 10:56:50,080 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9350, loss[loss=0.08941, simple_loss=0.111, pruned_loss=0.02377, audio_tagging_loss=0.01017, over 15796.00 frames. ], tot_loss[loss=0.0878, simple_loss=0.1071, pruned_loss=0.02395, audio_tagging_loss=0.01032, over 3043637.40 frames. 
], batch size: 57, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:56:53,417 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=703573.3333333334, ans=0.125 2023-11-19 10:56:57,192 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=703573.3333333334, ans=0.0 2023-11-19 10:57:25,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.86 vs. limit=10.0 2023-11-19 10:57:29,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=703773.3333333334, ans=0.125 2023-11-19 10:57:34,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=703840.0, ans=0.2 2023-11-19 10:57:46,474 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9400, loss[loss=0.1126, simple_loss=0.1329, pruned_loss=0.0375, audio_tagging_loss=0.008704, over 16007.00 frames. ], tot_loss[loss=0.08838, simple_loss=0.1075, pruned_loss=0.02417, audio_tagging_loss=0.01044, over 3048874.98 frames. ], batch size: 57, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 10:58:10,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=704040.0, ans=0.125 2023-11-19 10:58:15,736 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.579e+01 9.388e+01 1.029e+02 1.331e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-19 10:58:26,146 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=704106.6666666666, ans=0.2 2023-11-19 10:58:32,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=704173.3333333334, ans=0.125 2023-11-19 10:58:34,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=704173.3333333334, ans=0.0 2023-11-19 10:58:36,289 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.03 vs. limit=22.5 2023-11-19 10:58:39,702 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:58:40,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704240.0, ans=0.1 2023-11-19 10:58:41,779 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9450, loss[loss=0.08042, simple_loss=0.09073, pruned_loss=0.02145, audio_tagging_loss=0.01361, over 15773.00 frames. ], tot_loss[loss=0.08886, simple_loss=0.1079, pruned_loss=0.02435, audio_tagging_loss=0.01058, over 3053196.34 frames. 
], batch size: 61, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 10:58:46,373 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:59:27,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=704506.6666666666, ans=0.0 2023-11-19 10:59:35,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=704573.3333333334, ans=0.125 2023-11-19 10:59:36,678 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9500, loss[loss=0.08284, simple_loss=0.09737, pruned_loss=0.02239, audio_tagging_loss=0.01176, over 16002.00 frames. ], tot_loss[loss=0.08796, simple_loss=0.1067, pruned_loss=0.02404, audio_tagging_loss=0.01055, over 3047066.25 frames. ], batch size: 61, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 10:59:39,183 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=12.0 2023-11-19 10:59:54,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=704640.0, ans=0.0 2023-11-19 11:00:02,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0 2023-11-19 11:00:06,712 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.471e+01 8.371e+01 9.001e+01 9.881e+01 1.664e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 11:00:10,232 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=704773.3333333334, ans=0.125 2023-11-19 11:00:28,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=704840.0, ans=0.125 2023-11-19 11:00:30,454 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0 2023-11-19 11:00:32,115 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9550, loss[loss=0.07203, simple_loss=0.09067, pruned_loss=0.01604, audio_tagging_loss=0.01065, over 15005.00 frames. ], tot_loss[loss=0.08805, simple_loss=0.1067, pruned_loss=0.02412, audio_tagging_loss=0.0106, over 3041979.23 frames. ], batch size: 56, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 11:00:39,328 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.86 vs. limit=8.0 2023-11-19 11:00:39,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=704906.6666666666, ans=0.125 2023-11-19 11:00:43,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=704973.3333333334, ans=0.125 2023-11-19 11:00:55,139 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=705040.0, ans=0.2 2023-11-19 11:01:12,373 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=705106.6666666666, ans=0.125 2023-11-19 11:01:28,003 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9600, loss[loss=0.1071, simple_loss=0.1349, pruned_loss=0.03056, audio_tagging_loss=0.009073, over 15020.00 frames. 
], tot_loss[loss=0.08852, simple_loss=0.1072, pruned_loss=0.02422, audio_tagging_loss=0.01069, over 3042960.08 frames. ], batch size: 56, lr: 7.58e-03, grad_scale: 32.0 2023-11-19 11:01:33,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=705240.0, ans=0.125 2023-11-19 11:01:47,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2023-11-19 11:01:58,228 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.539e+01 8.373e+01 9.061e+01 9.893e+01 1.304e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-19 11:01:59,604 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=705373.3333333334, ans=0.0 2023-11-19 11:02:21,566 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=705506.6666666666, ans=0.125 2023-11-19 11:02:23,456 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9650, loss[loss=0.09039, simple_loss=0.1096, pruned_loss=0.02586, audio_tagging_loss=0.009746, over 16350.00 frames. ], tot_loss[loss=0.08834, simple_loss=0.1071, pruned_loss=0.02407, audio_tagging_loss=0.0107, over 3042353.96 frames. ], batch size: 61, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:02:32,116 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=705573.3333333334, ans=0.125 2023-11-19 11:02:49,432 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:02:57,890 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=705773.3333333334, ans=10.0 2023-11-19 11:02:59,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=705773.3333333334, ans=12.0 2023-11-19 11:03:02,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0 2023-11-19 11:03:09,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705840.0, ans=0.1 2023-11-19 11:03:17,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=705906.6666666666, ans=0.125 2023-11-19 11:03:18,570 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9700, loss[loss=0.06889, simple_loss=0.08304, pruned_loss=0.0161, audio_tagging_loss=0.01127, over 16051.00 frames. ], tot_loss[loss=0.08727, simple_loss=0.1059, pruned_loss=0.02371, audio_tagging_loss=0.01058, over 3046388.71 frames. ], batch size: 61, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:03:44,727 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.219e-02 2023-11-19 11:03:48,686 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.927e+01 8.495e+01 9.250e+01 1.013e+02 1.601e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-19 11:04:13,979 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9750, loss[loss=0.09293, simple_loss=0.1222, pruned_loss=0.02495, audio_tagging_loss=0.006871, over 15901.00 frames. 
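grad_scale in the batch summaries moves between 8.0, 16.0 and 32.0 over this span, the signature of dynamic loss scaling with use_fp16=True: the scale is halved when a step overflows in fp16 and doubled again after a run of clean steps. The sketch uses PyTorch's stock torch.cuda.amp.GradScaler with its default growth/backoff parameters, which may differ from what this recipe actually configures.

```python
# Hedged sketch of why "grad_scale" oscillates between 8.0, 16.0 and 32.0:
# dynamic loss scaling halves the scale on inf/nan gradients and doubles it
# after growth_interval clean steps. Parameter values below are PyTorch
# defaults, not necessarily this recipe's settings.
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0, growth_factor=2.0, backoff_factor=0.5,
    growth_interval=2000)

# Inside the training loop (model, optimizer, and batch assumed to exist):
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)           # skipped if gradients overflowed
#   scaler.update()                  # halves or doubles the scale
#   grad_scale = scaler.get_scale()  # the value printed in these log lines
```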
], tot_loss[loss=0.08758, simple_loss=0.1065, pruned_loss=0.02381, audio_tagging_loss=0.01053, over 3046396.67 frames. ], batch size: 60, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:04:21,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=706240.0, ans=0.0 2023-11-19 11:04:31,026 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=706306.6666666666, ans=0.1 2023-11-19 11:04:33,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=706306.6666666666, ans=0.2 2023-11-19 11:04:41,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=706373.3333333334, ans=0.125 2023-11-19 11:04:42,118 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=706373.3333333334, ans=0.125 2023-11-19 11:04:47,137 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2023-11-19 11:05:00,507 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.76 vs. limit=6.0 2023-11-19 11:05:04,645 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.41 vs. limit=10.0 2023-11-19 11:05:09,869 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9800, loss[loss=0.06819, simple_loss=0.08107, pruned_loss=0.01599, audio_tagging_loss=0.01166, over 15077.00 frames. ], tot_loss[loss=0.08757, simple_loss=0.1065, pruned_loss=0.02387, audio_tagging_loss=0.01045, over 3043323.25 frames. ], batch size: 57, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:05:11,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=706573.3333333334, ans=0.1 2023-11-19 11:05:13,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=706573.3333333334, ans=0.07 2023-11-19 11:05:14,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=706573.3333333334, ans=0.125 2023-11-19 11:05:16,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=706573.3333333334, ans=0.0 2023-11-19 11:05:19,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=706640.0, ans=0.125 2023-11-19 11:05:24,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=706640.0, ans=0.09899494936611666 2023-11-19 11:05:26,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=706640.0, ans=0.125 2023-11-19 11:05:28,694 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.06 vs. 
limit=22.5 2023-11-19 11:05:31,437 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=706706.6666666666, ans=0.0 2023-11-19 11:05:32,937 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.69 vs. limit=15.0 2023-11-19 11:05:39,679 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.318e+01 9.223e+01 1.040e+02 1.482e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 11:05:55,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706840.0, ans=0.1 2023-11-19 11:05:56,211 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2023-11-19 11:05:59,842 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:06:05,691 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9850, loss[loss=0.0932, simple_loss=0.1047, pruned_loss=0.03002, audio_tagging_loss=0.01084, over 14906.00 frames. ], tot_loss[loss=0.08754, simple_loss=0.1064, pruned_loss=0.02393, audio_tagging_loss=0.0104, over 3043784.26 frames. ], batch size: 56, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:06:19,990 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:06:33,826 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=707040.0, ans=0.125 2023-11-19 11:06:41,580 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.91 vs. limit=22.5 2023-11-19 11:06:48,373 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2023-11-19 11:06:57,129 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=707173.3333333334, ans=0.2 2023-11-19 11:06:57,615 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-19 11:07:01,183 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9900, loss[loss=0.07519, simple_loss=0.09013, pruned_loss=0.01476, audio_tagging_loss=0.01536, over 15981.00 frames. ], tot_loss[loss=0.08733, simple_loss=0.1061, pruned_loss=0.02385, audio_tagging_loss=0.01042, over 3041241.58 frames. 
], batch size: 61, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:07:11,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=707306.6666666666, ans=0.07 2023-11-19 11:07:19,464 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=707306.6666666666, ans=0.2 2023-11-19 11:07:22,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=707373.3333333334, ans=0.125 2023-11-19 11:07:31,311 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.468e+01 8.507e+01 9.296e+01 1.006e+02 1.418e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 11:07:31,620 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=707373.3333333334, ans=0.125 2023-11-19 11:07:46,797 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=707506.6666666666, ans=0.5 2023-11-19 11:07:49,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=707506.6666666666, ans=0.0 2023-11-19 11:07:50,142 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=707506.6666666666, ans=0.0 2023-11-19 11:07:51,141 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.937e-02 2023-11-19 11:07:54,945 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=707506.6666666666, ans=0.04949747468305833 2023-11-19 11:07:56,507 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=707573.3333333334, ans=0.125 2023-11-19 11:07:57,329 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 9950, loss[loss=0.08251, simple_loss=0.11, pruned_loss=0.01894, audio_tagging_loss=0.008594, over 16507.00 frames. ], tot_loss[loss=0.08751, simple_loss=0.1061, pruned_loss=0.024, audio_tagging_loss=0.01047, over 3049506.09 frames. ], batch size: 61, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:08:05,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=707573.3333333334, ans=0.0 2023-11-19 11:08:06,838 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=707640.0, ans=0.125 2023-11-19 11:08:12,025 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.04 vs. limit=22.5 2023-11-19 11:08:17,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. 
limit=15.0 2023-11-19 11:08:29,280 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=707773.3333333334, ans=0.125 2023-11-19 11:08:39,972 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=707773.3333333334, ans=0.125 2023-11-19 11:08:51,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=707906.6666666666, ans=0.125 2023-11-19 11:08:52,383 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10000, loss[loss=0.1219, simple_loss=0.1419, pruned_loss=0.04005, audio_tagging_loss=0.01086, over 15165.00 frames. ], tot_loss[loss=0.08687, simple_loss=0.1055, pruned_loss=0.02367, audio_tagging_loss=0.01043, over 3048822.91 frames. ], batch size: 56, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:09:20,833 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=708040.0, ans=0.125 2023-11-19 11:09:20,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=708040.0, ans=0.125 2023-11-19 11:09:23,367 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.949e+01 8.309e+01 8.896e+01 1.009e+02 1.315e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-19 11:09:25,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=708106.6666666666, ans=0.2 2023-11-19 11:09:31,492 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2023-11-19 11:09:49,143 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10050, loss[loss=0.06587, simple_loss=0.07346, pruned_loss=0.01564, audio_tagging_loss=0.0135, over 14587.00 frames. ], tot_loss[loss=0.08675, simple_loss=0.1051, pruned_loss=0.02373, audio_tagging_loss=0.01049, over 3041188.27 frames. ], batch size: 58, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:09:51,474 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=708240.0, ans=0.125 2023-11-19 11:09:58,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=708306.6666666666, ans=0.1 2023-11-19 11:10:11,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2023-11-19 11:10:13,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=708373.3333333334, ans=0.125 2023-11-19 11:10:29,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=708440.0, ans=0.125 2023-11-19 11:10:44,152 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10100, loss[loss=0.09211, simple_loss=0.1147, pruned_loss=0.0241, audio_tagging_loss=0.01068, over 15601.00 frames. ], tot_loss[loss=0.08682, simple_loss=0.1053, pruned_loss=0.02371, audio_tagging_loss=0.01045, over 3050142.32 frames. ], batch size: 55, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:10:58,315 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. 
limit=6.0 2023-11-19 11:11:11,308 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2023-11-19 11:11:15,749 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.397e+01 8.991e+01 1.000e+02 1.217e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 11:11:29,051 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:11:40,079 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10150, loss[loss=0.08585, simple_loss=0.09551, pruned_loss=0.02701, audio_tagging_loss=0.01109, over 15530.00 frames. ], tot_loss[loss=0.08713, simple_loss=0.1055, pruned_loss=0.0238, audio_tagging_loss=0.01058, over 3052719.32 frames. ], batch size: 60, lr: 7.56e-03, grad_scale: 16.0 2023-11-19 11:11:41,812 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2023-11-19 11:11:52,999 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:12:05,490 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:12:11,415 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=709040.0, ans=0.2 2023-11-19 11:12:36,051 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10200, loss[loss=0.1191, simple_loss=0.1493, pruned_loss=0.03397, audio_tagging_loss=0.01047, over 15997.00 frames. ], tot_loss[loss=0.08682, simple_loss=0.1051, pruned_loss=0.02364, audio_tagging_loss=0.0106, over 3057837.88 frames. ], batch size: 59, lr: 7.56e-03, grad_scale: 16.0 2023-11-19 11:12:39,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=709240.0, ans=0.125 2023-11-19 11:12:48,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2023-11-19 11:12:56,014 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.98 vs. limit=15.0 2023-11-19 11:12:56,599 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 11:13:06,567 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.611e+01 9.302e+01 1.032e+02 1.464e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 11:13:18,818 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2023-11-19 11:13:25,825 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=709506.6666666666, ans=0.0 2023-11-19 11:13:30,927 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10250, loss[loss=0.05728, simple_loss=0.06546, pruned_loss=0.01283, audio_tagging_loss=0.01172, over 14364.00 frames. ], tot_loss[loss=0.08725, simple_loss=0.1058, pruned_loss=0.02376, audio_tagging_loss=0.01059, over 3054856.81 frames. ], batch size: 56, lr: 7.55e-03, grad_scale: 16.0 2023-11-19 11:13:52,811 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=709706.6666666666, ans=0.125 2023-11-19 11:13:57,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=709706.6666666666, ans=0.0 2023-11-19 11:14:12,732 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2023-11-19 11:14:16,286 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.65 vs. limit=15.0 2023-11-19 11:14:26,447 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10300, loss[loss=0.09104, simple_loss=0.1012, pruned_loss=0.02634, audio_tagging_loss=0.01411, over 14722.00 frames. ], tot_loss[loss=0.08794, simple_loss=0.1065, pruned_loss=0.02407, audio_tagging_loss=0.01064, over 3056862.36 frames. ], batch size: 56, lr: 7.55e-03, grad_scale: 16.0 2023-11-19 11:14:39,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=709973.3333333334, ans=0.0 2023-11-19 11:14:54,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=710040.0, ans=0.0 2023-11-19 11:14:58,177 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.163e+01 8.880e+01 9.962e+01 1.363e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-19 11:14:59,669 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=15.0 2023-11-19 11:15:04,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.04 vs. limit=10.0 2023-11-19 11:15:04,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=710106.6666666666, ans=0.125 2023-11-19 11:15:10,011 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=710173.3333333334, ans=0.0 2023-11-19 11:15:23,145 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10350, loss[loss=0.09542, simple_loss=0.1214, pruned_loss=0.02719, audio_tagging_loss=0.007515, over 15125.00 frames. 
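The recurring optim.py:476 lines summarize the distribution of recent per-batch gradient norms. The five numbers read naturally as min / 25% / median / 75% / max, and the reported threshold tracks Clipping_scale times the median: on the entry above, 2.0 × 9.302e+01 ≈ 1.860e+02, and the same relation holds for every other such line in this excerpt. percent-clipped=0.0 means no recent batch exceeded the threshold. A simplified sketch of that bookkeeping follows; icefall's ScaledAdam does this internally and in a more refined way, so the class and method names here are made up:

```python
import torch

class MedianGradClipper:
    """Clip gradient norms against clipping_scale * median of recent
    batches, mirroring the optim.py log lines above (a sketch)."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms: list[float] = []

    def clip_(self, params: list[torch.Tensor]) -> float:
        # max_norm=inf makes clip_grad_norm_ a pure norm computation.
        norm = torch.nn.utils.clip_grad_norm_(params, float("inf")).item()
        self.norms = (self.norms + [norm])[-self.window:]
        hist = torch.tensor(self.norms)
        threshold = self.clipping_scale * hist.median().item()
        quartiles = torch.quantile(
            hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        clipped = (hist > threshold).float().mean().item() * 100
        print(f"grad-norm quartiles {quartiles.tolist()}, "
              f"threshold={threshold:.3e}, percent-clipped={clipped:.1f}")
        if norm > threshold:
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return norm
```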
], tot_loss[loss=0.08739, simple_loss=0.1056, pruned_loss=0.02375, audio_tagging_loss=0.01085, over 3053656.98 frames. ], batch size: 56, lr: 7.55e-03, grad_scale: 16.0 2023-11-19 11:15:27,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=710240.0, ans=0.125 2023-11-19 11:15:28,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=710240.0, ans=0.0 2023-11-19 11:15:49,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=710373.3333333334, ans=0.2 2023-11-19 11:15:50,405 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=710373.3333333334, ans=0.2 2023-11-19 11:16:18,293 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10400, loss[loss=0.07007, simple_loss=0.07642, pruned_loss=0.01477, audio_tagging_loss=0.0171, over 15225.00 frames. ], tot_loss[loss=0.08677, simple_loss=0.1049, pruned_loss=0.02342, audio_tagging_loss=0.01091, over 3052616.87 frames. ], batch size: 57, lr: 7.55e-03, grad_scale: 32.0 2023-11-19 11:16:32,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710640.0, ans=0.1 2023-11-19 11:16:38,653 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2023-11-19 11:16:41,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=710706.6666666666, ans=0.2 2023-11-19 11:16:50,350 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.272e+01 8.278e+01 9.028e+01 1.000e+02 1.286e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-19 11:17:02,561 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.57 vs. limit=22.5 2023-11-19 11:17:04,483 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=710840.0, ans=0.2 2023-11-19 11:17:05,997 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=710840.0, ans=0.0 2023-11-19 11:17:14,234 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10450, loss[loss=0.07755, simple_loss=0.09978, pruned_loss=0.01677, audio_tagging_loss=0.01089, over 14945.00 frames. ], tot_loss[loss=0.08752, simple_loss=0.106, pruned_loss=0.02368, audio_tagging_loss=0.01082, over 3059425.08 frames. ], batch size: 54, lr: 7.55e-03, grad_scale: 32.0 2023-11-19 11:17:16,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=710906.6666666666, ans=0.0 2023-11-19 11:17:38,724 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. 
limit=15.0 2023-11-19 11:17:40,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=711040.0, ans=0.125 2023-11-19 11:17:57,073 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=711106.6666666666, ans=0.0 2023-11-19 11:18:06,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=711173.3333333334, ans=0.125 2023-11-19 11:18:10,460 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10500, loss[loss=0.08878, simple_loss=0.1137, pruned_loss=0.02122, audio_tagging_loss=0.0107, over 15508.00 frames. ], tot_loss[loss=0.08713, simple_loss=0.1061, pruned_loss=0.02347, audio_tagging_loss=0.01062, over 3056167.70 frames. ], batch size: 59, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:18:41,179 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.988e+01 8.288e+01 9.011e+01 9.876e+01 1.227e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 11:18:46,264 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=711440.0, ans=0.2 2023-11-19 11:18:49,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=711440.0, ans=0.1 2023-11-19 11:18:49,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=711440.0, ans=0.0 2023-11-19 11:18:53,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=711440.0, ans=0.2 2023-11-19 11:18:59,910 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=711506.6666666666, ans=10.0 2023-11-19 11:19:06,043 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10550, loss[loss=0.0657, simple_loss=0.07485, pruned_loss=0.0193, audio_tagging_loss=0.00897, over 14099.00 frames. ], tot_loss[loss=0.08655, simple_loss=0.1055, pruned_loss=0.02324, audio_tagging_loss=0.01055, over 3054732.63 frames. ], batch size: 53, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:19:15,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=711573.3333333334, ans=0.125 2023-11-19 11:19:52,302 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0 2023-11-19 11:19:55,843 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.92 vs. limit=6.0 2023-11-19 11:20:00,735 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=711906.6666666666, ans=0.125 2023-11-19 11:20:01,628 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10600, loss[loss=0.1213, simple_loss=0.1464, pruned_loss=0.04104, audio_tagging_loss=0.007011, over 17000.00 frames. ], tot_loss[loss=0.0868, simple_loss=0.1058, pruned_loss=0.02347, audio_tagging_loss=0.01043, over 3049396.95 frames. 
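Most of the scaling.py:213 traffic records reads of ScheduledFloat values: dropout probabilities, skip rates, balancer limits, and similar knobs are piecewise-linear functions of batch_count rather than constants, with ans= apparently the value in effect at that point in training. A self-contained re-implementation of the idea, with the interface assumed rather than copied from scaling.py:

```python
import bisect

class PiecewiseLinearSchedule:
    """A float that depends on the training batch count.

    Defined by (batch_count, value) breakpoints; linearly
    interpolated between them, held constant outside them.
    """

    def __init__(self, *points: tuple[float, float]):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

# e.g. a dropout that anneals from 0.3 to 0.1 over 20k batches:
dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
assert abs(dropout_p(10000.0) - 0.2) < 1e-9
```

Logging the current value alongside batch_count, as these lines do, makes it possible to reconstruct each schedule after the fact.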
], batch size: 62, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:20:06,292 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=711906.6666666666, ans=22.5 2023-11-19 11:20:14,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0 2023-11-19 11:20:27,509 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712040.0, ans=0.1 2023-11-19 11:20:32,532 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.185e+01 9.113e+01 1.014e+02 1.245e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-19 11:20:41,832 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=712106.6666666666, ans=0.125 2023-11-19 11:20:48,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=712173.3333333334, ans=0.0 2023-11-19 11:20:56,904 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10650, loss[loss=0.08198, simple_loss=0.09493, pruned_loss=0.02541, audio_tagging_loss=0.009104, over 14802.00 frames. ], tot_loss[loss=0.08628, simple_loss=0.1053, pruned_loss=0.0232, audio_tagging_loss=0.01044, over 3052197.22 frames. ], batch size: 57, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:21:05,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712240.0, ans=0.1 2023-11-19 11:21:21,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=712373.3333333334, ans=0.125 2023-11-19 11:21:24,411 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=712373.3333333334, ans=0.04949747468305833 2023-11-19 11:21:24,816 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.58 vs. limit=10.0 2023-11-19 11:21:30,780 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=712440.0, ans=0.125 2023-11-19 11:21:53,344 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10700, loss[loss=0.0896, simple_loss=0.1158, pruned_loss=0.02393, audio_tagging_loss=0.007755, over 15912.00 frames. ], tot_loss[loss=0.08627, simple_loss=0.1055, pruned_loss=0.0232, audio_tagging_loss=0.01035, over 3047636.93 frames. ], batch size: 61, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:22:12,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=712640.0, ans=0.0 2023-11-19 11:22:15,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.43 vs. limit=15.0 2023-11-19 11:22:18,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=712706.6666666666, ans=0.0 2023-11-19 11:22:22,493 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. 
limit=15.0 2023-11-19 11:22:24,045 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.779e+01 8.116e+01 8.823e+01 9.508e+01 1.570e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-19 11:22:39,230 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=712840.0, ans=0.0 2023-11-19 11:22:49,116 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10750, loss[loss=0.05488, simple_loss=0.06182, pruned_loss=0.01228, audio_tagging_loss=0.01169, over 14409.00 frames. ], tot_loss[loss=0.08622, simple_loss=0.1054, pruned_loss=0.02314, audio_tagging_loss=0.01037, over 3055187.77 frames. ], batch size: 58, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:23:06,974 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.34 vs. limit=15.0 2023-11-19 11:23:08,690 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2023-11-19 11:23:11,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=713040.0, ans=0.1 2023-11-19 11:23:16,470 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=713040.0, ans=0.0 2023-11-19 11:23:20,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=713040.0, ans=0.2 2023-11-19 11:23:29,896 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=713106.6666666666, ans=0.125 2023-11-19 11:23:31,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=713106.6666666666, ans=0.125 2023-11-19 11:23:33,984 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=713173.3333333334, ans=0.125 2023-11-19 11:23:39,610 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.45 vs. limit=6.0 2023-11-19 11:23:44,102 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=713240.0, ans=0.1 2023-11-19 11:23:44,952 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10800, loss[loss=0.09833, simple_loss=0.1262, pruned_loss=0.02786, audio_tagging_loss=0.007388, over 15218.00 frames. ], tot_loss[loss=0.08629, simple_loss=0.1053, pruned_loss=0.02329, audio_tagging_loss=0.01036, over 3050933.58 frames. 
], batch size: 56, lr: 7.53e-03, grad_scale: 32.0 2023-11-19 11:23:56,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=713306.6666666666, ans=0.2 2023-11-19 11:24:02,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=713306.6666666666, ans=0.125 2023-11-19 11:24:06,523 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=713373.3333333334, ans=0.125 2023-11-19 11:24:17,317 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.693e+01 8.285e+01 8.937e+01 9.621e+01 1.564e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-19 11:24:32,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=713506.6666666666, ans=0.125 2023-11-19 11:24:40,666 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10850, loss[loss=0.08098, simple_loss=0.1015, pruned_loss=0.01794, audio_tagging_loss=0.01229, over 14258.00 frames. ], tot_loss[loss=0.08655, simple_loss=0.1057, pruned_loss=0.02334, audio_tagging_loss=0.01037, over 3044502.12 frames. ], batch size: 56, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:25:16,863 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.91 vs. limit=15.0 2023-11-19 11:25:19,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=713773.3333333334, ans=0.0 2023-11-19 11:25:23,963 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:25:30,004 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=713840.0, ans=0.1 2023-11-19 11:25:32,925 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:25:36,679 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10900, loss[loss=0.08959, simple_loss=0.1129, pruned_loss=0.02444, audio_tagging_loss=0.008686, over 15308.00 frames. ], tot_loss[loss=0.08642, simple_loss=0.1057, pruned_loss=0.02325, audio_tagging_loss=0.01035, over 3045590.26 frames. 
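The scaling.py:1022 Whitening lines are regularizer diagnostics: each tagged activation has a whitening metric that equals 1.0 when its per-group feature covariance is a multiple of the identity and grows as the features become correlated or ill-conditioned, and the log reports the measured value against the limit at which a penalty gradient kicks in (e.g. metric=6.91 vs. limit=15.0 above; entries well past their limit mark layers being actively pushed back). The following is a paraphrase of such a metric from the general idea, not the exact scaling.py code:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """How far the feature covariance of x is from a multiple of identity.

    x: (num_frames, num_channels). Returns >= 1.0, with 1.0 meaning
    perfectly 'white' features within each channel group.
    """
    n_frames, n_channels = x.shape
    g = n_channels // num_groups
    x = x.reshape(n_frames, num_groups, g).transpose(0, 1)  # (groups, T, g)
    cov = torch.matmul(x.transpose(1, 2), x) / n_frames      # (groups, g, g)
    # trace(cov @ cov) * g / trace(cov)^2 == 1 iff cov = c * I
    num = (cov * cov).sum(dim=(1, 2)) * g
    den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2
    return (num / den).mean()
```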
], batch size: 56, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:25:46,270 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:26:02,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=714040.0, ans=0.125 2023-11-19 11:26:06,424 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=714040.0, ans=0.1 2023-11-19 11:26:08,743 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.026e+01 8.550e+01 9.422e+01 1.018e+02 1.383e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-19 11:26:14,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=714106.6666666666, ans=0.0 2023-11-19 11:26:14,188 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=714106.6666666666, ans=0.125 2023-11-19 11:26:26,252 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=714173.3333333334, ans=0.2 2023-11-19 11:26:31,829 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 10950, loss[loss=0.08748, simple_loss=0.1116, pruned_loss=0.02216, audio_tagging_loss=0.009494, over 14771.00 frames. ], tot_loss[loss=0.08722, simple_loss=0.1066, pruned_loss=0.0235, audio_tagging_loss=0.01043, over 3040332.06 frames. ], batch size: 54, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:26:36,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=714240.0, ans=0.0 2023-11-19 11:26:44,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=714306.6666666666, ans=0.125 2023-11-19 11:26:53,050 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=714373.3333333334, ans=0.125 2023-11-19 11:26:56,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=714373.3333333334, ans=0.0 2023-11-19 11:27:09,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=714440.0, ans=0.0 2023-11-19 11:27:09,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=714440.0, ans=0.0 2023-11-19 11:27:10,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=714440.0, ans=0.0 2023-11-19 11:27:11,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=714440.0, ans=0.125 2023-11-19 11:27:18,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=714506.6666666666, ans=0.1 2023-11-19 11:27:26,999 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11000, loss[loss=0.09469, simple_loss=0.1109, pruned_loss=0.0288, audio_tagging_loss=0.01047, over 15036.00 frames. ], tot_loss[loss=0.08717, simple_loss=0.1064, pruned_loss=0.02355, audio_tagging_loss=0.01044, over 3047090.23 frames. 
], batch size: 55, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:27:36,056 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:27:36,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=714573.3333333334, ans=0.05 2023-11-19 11:27:48,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.47 vs. limit=15.0 2023-11-19 11:27:52,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=714706.6666666666, ans=0.025 2023-11-19 11:27:52,751 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=714706.6666666666, ans=0.0 2023-11-19 11:27:59,347 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.973e+01 8.544e+01 9.124e+01 1.001e+02 1.240e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 11:28:14,688 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=714840.0, ans=0.125 2023-11-19 11:28:18,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=714840.0, ans=0.1 2023-11-19 11:28:19,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=714840.0, ans=0.0 2023-11-19 11:28:22,460 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11050, loss[loss=0.0974, simple_loss=0.1141, pruned_loss=0.02889, audio_tagging_loss=0.01145, over 15381.00 frames. ], tot_loss[loss=0.08715, simple_loss=0.1061, pruned_loss=0.02354, audio_tagging_loss=0.01057, over 3050819.44 frames. ], batch size: 59, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:28:29,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=714906.6666666666, ans=0.015 2023-11-19 11:28:31,642 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=714906.6666666666, ans=0.0 2023-11-19 11:28:35,927 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=714973.3333333334, ans=0.125 2023-11-19 11:28:56,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=715106.6666666666, ans=0.125 2023-11-19 11:28:58,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.00 vs. 
limit=15.0 2023-11-19 11:29:01,216 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:29:12,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=715173.3333333334, ans=0.1 2023-11-19 11:29:17,198 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=12.0 2023-11-19 11:29:17,830 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11100, loss[loss=0.08355, simple_loss=0.09799, pruned_loss=0.02216, audio_tagging_loss=0.01239, over 15051.00 frames. ], tot_loss[loss=0.08668, simple_loss=0.1053, pruned_loss=0.02327, audio_tagging_loss=0.01074, over 3050961.97 frames. ], batch size: 56, lr: 7.52e-03, grad_scale: 16.0 2023-11-19 11:29:23,266 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2023-11-19 11:29:32,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=715306.6666666666, ans=0.125 2023-11-19 11:29:33,636 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=715306.6666666666, ans=0.125 2023-11-19 11:29:33,734 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=715306.6666666666, ans=0.0 2023-11-19 11:29:36,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=715306.6666666666, ans=0.125 2023-11-19 11:29:44,381 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=715373.3333333334, ans=0.035 2023-11-19 11:29:44,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=715373.3333333334, ans=0.125 2023-11-19 11:29:47,312 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.06 vs. limit=15.0 2023-11-19 11:29:50,614 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.681e+01 9.190e+01 1.002e+02 1.339e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 11:30:01,839 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=715506.6666666666, ans=0.0 2023-11-19 11:30:01,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=715506.6666666666, ans=0.125 2023-11-19 11:30:13,757 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11150, loss[loss=0.06006, simple_loss=0.07529, pruned_loss=0.01248, audio_tagging_loss=0.009942, over 15612.00 frames. ], tot_loss[loss=0.08656, simple_loss=0.1053, pruned_loss=0.02317, audio_tagging_loss=0.01077, over 3052067.22 frames. ], batch size: 59, lr: 7.52e-03, grad_scale: 16.0 2023-11-19 11:30:17,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=715573.3333333334, ans=0.04949747468305833 2023-11-19 11:30:19,703 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.37 vs. 
limit=15.0 2023-11-19 11:30:21,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=715573.3333333334, ans=0.125 2023-11-19 11:30:44,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=715706.6666666666, ans=0.125 2023-11-19 11:30:46,180 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=715773.3333333334, ans=0.0 2023-11-19 11:30:51,387 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=715773.3333333334, ans=0.2 2023-11-19 11:30:53,111 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=715773.3333333334, ans=0.05 2023-11-19 11:31:02,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=715840.0, ans=0.125 2023-11-19 11:31:04,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=715840.0, ans=0.0 2023-11-19 11:31:07,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0 2023-11-19 11:31:08,655 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11200, loss[loss=0.08699, simple_loss=0.1147, pruned_loss=0.02011, audio_tagging_loss=0.009507, over 15506.00 frames. ], tot_loss[loss=0.08692, simple_loss=0.1057, pruned_loss=0.02326, audio_tagging_loss=0.01082, over 3053070.99 frames. ], batch size: 57, lr: 7.52e-03, grad_scale: 32.0 2023-11-19 11:31:09,245 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0 2023-11-19 11:31:13,697 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=715906.6666666666, ans=0.125 2023-11-19 11:31:24,075 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2023-11-19 11:31:31,926 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=716040.0, ans=0.0 2023-11-19 11:31:33,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=716040.0, ans=0.0 2023-11-19 11:31:41,667 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.895e+01 8.422e+01 9.292e+01 1.018e+02 1.279e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-19 11:31:46,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=716106.6666666666, ans=0.125 2023-11-19 11:31:51,388 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=716106.6666666666, ans=0.125 2023-11-19 11:31:53,637 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=716173.3333333334, ans=0.2 2023-11-19 11:32:05,051 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11250, loss[loss=0.1051, simple_loss=0.1337, pruned_loss=0.0296, audio_tagging_loss=0.008631, over 16124.00 frames. 
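Throughout these entries the printed totals are consistent with the decomposition loss = 0.5 · simple_loss + pruned_loss + 1.0 · audio_tagging_loss, i.e. a half-weighted simple (linear-joiner) transducer loss, the pruned RNN-T loss, and the audio-tagging distillation term; loss[...] is the current batch and tot_loss[...] a running aggregate over roughly 3M frames. A quick numeric check against the batch-11200 summary above, with the scales inferred from the fit rather than read from code:

```python
# tot_loss at Epoch 9, batch 11200 (logged a few lines up):
simple_loss, pruned_loss, audio_tagging_loss = 0.1057, 0.02326, 0.01082
loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(f"{loss:.5f}")  # 0.08693 -- the log shows 0.08692 (rounding)
```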
], tot_loss[loss=0.08691, simple_loss=0.1058, pruned_loss=0.0234, audio_tagging_loss=0.0106, over 3052571.19 frames. ], batch size: 57, lr: 7.52e-03, grad_scale: 32.0 2023-11-19 11:32:07,320 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=716240.0, ans=0.0 2023-11-19 11:32:19,452 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.62 vs. limit=22.5 2023-11-19 11:32:24,198 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=716306.6666666666, ans=0.0 2023-11-19 11:33:00,947 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11300, loss[loss=0.08047, simple_loss=0.0969, pruned_loss=0.0213, audio_tagging_loss=0.01071, over 14453.00 frames. ], tot_loss[loss=0.08645, simple_loss=0.105, pruned_loss=0.02334, audio_tagging_loss=0.01059, over 3049386.05 frames. ], batch size: 57, lr: 7.52e-03, grad_scale: 32.0 2023-11-19 11:33:18,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=716640.0, ans=0.125 2023-11-19 11:33:25,155 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2023-11-19 11:33:32,929 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.70 vs. limit=15.0 2023-11-19 11:33:33,218 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.698e+01 9.365e+01 1.016e+02 1.421e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-19 11:33:55,809 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11350, loss[loss=0.0946, simple_loss=0.1104, pruned_loss=0.02594, audio_tagging_loss=0.01348, over 14845.00 frames. ], tot_loss[loss=0.08672, simple_loss=0.1058, pruned_loss=0.02345, audio_tagging_loss=0.01036, over 3050829.14 frames. ], batch size: 57, lr: 7.51e-03, grad_scale: 32.0 2023-11-19 11:34:08,248 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=716973.3333333334, ans=0.1 2023-11-19 11:34:13,540 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.61 vs. limit=15.0 2023-11-19 11:34:26,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=717040.0, ans=0.5 2023-11-19 11:34:48,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=717173.3333333334, ans=0.1 2023-11-19 11:34:51,226 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11400, loss[loss=0.08949, simple_loss=0.101, pruned_loss=0.02872, audio_tagging_loss=0.01027, over 14815.00 frames. ], tot_loss[loss=0.08685, simple_loss=0.1058, pruned_loss=0.02367, audio_tagging_loss=0.01029, over 3041258.20 frames. ], batch size: 57, lr: 7.51e-03, grad_scale: 32.0 2023-11-19 11:34:59,052 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=717240.0, ans=0.125 2023-11-19 11:35:04,687 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.53 vs. 
limit=15.0 2023-11-19 11:35:17,329 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=717373.3333333334, ans=0.125 2023-11-19 11:35:24,353 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.257e+01 8.952e+01 9.927e+01 1.340e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 11:35:28,712 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2023-11-19 11:35:46,993 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11450, loss[loss=0.102, simple_loss=0.1265, pruned_loss=0.02911, audio_tagging_loss=0.009589, over 14122.00 frames. ], tot_loss[loss=0.08727, simple_loss=0.1064, pruned_loss=0.02381, audio_tagging_loss=0.01027, over 3038221.76 frames. ], batch size: 54, lr: 7.51e-03, grad_scale: 16.0 2023-11-19 11:35:59,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=717640.0, ans=0.125 2023-11-19 11:36:13,643 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:36:18,100 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2023-11-19 11:36:26,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=717773.3333333334, ans=0.09899494936611666 2023-11-19 11:36:41,943 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11500, loss[loss=0.07806, simple_loss=0.09872, pruned_loss=0.01939, audio_tagging_loss=0.009314, over 14880.00 frames. ], tot_loss[loss=0.08697, simple_loss=0.1059, pruned_loss=0.02381, audio_tagging_loss=0.0102, over 3038618.18 frames. ], batch size: 56, lr: 7.51e-03, grad_scale: 16.0 2023-11-19 11:37:15,740 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.484e+01 9.226e+01 9.872e+01 1.474e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 11:37:37,521 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11550, loss[loss=0.09785, simple_loss=0.1192, pruned_loss=0.03006, audio_tagging_loss=0.008194, over 15255.00 frames. ], tot_loss[loss=0.08698, simple_loss=0.106, pruned_loss=0.02377, audio_tagging_loss=0.01023, over 3043761.26 frames. 
], batch size: 57, lr: 7.51e-03, grad_scale: 16.0 2023-11-19 11:37:39,921 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=718240.0, ans=0.0 2023-11-19 11:37:57,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=718306.6666666666, ans=0.2 2023-11-19 11:38:03,408 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=718373.3333333334, ans=0.125 2023-11-19 11:38:07,616 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:38:09,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=718440.0, ans=0.2 2023-11-19 11:38:10,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=718440.0, ans=0.015 2023-11-19 11:38:11,667 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:38:23,261 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. limit=6.0 2023-11-19 11:38:24,281 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=12.0 2023-11-19 11:38:27,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=718506.6666666666, ans=0.07 2023-11-19 11:38:29,796 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=718506.6666666666, ans=0.125 2023-11-19 11:38:33,212 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11600, loss[loss=0.07623, simple_loss=0.08552, pruned_loss=0.02364, audio_tagging_loss=0.009828, over 14993.00 frames. ], tot_loss[loss=0.08678, simple_loss=0.1057, pruned_loss=0.02359, audio_tagging_loss=0.01033, over 3043692.74 frames. ], batch size: 58, lr: 7.51e-03, grad_scale: 32.0 2023-11-19 11:39:06,025 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.247e+01 8.169e+01 9.328e+01 1.011e+02 1.273e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-19 11:39:08,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=718773.3333333334, ans=0.125 2023-11-19 11:39:28,769 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11650, loss[loss=0.08598, simple_loss=0.1058, pruned_loss=0.02329, audio_tagging_loss=0.009804, over 15300.00 frames. ], tot_loss[loss=0.08707, simple_loss=0.1061, pruned_loss=0.0237, audio_tagging_loss=0.01034, over 3040755.28 frames. 
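The grad_scale column toggles between 32.0 and 16.0 through this stretch. With fp16 training this is the dynamic loss scale of mixed precision: it is halved whenever a batch overflows to inf/NaN gradients and grown back after a sustained run of clean batches, so occasional dips like these are expected and harmless. A generic PyTorch mixed-precision step showing the mechanism, using the standard torch.cuda.amp API rather than the actual train_asr.py loop:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)          # forward in fp16
    scaler.scale(loss).backward()    # scale up to avoid fp16 underflow
    scaler.step(optimizer)           # unscales; skips step on inf/NaN
    scaler.update()                  # halves scale on overflow, else grows
    return loss.detach(), scaler.get_scale()
```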
], batch size: 57, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:39:32,323 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=718906.6666666666, ans=0.1 2023-11-19 11:39:36,514 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.878e-01 2023-11-19 11:39:38,159 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.587e-01 2023-11-19 11:39:42,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=718973.3333333334, ans=0.125 2023-11-19 11:39:54,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=719040.0, ans=0.0 2023-11-19 11:39:57,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=719040.0, ans=0.1 2023-11-19 11:39:57,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=12.0 2023-11-19 11:39:59,048 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=719040.0, ans=0.125 2023-11-19 11:40:04,918 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=719106.6666666666, ans=0.1 2023-11-19 11:40:24,341 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11700, loss[loss=0.08163, simple_loss=0.08518, pruned_loss=0.02425, audio_tagging_loss=0.01479, over 13821.00 frames. ], tot_loss[loss=0.0871, simple_loss=0.1062, pruned_loss=0.02368, audio_tagging_loss=0.01033, over 3043826.75 frames. ], batch size: 53, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:40:37,802 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=719306.6666666666, ans=10.0 2023-11-19 11:40:57,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.259e+01 8.836e+01 9.705e+01 1.355e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-19 11:40:57,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=719440.0, ans=0.125 2023-11-19 11:41:00,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=719440.0, ans=0.0 2023-11-19 11:41:01,420 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.13 vs. limit=22.5 2023-11-19 11:41:12,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.37 vs. limit=15.0 2023-11-19 11:41:19,873 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11750, loss[loss=0.08311, simple_loss=0.1038, pruned_loss=0.02347, audio_tagging_loss=0.007761, over 15437.00 frames. ], tot_loss[loss=0.08767, simple_loss=0.1069, pruned_loss=0.02392, audio_tagging_loss=0.0103, over 3040927.14 frames. 
], batch size: 59, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:41:48,726 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=719706.6666666666, ans=0.09899494936611666 2023-11-19 11:42:01,803 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=719773.3333333334, ans=0.125 2023-11-19 11:42:08,706 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=719840.0, ans=0.1 2023-11-19 11:42:09,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=719840.0, ans=0.0 2023-11-19 11:42:14,819 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11800, loss[loss=0.06703, simple_loss=0.08109, pruned_loss=0.01607, audio_tagging_loss=0.01042, over 15288.00 frames. ], tot_loss[loss=0.087, simple_loss=0.1058, pruned_loss=0.02374, audio_tagging_loss=0.01035, over 3029140.12 frames. ], batch size: 58, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:42:22,314 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=719906.6666666666, ans=0.0 2023-11-19 11:42:28,119 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=719973.3333333334, ans=0.125 2023-11-19 11:42:47,948 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:42:48,184 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=12.0 2023-11-19 11:42:49,860 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.669e+01 9.441e+01 1.062e+02 1.406e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-19 11:42:59,612 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=15.0 2023-11-19 11:43:08,503 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=720173.3333333334, ans=0.1 2023-11-19 11:43:11,844 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11850, loss[loss=0.09485, simple_loss=0.1154, pruned_loss=0.02502, audio_tagging_loss=0.01213, over 16535.00 frames. ], tot_loss[loss=0.08707, simple_loss=0.1054, pruned_loss=0.02384, audio_tagging_loss=0.01052, over 3040186.10 frames. ], batch size: 62, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:43:20,980 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.90 vs. limit=15.0 2023-11-19 11:43:32,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=720373.3333333334, ans=10.0 2023-11-19 11:43:34,414 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.74 vs. 
limit=15.0 2023-11-19 11:43:35,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=720373.3333333334, ans=0.125 2023-11-19 11:43:41,363 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=720373.3333333334, ans=15.0 2023-11-19 11:43:47,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.66 vs. limit=15.0 2023-11-19 11:43:50,713 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=720440.0, ans=0.1 2023-11-19 11:44:01,689 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=720506.6666666666, ans=0.2 2023-11-19 11:44:01,862 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.40 vs. limit=10.0 2023-11-19 11:44:05,374 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=720506.6666666666, ans=0.2 2023-11-19 11:44:07,204 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11900, loss[loss=0.09069, simple_loss=0.1132, pruned_loss=0.0245, audio_tagging_loss=0.009608, over 14054.00 frames. ], tot_loss[loss=0.08742, simple_loss=0.1059, pruned_loss=0.02393, audio_tagging_loss=0.01055, over 3042746.01 frames. ], batch size: 54, lr: 7.50e-03, grad_scale: 16.0 2023-11-19 11:44:11,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=720573.3333333334, ans=0.125 2023-11-19 11:44:41,512 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.859e+01 8.394e+01 8.886e+01 9.944e+01 1.266e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-19 11:44:59,885 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=720840.0, ans=0.125 2023-11-19 11:45:03,434 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 11950, loss[loss=0.06476, simple_loss=0.06814, pruned_loss=0.01577, audio_tagging_loss=0.01492, over 14289.00 frames. ], tot_loss[loss=0.08746, simple_loss=0.106, pruned_loss=0.02382, audio_tagging_loss=0.01062, over 3040437.54 frames. ], batch size: 56, lr: 7.49e-03, grad_scale: 16.0 2023-11-19 11:45:04,667 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=720906.6666666666, ans=0.125 2023-11-19 11:45:07,162 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.79 vs. limit=22.5 2023-11-19 11:45:22,316 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.27 vs. 
limit=12.0 2023-11-19 11:45:35,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=721106.6666666666, ans=0.125 2023-11-19 11:45:52,954 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=721173.3333333334, ans=10.0 2023-11-19 11:45:56,801 INFO [train_asr.py:1115] (1/4) Epoch 9, batch 12000, loss[loss=0.1096, simple_loss=0.145, pruned_loss=0.02922, audio_tagging_loss=0.007852, over 14759.00 frames. ], tot_loss[loss=0.08769, simple_loss=0.1065, pruned_loss=0.02369, audio_tagging_loss=0.01075, over 3035610.72 frames. ], batch size: 54, lr: 7.49e-03, grad_scale: 32.0 2023-11-19 11:45:56,802 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 11:46:25,880 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3565, 3.7590, 2.5584, 3.6328], device='cuda:1') 2023-11-19 11:46:29,176 INFO [train_asr.py:1147] (1/4) Epoch 9, validation: loss=0.06606, simple_loss=0.05578, pruned_loss=0.006612, audio_tagging_loss=0.03155, over 4681554.00 frames. 2023-11-19 11:46:29,176 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 11:46:33,421 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=721240.0, ans=0.07 2023-11-19 11:46:39,752 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=15.0 2023-11-19 11:46:42,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=721306.6666666666, ans=0.125 2023-11-19 11:46:52,345 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=721373.3333333334, ans=0.125 2023-11-19 11:47:32,057 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 0, loss[loss=0.1042, simple_loss=0.1044, pruned_loss=0.02428, audio_tagging_loss=0.02778, over 16514.00 frames. ], tot_loss[loss=0.1042, simple_loss=0.1044, pruned_loss=0.02428, audio_tagging_loss=0.02778, over 16514.00 frames. ], batch size: 62, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:47:32,058 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 11:48:03,894 INFO [train_asr.py:1147] (1/4) Epoch 10, validation: loss=0.06458, simple_loss=0.05578, pruned_loss=0.006606, audio_tagging_loss=0.03009, over 4681554.00 frames. 2023-11-19 11:48:03,895 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 11:48:09,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=721400.0, ans=0.07 2023-11-19 11:48:11,163 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.461e+01 9.125e+01 9.697e+01 1.516e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 11:48:18,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=721466.6666666666, ans=0.0 2023-11-19 11:48:21,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. 
limit=6.0 2023-11-19 11:48:57,922 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=721666.6666666666, ans=0.07 2023-11-19 11:48:59,735 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 50, loss[loss=0.08756, simple_loss=0.08993, pruned_loss=0.02093, audio_tagging_loss=0.02167, over 14808.00 frames. ], tot_loss[loss=0.09981, simple_loss=0.1096, pruned_loss=0.02457, audio_tagging_loss=0.02045, over 686944.35 frames. ], batch size: 57, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:49:23,222 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=721866.6666666666, ans=0.0 2023-11-19 11:49:25,361 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:49:26,998 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=721866.6666666666, ans=0.125 2023-11-19 11:49:37,517 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=721933.3333333334, ans=0.125 2023-11-19 11:49:55,442 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 100, loss[loss=0.1024, simple_loss=0.1143, pruned_loss=0.02537, audio_tagging_loss=0.01985, over 14528.00 frames. ], tot_loss[loss=0.09693, simple_loss=0.1059, pruned_loss=0.02403, audio_tagging_loss=0.01993, over 1215781.95 frames. ], batch size: 54, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:49:55,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=722066.6666666666, ans=0.125 2023-11-19 11:50:03,451 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.830e+01 9.521e+01 1.052e+02 1.360e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 11:50:22,463 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=722200.0, ans=0.2 2023-11-19 11:50:25,247 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=722200.0, ans=0.125 2023-11-19 11:50:38,505 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=722266.6666666666, ans=0.04949747468305833 2023-11-19 11:50:51,946 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 150, loss[loss=0.1033, simple_loss=0.1199, pruned_loss=0.02816, audio_tagging_loss=0.0152, over 17017.00 frames. ], tot_loss[loss=0.0946, simple_loss=0.1064, pruned_loss=0.02385, audio_tagging_loss=0.01756, over 1623415.95 frames. ], batch size: 62, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:51:05,378 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=722466.6666666666, ans=0.0 2023-11-19 11:51:10,570 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.24 vs. 
limit=15.0 2023-11-19 11:51:34,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=722600.0, ans=0.0 2023-11-19 11:51:36,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=722666.6666666666, ans=0.125 2023-11-19 11:51:45,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=722666.6666666666, ans=0.125 2023-11-19 11:51:47,944 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 200, loss[loss=0.1023, simple_loss=0.1247, pruned_loss=0.02843, audio_tagging_loss=0.01154, over 14823.00 frames. ], tot_loss[loss=0.09298, simple_loss=0.1073, pruned_loss=0.02397, audio_tagging_loss=0.01538, over 1940399.90 frames. ], batch size: 54, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:51:50,279 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=722733.3333333334, ans=0.0 2023-11-19 11:51:56,596 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.422e+01 8.351e+01 9.298e+01 1.028e+02 1.327e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 11:52:00,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=722800.0, ans=0.125 2023-11-19 11:52:01,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=722800.0, ans=0.0 2023-11-19 11:52:04,487 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=722800.0, ans=0.04949747468305833 2023-11-19 11:52:12,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=722866.6666666666, ans=0.0 2023-11-19 11:52:21,499 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=722933.3333333334, ans=0.125 2023-11-19 11:52:31,976 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=723000.0, ans=0.125 2023-11-19 11:52:33,132 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=723000.0, ans=0.125 2023-11-19 11:52:41,873 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2023-11-19 11:52:44,661 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 250, loss[loss=0.0927, simple_loss=0.1117, pruned_loss=0.02663, audio_tagging_loss=0.01024, over 15099.00 frames. ], tot_loss[loss=0.09246, simple_loss=0.109, pruned_loss=0.02423, audio_tagging_loss=0.01374, over 2185201.65 frames. ], batch size: 58, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:52:49,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=723066.6666666666, ans=0.125 2023-11-19 11:52:52,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=723066.6666666666, ans=0.0 2023-11-19 11:53:01,073 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. 
limit=15.0 2023-11-19 11:53:03,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=723133.3333333334, ans=0.125 2023-11-19 11:53:08,186 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=723200.0, ans=0.125 2023-11-19 11:53:16,789 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=723266.6666666666, ans=0.125 2023-11-19 11:53:40,158 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 300, loss[loss=0.09499, simple_loss=0.124, pruned_loss=0.02201, audio_tagging_loss=0.01096, over 15292.00 frames. ], tot_loss[loss=0.09101, simple_loss=0.1086, pruned_loss=0.02406, audio_tagging_loss=0.01267, over 2370340.15 frames. ], batch size: 54, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:53:45,148 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=723400.0, ans=0.125 2023-11-19 11:53:48,086 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.717e+01 8.399e+01 9.137e+01 9.924e+01 1.644e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 11:53:51,550 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=723466.6666666666, ans=0.125 2023-11-19 11:53:55,030 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2023-11-19 11:54:02,776 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=723533.3333333334, ans=0.125 2023-11-19 11:54:03,835 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=723533.3333333334, ans=0.125 2023-11-19 11:54:07,882 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=15.0 2023-11-19 11:54:09,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=723533.3333333334, ans=0.09899494936611666 2023-11-19 11:54:22,886 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=723600.0, ans=0.125 2023-11-19 11:54:24,136 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=723666.6666666666, ans=0.0 2023-11-19 11:54:32,975 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=723666.6666666666, ans=0.125 2023-11-19 11:54:36,132 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 350, loss[loss=0.09314, simple_loss=0.1164, pruned_loss=0.02745, audio_tagging_loss=0.007503, over 16527.00 frames. ], tot_loss[loss=0.08974, simple_loss=0.1078, pruned_loss=0.02378, audio_tagging_loss=0.01206, over 2517279.15 frames. ], batch size: 63, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:54:54,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=723800.0, ans=0.1 2023-11-19 11:55:23,381 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.87 vs. 
limit=10.0 2023-11-19 11:55:26,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=724000.0, ans=0.2 2023-11-19 11:55:32,495 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 400, loss[loss=0.08746, simple_loss=0.1081, pruned_loss=0.02347, audio_tagging_loss=0.009961, over 14511.00 frames. ], tot_loss[loss=0.08934, simple_loss=0.1079, pruned_loss=0.02387, audio_tagging_loss=0.01154, over 2634950.31 frames. ], batch size: 54, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:55:32,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=724066.6666666666, ans=0.125 2023-11-19 11:55:32,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=724066.6666666666, ans=0.2 2023-11-19 11:55:36,892 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=724066.6666666666, ans=0.0 2023-11-19 11:55:36,974 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=724066.6666666666, ans=0.07 2023-11-19 11:55:39,907 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.670e+01 8.520e+01 9.395e+01 1.002e+02 1.359e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 11:55:41,270 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:55:43,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=724133.3333333334, ans=0.07 2023-11-19 11:55:45,225 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.66 vs. limit=10.0 2023-11-19 11:56:02,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=724200.0, ans=0.125 2023-11-19 11:56:14,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=724266.6666666666, ans=0.0 2023-11-19 11:56:19,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=724333.3333333334, ans=0.0 2023-11-19 11:56:28,209 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 450, loss[loss=0.09412, simple_loss=0.1135, pruned_loss=0.02492, audio_tagging_loss=0.01247, over 15165.00 frames. ], tot_loss[loss=0.0889, simple_loss=0.1075, pruned_loss=0.02381, audio_tagging_loss=0.01134, over 2725453.96 frames. ], batch size: 57, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:56:49,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=724533.3333333334, ans=10.0 2023-11-19 11:56:51,018 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.06 vs. limit=15.0 2023-11-19 11:56:58,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=724533.3333333334, ans=0.0 2023-11-19 11:57:23,928 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 500, loss[loss=0.08616, simple_loss=0.1, pruned_loss=0.02569, audio_tagging_loss=0.01047, over 15242.00 frames. ], tot_loss[loss=0.08821, simple_loss=0.1068, pruned_loss=0.02368, audio_tagging_loss=0.01111, over 2797031.34 frames. 
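Note on the optim.py "Clipping_scale" entries: the five numbers are the min/25%/median/75%/max of recently observed gradient norms, and in every entry in this log the printed threshold equals Clipping_scale times the median (e.g. 2.0 * 9.395e+01 = 1.879e+02 in the entry just above), so the clipping threshold evidently tracks the running median grad-norm; percent-clipped=0.0 means no recent gradient exceeded it. A sketch under that assumption:

    clipping_scale = 2.0
    grad_norm_quartiles = [66.70, 85.20, 93.95, 100.2, 135.9]  # min, 25%, median, 75%, max from above
    threshold = clipping_scale * grad_norm_quartiles[2]        # scale * median
    print(f"{threshold:.4g}")  # 187.9, i.e. the logged threshold=1.879e+02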
], batch size: 57, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:57:24,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=724733.3333333334, ans=0.125 2023-11-19 11:57:31,349 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.489e+01 8.496e+01 9.010e+01 1.030e+02 1.418e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 11:57:39,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=724800.0, ans=0.125 2023-11-19 11:57:42,827 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0 2023-11-19 11:57:48,341 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=724866.6666666666, ans=0.0 2023-11-19 11:57:54,200 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=724866.6666666666, ans=0.2 2023-11-19 11:57:55,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=724866.6666666666, ans=0.2 2023-11-19 11:58:06,614 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=724933.3333333334, ans=0.1 2023-11-19 11:58:06,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=724933.3333333334, ans=0.04949747468305833 2023-11-19 11:58:07,682 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=725000.0, ans=10.0 2023-11-19 11:58:18,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=725066.6666666666, ans=0.04949747468305833 2023-11-19 11:58:19,508 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 550, loss[loss=0.08455, simple_loss=0.09851, pruned_loss=0.02082, audio_tagging_loss=0.01447, over 15536.00 frames. ], tot_loss[loss=0.08718, simple_loss=0.1057, pruned_loss=0.02335, audio_tagging_loss=0.01098, over 2862908.91 frames. ], batch size: 57, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:58:28,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=725066.6666666666, ans=0.1 2023-11-19 11:58:35,748 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=725133.3333333334, ans=0.125 2023-11-19 11:58:43,355 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.21 vs. limit=10.0 2023-11-19 11:58:52,705 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2023-11-19 11:59:04,551 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=725333.3333333334, ans=0.0 2023-11-19 11:59:05,806 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.74 vs. 
limit=15.0 2023-11-19 11:59:07,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=725333.3333333334, ans=0.2 2023-11-19 11:59:15,991 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 600, loss[loss=0.07739, simple_loss=0.09686, pruned_loss=0.01767, audio_tagging_loss=0.0113, over 13933.00 frames. ], tot_loss[loss=0.08733, simple_loss=0.1058, pruned_loss=0.02345, audio_tagging_loss=0.01099, over 2909729.91 frames. ], batch size: 56, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 11:59:19,646 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0 2023-11-19 11:59:23,903 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.497e+01 8.301e+01 8.690e+01 9.584e+01 1.385e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-19 12:00:00,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=725666.6666666666, ans=0.0 2023-11-19 12:00:11,789 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 650, loss[loss=0.09541, simple_loss=0.1227, pruned_loss=0.02864, audio_tagging_loss=0.00544, over 15971.00 frames. ], tot_loss[loss=0.08725, simple_loss=0.1061, pruned_loss=0.02344, audio_tagging_loss=0.01077, over 2942856.01 frames. ], batch size: 58, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:00:21,527 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=725800.0, ans=0.125 2023-11-19 12:00:23,297 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2023-11-19 12:00:25,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=725800.0, ans=0.125 2023-11-19 12:00:30,498 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=725800.0, ans=0.0 2023-11-19 12:00:46,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2023-11-19 12:00:51,320 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0 2023-11-19 12:00:54,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=725933.3333333334, ans=0.2 2023-11-19 12:01:06,897 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 700, loss[loss=0.08468, simple_loss=0.1058, pruned_loss=0.02354, audio_tagging_loss=0.008249, over 16047.00 frames. ], tot_loss[loss=0.08667, simple_loss=0.1054, pruned_loss=0.0232, audio_tagging_loss=0.01078, over 2958976.43 frames. ], batch size: 62, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:01:09,325 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=726066.6666666666, ans=0.2 2023-11-19 12:01:09,346 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=726066.6666666666, ans=0.125 2023-11-19 12:01:09,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.81 vs. 
limit=10.0 2023-11-19 12:01:14,250 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.015e+01 8.278e+01 8.967e+01 9.847e+01 1.279e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 12:01:24,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2023-11-19 12:01:41,618 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=726266.6666666666, ans=0.0 2023-11-19 12:01:43,666 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=726266.6666666666, ans=0.0 2023-11-19 12:02:02,256 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 750, loss[loss=0.09738, simple_loss=0.1269, pruned_loss=0.02606, audio_tagging_loss=0.007859, over 14450.00 frames. ], tot_loss[loss=0.08605, simple_loss=0.1047, pruned_loss=0.0229, audio_tagging_loss=0.01081, over 2979861.12 frames. ], batch size: 55, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:02:12,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=726400.0, ans=0.125 2023-11-19 12:02:14,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=726466.6666666666, ans=0.125 2023-11-19 12:02:28,905 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2023-11-19 12:02:40,863 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=726600.0, ans=0.125 2023-11-19 12:02:59,397 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 800, loss[loss=0.07289, simple_loss=0.08379, pruned_loss=0.01962, audio_tagging_loss=0.01137, over 14367.00 frames. ], tot_loss[loss=0.08678, simple_loss=0.1057, pruned_loss=0.02322, audio_tagging_loss=0.01073, over 2998349.58 frames. ], batch size: 56, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:03:06,736 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.428e+01 9.274e+01 1.007e+02 1.434e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-19 12:03:08,000 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=726733.3333333334, ans=0.0 2023-11-19 12:03:17,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=726800.0, ans=0.0 2023-11-19 12:03:19,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=726866.6666666666, ans=0.0 2023-11-19 12:03:22,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=726866.6666666666, ans=0.0 2023-11-19 12:03:28,986 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=726866.6666666666, ans=0.1 2023-11-19 12:03:46,307 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0 2023-11-19 12:03:54,770 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 850, loss[loss=0.0961, simple_loss=0.1196, pruned_loss=0.02669, audio_tagging_loss=0.009627, over 15758.00 frames. 
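Note on the per-batch lr: it follows icefall's Eden schedule, documented as lr = base_lr * ((batch/lr_batches)^2 + 1)^(-0.25) * ((epoch/lr_epochs)^2 + 1)^(-0.25), with this run's base_lr = 0.045, lr_batches = 7500, lr_epochs = 3.5. A rough check; the step count and epoch value below are estimates for this point in the log, not values taken from it:

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Eden schedule as documented in icefall's optim.py
        return (base_lr
                * ((batch / lr_batches) ** 2 + 1) ** -0.25
                * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

    # ~109k optimizer steps and 9 completed epochs are rough estimates here:
    print(f"{eden_lr(0.045, 109_000, 9):.2e}")  # 7.10e-03, matching the logged lr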
], tot_loss[loss=0.08683, simple_loss=0.1057, pruned_loss=0.02326, audio_tagging_loss=0.01073, over 3005072.34 frames. ], batch size: 57, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:03:56,025 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=727066.6666666666, ans=0.125 2023-11-19 12:04:12,384 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=727133.3333333334, ans=0.2 2023-11-19 12:04:18,484 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=727200.0, ans=0.1 2023-11-19 12:04:24,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=727200.0, ans=0.1 2023-11-19 12:04:27,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=727266.6666666666, ans=0.1 2023-11-19 12:04:50,018 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 900, loss[loss=0.07652, simple_loss=0.08242, pruned_loss=0.02298, audio_tagging_loss=0.01233, over 14426.00 frames. ], tot_loss[loss=0.08625, simple_loss=0.1047, pruned_loss=0.0231, audio_tagging_loss=0.01082, over 3013491.59 frames. ], batch size: 55, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:04:56,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=727400.0, ans=0.125 2023-11-19 12:04:57,845 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.263e+01 8.793e+01 9.779e+01 1.235e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-19 12:05:00,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=727466.6666666666, ans=0.07 2023-11-19 12:05:19,685 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=727533.3333333334, ans=0.125 2023-11-19 12:05:21,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=727533.3333333334, ans=0.125 2023-11-19 12:05:31,532 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=727600.0, ans=0.125 2023-11-19 12:05:46,700 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 950, loss[loss=0.09057, simple_loss=0.1059, pruned_loss=0.0232, audio_tagging_loss=0.0144, over 14933.00 frames. ], tot_loss[loss=0.08741, simple_loss=0.1065, pruned_loss=0.02344, audio_tagging_loss=0.0107, over 3026117.67 frames. ], batch size: 54, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:05:46,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=727733.3333333334, ans=0.125 2023-11-19 12:06:04,530 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.48 vs. 
limit=15.0 2023-11-19 12:06:27,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=727933.3333333334, ans=0.125 2023-11-19 12:06:28,705 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=727933.3333333334, ans=0.07 2023-11-19 12:06:42,010 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1000, loss[loss=0.09426, simple_loss=0.1152, pruned_loss=0.02577, audio_tagging_loss=0.01087, over 16021.00 frames. ], tot_loss[loss=0.08637, simple_loss=0.1054, pruned_loss=0.02311, audio_tagging_loss=0.01057, over 3031284.63 frames. ], batch size: 63, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:06:43,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=728066.6666666666, ans=0.125 2023-11-19 12:06:44,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=728066.6666666666, ans=0.125 2023-11-19 12:06:49,834 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.784e+01 8.258e+01 8.941e+01 9.779e+01 1.255e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-19 12:07:02,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=728133.3333333334, ans=0.0 2023-11-19 12:07:02,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=728133.3333333334, ans=0.125 2023-11-19 12:07:04,335 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=728200.0, ans=0.025 2023-11-19 12:07:05,170 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:07:15,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=728266.6666666666, ans=0.125 2023-11-19 12:07:37,678 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1050, loss[loss=0.07536, simple_loss=0.0956, pruned_loss=0.0197, audio_tagging_loss=0.007862, over 14593.00 frames. ], tot_loss[loss=0.08571, simple_loss=0.1046, pruned_loss=0.02296, audio_tagging_loss=0.01043, over 3032980.99 frames. ], batch size: 55, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:07:45,828 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=728400.0, ans=0.1 2023-11-19 12:07:53,934 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=728466.6666666666, ans=0.0 2023-11-19 12:08:04,344 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.12 vs. 
limit=22.5 2023-11-19 12:08:29,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=728666.6666666666, ans=0.0 2023-11-19 12:08:34,194 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1100, loss[loss=0.08229, simple_loss=0.1015, pruned_loss=0.02104, audio_tagging_loss=0.01048, over 14216.00 frames. ], tot_loss[loss=0.08487, simple_loss=0.1036, pruned_loss=0.02263, audio_tagging_loss=0.01042, over 3028123.65 frames. ], batch size: 52, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:08:36,369 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:08:39,708 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=728733.3333333334, ans=0.2 2023-11-19 12:08:42,199 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.136e+01 8.991e+01 9.834e+01 1.618e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 12:08:48,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=728800.0, ans=0.125 2023-11-19 12:08:56,063 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=22.5 2023-11-19 12:09:22,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=729000.0, ans=0.125 2023-11-19 12:09:30,429 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1150, loss[loss=0.1029, simple_loss=0.1341, pruned_loss=0.02706, audio_tagging_loss=0.008785, over 15641.00 frames. ], tot_loss[loss=0.08574, simple_loss=0.1045, pruned_loss=0.02306, audio_tagging_loss=0.01044, over 3031686.17 frames. ], batch size: 55, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:10:02,861 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=729266.6666666666, ans=0.0 2023-11-19 12:10:23,713 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=6.0 2023-11-19 12:10:26,229 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1200, loss[loss=0.1003, simple_loss=0.1269, pruned_loss=0.02474, audio_tagging_loss=0.01214, over 16085.00 frames. ], tot_loss[loss=0.08591, simple_loss=0.1048, pruned_loss=0.02318, audio_tagging_loss=0.01032, over 3037125.87 frames. 
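Note on the WARNING lines above: they drop 1-second AudioSet clips that carry only the dummy placeholder transcript, evidently because after the frontend's factor-4 subsampling their 100 input frames shrink to 23, fewer than the 24 BPE tokens, so no monotonic alignment exists and the transducer loss is undefined for them. The 100 -> 23 figure matches the output-length formula icefall recipes typically use:

    def frames_after_subsampling(t: int) -> int:
        # ((T - 7) // 2 + 1) // 2, the usual icefall Conv2dSubsampling output length
        return ((t - 7) // 2 + 1) // 2

    t_out = frames_after_subsampling(100)
    print(t_out)       # 23, as in the warning
    print(t_out < 24)  # True: fewer frames than tokens, so the cut is excluded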
], batch size: 59, lr: 7.08e-03, grad_scale: 32.0 2023-11-19 12:10:26,368 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=729400.0, ans=0.1 2023-11-19 12:10:35,185 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.399e+01 9.041e+01 1.012e+02 1.294e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 12:10:49,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=729533.3333333334, ans=0.0 2023-11-19 12:10:56,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2023-11-19 12:11:00,663 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2023-11-19 12:11:05,585 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=729600.0, ans=0.125 2023-11-19 12:11:07,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=729600.0, ans=0.0 2023-11-19 12:11:21,317 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.44 vs. limit=22.5 2023-11-19 12:11:21,630 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1250, loss[loss=0.07373, simple_loss=0.09585, pruned_loss=0.01631, audio_tagging_loss=0.009485, over 14870.00 frames. ], tot_loss[loss=0.08588, simple_loss=0.1049, pruned_loss=0.02313, audio_tagging_loss=0.0103, over 3032234.38 frames. ], batch size: 56, lr: 7.08e-03, grad_scale: 32.0 2023-11-19 12:11:22,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=729733.3333333334, ans=0.0 2023-11-19 12:11:24,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=729733.3333333334, ans=0.0 2023-11-19 12:11:24,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=729733.3333333334, ans=0.1 2023-11-19 12:11:31,978 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:11:54,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=729933.3333333334, ans=0.0 2023-11-19 12:11:54,439 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.62 vs. limit=6.0 2023-11-19 12:12:03,488 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.70 vs. limit=8.0 2023-11-19 12:12:07,633 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=730000.0, ans=0.125 2023-11-19 12:12:17,089 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1300, loss[loss=0.07563, simple_loss=0.09005, pruned_loss=0.01899, audio_tagging_loss=0.01161, over 15642.00 frames. ], tot_loss[loss=0.08581, simple_loss=0.1047, pruned_loss=0.02304, audio_tagging_loss=0.01039, over 3034417.41 frames. 
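Note on the scaling.py ScheduledFloat entries: each prints the current value (ans) of a hyperparameter, typically a dropout probability, skip rate, or balancer bound, that is scheduled as a piecewise-linear function of batch_count. A minimal sketch of that behaviour; the breakpoints below are hypothetical, not the recipe's actual schedules:

    def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
        # points: sorted (batch_count, value) pairs; linear in between, clamped outside
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return points[-1][1]

    # e.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches (hypothetical):
    print(scheduled_float(729_400.0, [(0.0, 0.3), (20_000.0, 0.1)]))  # 0.1, fully decayed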
], batch size: 58, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:12:22,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=730066.6666666666, ans=0.2 2023-11-19 12:12:27,633 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.101e+01 8.789e+01 9.869e+01 1.258e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-19 12:12:46,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=730200.0, ans=0.0 2023-11-19 12:13:07,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=730333.3333333334, ans=0.125 2023-11-19 12:13:13,112 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1350, loss[loss=0.08532, simple_loss=0.1102, pruned_loss=0.02315, audio_tagging_loss=0.0071, over 16400.00 frames. ], tot_loss[loss=0.08627, simple_loss=0.1051, pruned_loss=0.02332, audio_tagging_loss=0.01038, over 3046604.34 frames. ], batch size: 58, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:13:27,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=730466.6666666666, ans=15.0 2023-11-19 12:13:36,686 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=730533.3333333334, ans=0.0 2023-11-19 12:13:43,941 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=730533.3333333334, ans=0.1 2023-11-19 12:13:49,898 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=730600.0, ans=0.0 2023-11-19 12:13:52,824 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:13:57,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=730666.6666666666, ans=0.2 2023-11-19 12:14:00,823 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=730666.6666666666, ans=0.0 2023-11-19 12:14:04,395 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2023-11-19 12:14:05,912 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.52 vs. limit=15.0 2023-11-19 12:14:08,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=730733.3333333334, ans=15.0 2023-11-19 12:14:08,539 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1400, loss[loss=0.08241, simple_loss=0.1051, pruned_loss=0.01856, audio_tagging_loss=0.0113, over 14272.00 frames. ], tot_loss[loss=0.08636, simple_loss=0.1049, pruned_loss=0.02347, audio_tagging_loss=0.01042, over 3044769.67 frames. 
], batch size: 55, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:14:18,522 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.609e+01 8.095e+01 8.801e+01 9.622e+01 1.373e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-19 12:14:19,028 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.33 vs. limit=15.0 2023-11-19 12:14:36,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=730866.6666666666, ans=0.125 2023-11-19 12:14:58,978 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=731000.0, ans=0.0 2023-11-19 12:15:04,106 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1450, loss[loss=0.09767, simple_loss=0.1181, pruned_loss=0.02653, audio_tagging_loss=0.0121, over 15245.00 frames. ], tot_loss[loss=0.08626, simple_loss=0.1051, pruned_loss=0.02337, audio_tagging_loss=0.01036, over 3044838.21 frames. ], batch size: 58, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:15:17,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=731133.3333333334, ans=0.2 2023-11-19 12:15:18,905 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=731133.3333333334, ans=0.0 2023-11-19 12:15:47,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=731333.3333333334, ans=0.125 2023-11-19 12:15:48,721 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=731333.3333333334, ans=0.0 2023-11-19 12:16:00,077 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1500, loss[loss=0.105, simple_loss=0.1282, pruned_loss=0.03258, audio_tagging_loss=0.008302, over 14814.00 frames. ], tot_loss[loss=0.08684, simple_loss=0.1058, pruned_loss=0.02358, audio_tagging_loss=0.01038, over 3046956.17 frames. 
], batch size: 53, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:16:07,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=731400.0, ans=0.0 2023-11-19 12:16:09,649 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.686e+01 9.376e+01 1.030e+02 1.552e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-19 12:16:09,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=731466.6666666666, ans=0.125 2023-11-19 12:16:12,597 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=731466.6666666666, ans=0.125 2023-11-19 12:16:16,677 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=731466.6666666666, ans=0.2 2023-11-19 12:16:25,887 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=731533.3333333334, ans=0.125 2023-11-19 12:16:27,336 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=731533.3333333334, ans=15.0 2023-11-19 12:16:30,193 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.90 vs. limit=10.0 2023-11-19 12:16:31,609 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.21 vs. limit=15.0 2023-11-19 12:16:41,275 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=731600.0, ans=0.125 2023-11-19 12:16:46,049 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=731666.6666666666, ans=0.125 2023-11-19 12:16:55,767 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1550, loss[loss=0.07944, simple_loss=0.09076, pruned_loss=0.02408, audio_tagging_loss=0.009983, over 16303.00 frames. ], tot_loss[loss=0.08746, simple_loss=0.1064, pruned_loss=0.02374, audio_tagging_loss=0.0105, over 3051456.57 frames. ], batch size: 64, lr: 7.07e-03, grad_scale: 16.0 2023-11-19 12:17:23,546 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=731866.6666666666, ans=0.0 2023-11-19 12:17:37,397 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=731933.3333333334, ans=0.1 2023-11-19 12:17:39,482 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=732000.0, ans=0.0 2023-11-19 12:17:46,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=732000.0, ans=0.2 2023-11-19 12:17:51,783 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1600, loss[loss=0.09439, simple_loss=0.1157, pruned_loss=0.02393, audio_tagging_loss=0.01262, over 15799.00 frames. ], tot_loss[loss=0.08767, simple_loss=0.1069, pruned_loss=0.02368, audio_tagging_loss=0.01052, over 3052380.54 frames. 
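Note on the "Whitening: ... metric=X vs. limit=Y" entries: a Whiten module logs when the measured whitening metric of a layer's features exceeds its allowed limit, at which point it nudges gradients toward whiter (more isotropic) activations. A plausible form of the metric, equal to 1.0 when the feature covariance is a multiple of the identity and growing with eigenvalue spread; this is a sketch in the spirit of icefall's scaling.py, not its exact code:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels); returns a scalar >= 1.0,
        # equal to 1.0 iff the covariance eigenvalues are all equal
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs ** 2).mean() / eigs.mean() ** 2

    x = torch.randn(2000, 256) * (torch.rand(256) + 0.1)  # unevenly scaled channels
    print(float(whitening_metric(x)))  # well above 1.0; compare the logged limits (6.0-22.5)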
], batch size: 58, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:17:57,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=732066.6666666666, ans=0.125 2023-11-19 12:17:57,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=732066.6666666666, ans=0.0 2023-11-19 12:18:01,803 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.642e+01 8.544e+01 9.122e+01 1.002e+02 1.471e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 12:18:13,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=15.41 vs. limit=15.0 2023-11-19 12:18:18,440 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732200.0, ans=0.1 2023-11-19 12:18:20,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=732200.0, ans=0.125 2023-11-19 12:18:30,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=732266.6666666666, ans=0.5 2023-11-19 12:18:38,301 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=732333.3333333334, ans=0.05 2023-11-19 12:18:47,117 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1650, loss[loss=0.1136, simple_loss=0.1391, pruned_loss=0.03276, audio_tagging_loss=0.01129, over 15736.00 frames. ], tot_loss[loss=0.08764, simple_loss=0.1071, pruned_loss=0.02355, audio_tagging_loss=0.01054, over 3056862.23 frames. ], batch size: 57, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:18:49,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=732400.0, ans=0.125 2023-11-19 12:18:51,559 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=732400.0, ans=0.125 2023-11-19 12:19:07,012 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.14 vs. limit=22.5 2023-11-19 12:19:09,980 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=732533.3333333334, ans=0.0 2023-11-19 12:19:21,654 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=732600.0, ans=0.0 2023-11-19 12:19:22,753 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=732600.0, ans=0.1 2023-11-19 12:19:32,816 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=732666.6666666666, ans=0.125 2023-11-19 12:19:42,565 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1700, loss[loss=0.07953, simple_loss=0.1005, pruned_loss=0.01916, audio_tagging_loss=0.01012, over 15314.00 frames. ], tot_loss[loss=0.0872, simple_loss=0.1062, pruned_loss=0.02342, audio_tagging_loss=0.01069, over 3052449.78 frames. 
], batch size: 58, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:19:53,003 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.193e+01 8.787e+01 9.627e+01 1.247e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-19 12:20:01,053 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=15.0 2023-11-19 12:20:17,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=732933.3333333334, ans=0.125 2023-11-19 12:20:19,243 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=732933.3333333334, ans=0.125 2023-11-19 12:20:38,897 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1750, loss[loss=0.05461, simple_loss=0.06048, pruned_loss=0.01114, audio_tagging_loss=0.01324, over 14796.00 frames. ], tot_loss[loss=0.08772, simple_loss=0.1069, pruned_loss=0.02368, audio_tagging_loss=0.0106, over 3055279.89 frames. ], batch size: 58, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:20:39,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=733066.6666666666, ans=0.125 2023-11-19 12:20:41,826 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.88 vs. limit=10.0 2023-11-19 12:20:51,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=733133.3333333334, ans=0.0 2023-11-19 12:21:03,144 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-19 12:21:34,739 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1800, loss[loss=0.08767, simple_loss=0.1085, pruned_loss=0.02414, audio_tagging_loss=0.009273, over 15074.00 frames. ], tot_loss[loss=0.08642, simple_loss=0.1056, pruned_loss=0.02311, audio_tagging_loss=0.01053, over 3056290.48 frames. ], batch size: 55, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:21:34,940 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=733400.0, ans=0.0 2023-11-19 12:21:44,107 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.335e+01 9.001e+01 1.003e+02 1.279e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 12:22:08,833 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.74 vs. limit=22.5 2023-11-19 12:22:15,397 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.72 vs. limit=10.0 2023-11-19 12:22:25,739 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:22:29,666 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1850, loss[loss=0.06677, simple_loss=0.08702, pruned_loss=0.01493, audio_tagging_loss=0.008333, over 14586.00 frames. ], tot_loss[loss=0.08596, simple_loss=0.1049, pruned_loss=0.023, audio_tagging_loss=0.01053, over 3055894.77 frames. 
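Note on the tot_loss[...] figures: they behave not as per-batch values but as a decayed running aggregate, consistent with each batch's statistics being added after the accumulator is scaled by (1 - 1/reset_interval), reset_interval = 200 in this run, for an effective window of about 200 batches. That is why the "over N frames" total saturates near 200 times the per-batch frame count (about 3.0M here) instead of growing through the epoch. A sketch of that accounting, with an assumed average batch size in frames:

    reset_interval = 200
    frames_per_batch = 15_000  # rough average, per the loss[...] entries
    tot_frames = 0.0
    for batch in range(1, 1001):
        tot_frames = tot_frames * (1 - 1 / reset_interval) + frames_per_batch
        if batch in (50, 1000):
            print(batch, round(tot_frames))
    # 50   -> ~665k (cf. "over 686944.35 frames" at Epoch 10, batch 50)
    # 1000 -> ~3.0M (cf. the ~3.0M frame totals once the window has saturated)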
2023-11-19 12:22:44,237 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=733800.0, ans=0.1
2023-11-19 12:22:52,741 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=733866.6666666666, ans=0.125
2023-11-19 12:23:06,856 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=733933.3333333334, ans=0.1
2023-11-19 12:23:13,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=734000.0, ans=0.02
2023-11-19 12:23:16,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=734000.0, ans=0.0
2023-11-19 12:23:16,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=734000.0, ans=0.125
2023-11-19 12:23:26,246 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1900, loss[loss=0.07239, simple_loss=0.0839, pruned_loss=0.01705, audio_tagging_loss=0.0134, over 14956.00 frames. ], tot_loss[loss=0.08648, simple_loss=0.1058, pruned_loss=0.02318, audio_tagging_loss=0.0104, over 3062295.15 frames. ], batch size: 58, lr: 7.06e-03, grad_scale: 32.0
2023-11-19 12:23:36,334 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.533e+01 9.158e+01 9.922e+01 1.269e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-19 12:23:38,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=734133.3333333334, ans=0.0
2023-11-19 12:23:40,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734133.3333333334, ans=0.1
2023-11-19 12:23:50,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=734200.0, ans=0.1
2023-11-19 12:23:54,521 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=734200.0, ans=0.0
2023-11-19 12:23:57,370 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0
2023-11-19 12:24:10,740 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 12:24:21,912 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 1950, loss[loss=0.07779, simple_loss=0.08991, pruned_loss=0.02194, audio_tagging_loss=0.01089, over 14679.00 frames. ], tot_loss[loss=0.08624, simple_loss=0.1057, pruned_loss=0.02313, audio_tagging_loss=0.01028, over 3054491.65 frames. ], batch size: 56, lr: 7.06e-03, grad_scale: 32.0
2023-11-19 12:24:22,178 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=734400.0, ans=0.125
2023-11-19 12:24:23,478 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0
2023-11-19 12:24:26,591 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.34 vs. limit=15.0
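The scaling.py:213 lines trace ScheduledFloat values: regularization knobs (dropout_p, skip rates, balancer bounds) whose current value `ans` is a function of the global batch_count, which is why the same name reappears with different batch counts. A piecewise-linear sketch of the idea; the breakpoints below are invented for illustration, and the real ScheduledFloat in scaling.py carries extra machinery:

    # Sketch: a hyperparameter scheduled piecewise-linearly on batch_count.
    # The (batch_count, value) breakpoints are illustrative, not from this run.
    class ScheduledFloatSketch:
        def __init__(self, *points: tuple[float, float]):
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(733800.0))  # far past the last breakpoint -> 0.1, as logged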
2023-11-19 12:24:29,729 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.00 vs. limit=22.5
2023-11-19 12:24:38,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.67 vs. limit=22.5
2023-11-19 12:24:46,215 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=734533.3333333334, ans=0.125
2023-11-19 12:24:50,463 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 12:24:53,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734533.3333333334, ans=0.1
2023-11-19 12:24:55,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=734600.0, ans=0.0
2023-11-19 12:25:03,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=734600.0, ans=0.0
2023-11-19 12:25:08,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=734666.6666666666, ans=0.125
2023-11-19 12:25:13,495 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=734666.6666666666, ans=0.125
2023-11-19 12:25:15,472 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734666.6666666666, ans=0.1
2023-11-19 12:25:17,478 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2000, loss[loss=0.08809, simple_loss=0.1077, pruned_loss=0.02395, audio_tagging_loss=0.01026, over 14999.00 frames. ], tot_loss[loss=0.08552, simple_loss=0.1045, pruned_loss=0.02287, audio_tagging_loss=0.01039, over 3053104.90 frames. ], batch size: 57, lr: 7.06e-03, grad_scale: 32.0
2023-11-19 12:25:20,165 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.19 vs. limit=12.0
2023-11-19 12:25:28,540 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.084e+01 8.826e+01 9.531e+01 1.443e+02, threshold=1.765e+02, percent-clipped=0.0
2023-11-19 12:25:40,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=734866.6666666666, ans=0.125
2023-11-19 12:25:45,210 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=734866.6666666666, ans=0.0
2023-11-19 12:25:52,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=734933.3333333334, ans=0.04949747468305833
2023-11-19 12:26:07,234 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0
2023-11-19 12:26:13,999 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2050, loss[loss=0.07228, simple_loss=0.09316, pruned_loss=0.01797, audio_tagging_loss=0.007732, over 14766.00 frames. ], tot_loss[loss=0.08551, simple_loss=0.1047, pruned_loss=0.02286, audio_tagging_loss=0.01031, over 3051211.33 frames. ], batch size: 55, lr: 7.06e-03, grad_scale: 32.0
2023-11-19 12:26:21,782 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0
2023-11-19 12:26:40,949 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=735200.0, ans=0.125
2023-11-19 12:26:57,563 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=735333.3333333334, ans=0.125
2023-11-19 12:27:00,568 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=735333.3333333334, ans=0.125
2023-11-19 12:27:09,244 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2100, loss[loss=0.07945, simple_loss=0.09788, pruned_loss=0.01846, audio_tagging_loss=0.01205, over 15589.00 frames. ], tot_loss[loss=0.08572, simple_loss=0.1051, pruned_loss=0.02291, audio_tagging_loss=0.01023, over 3055561.65 frames. ], batch size: 59, lr: 7.06e-03, grad_scale: 32.0
2023-11-19 12:27:18,797 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.692e+01 8.614e+01 9.149e+01 1.029e+02 1.234e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-19 12:27:29,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=735466.6666666666, ans=0.125
2023-11-19 12:27:29,112 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=735466.6666666666, ans=0.0
2023-11-19 12:27:41,772 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735600.0, ans=0.1
2023-11-19 12:27:53,290 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5
2023-11-19 12:27:59,424 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0
2023-11-19 12:28:04,278 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2150, loss[loss=0.08356, simple_loss=0.1149, pruned_loss=0.0187, audio_tagging_loss=0.007411, over 14347.00 frames. ], tot_loss[loss=0.08612, simple_loss=0.1057, pruned_loss=0.02303, audio_tagging_loss=0.01023, over 3054314.16 frames. ], batch size: 53, lr: 7.05e-03, grad_scale: 32.0
2023-11-19 12:28:11,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=735733.3333333334, ans=0.125
2023-11-19 12:28:14,009 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0
2023-11-19 12:28:17,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. limit=6.0
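The Whitening lines compare a per-module statistic against a (scheduled) limit; breaches such as "metric=15.41 vs. limit=15.0" earlier are what trigger the whitening penalty. One plausible reading, assuming the metric is 1.0 when the feature covariance is a scaled identity and grows as a few directions dominate; the exact formula lives in scaling.py's Whiten module and may differ:

    import torch

    # Sketch: a whitening metric that equals 1.0 for white features
    # (cov = c * I) and grows with anisotropy. Assumed reading of the
    # Whiten module, not its verbatim formula.
    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        metrics = []
        for g in x.chunk(num_groups, dim=1):  # x: (num_frames, num_channels)
            cov = (g.T @ g) / g.shape[0]
            d = cov.shape[0]
            # cov = c*I gives (cov**2).sum() = c^2 * d and diag mean c -> 1.0
            metrics.append(((cov ** 2).sum() / (cov.diag().mean() ** 2 * d)).item())
        return sum(metrics) / len(metrics)

    print(whitening_metric(torch.randn(10000, 256)))  # close to 1.0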
2023-11-19 12:28:27,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735866.6666666666, ans=0.1
2023-11-19 12:28:31,653 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735866.6666666666, ans=0.1
2023-11-19 12:28:38,274 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 12:28:39,526 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=735933.3333333334, ans=0.0
2023-11-19 12:28:41,557 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 12:28:44,819 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=735933.3333333334, ans=0.0
2023-11-19 12:28:56,022 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=736000.0, ans=0.125
2023-11-19 12:28:56,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=736000.0, ans=0.125
2023-11-19 12:29:00,632 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2200, loss[loss=0.1114, simple_loss=0.1379, pruned_loss=0.03275, audio_tagging_loss=0.009719, over 14954.00 frames. ], tot_loss[loss=0.08623, simple_loss=0.1059, pruned_loss=0.02309, audio_tagging_loss=0.01021, over 3050707.35 frames. ], batch size: 54, lr: 7.05e-03, grad_scale: 16.0
2023-11-19 12:29:06,893 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=736066.6666666666, ans=0.09899494936611666
2023-11-19 12:29:11,898 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.518e+01 8.663e+01 9.454e+01 1.034e+02 1.518e+02, threshold=1.891e+02, percent-clipped=0.0
2023-11-19 12:29:14,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=736133.3333333334, ans=0.125
2023-11-19 12:29:29,496 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=736200.0, ans=0.125
2023-11-19 12:29:29,567 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.671e-01
2023-11-19 12:29:40,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=736266.6666666666, ans=0.2
2023-11-19 12:29:53,473 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=736333.3333333334, ans=0.0
2023-11-19 12:29:56,573 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2250, loss[loss=0.08614, simple_loss=0.1117, pruned_loss=0.02079, audio_tagging_loss=0.009487, over 15420.00 frames. ], tot_loss[loss=0.0868, simple_loss=0.1064, pruned_loss=0.02335, audio_tagging_loss=0.01022, over 3052724.68 frames. ], batch size: 56, lr: 7.05e-03, grad_scale: 16.0
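The WARNING above shows the filter that drops AudioSet placeholder transcripts: a 1-second cut has 100 feature frames, only 23 survive the roughly 4x subsampling of the encoder front end, and the dummy text tokenizes to 24 BPE pieces, so there are more output tokens than encoder frames and the transducer loss is not computable. A sketch of such a check; the subsampled-length formula is an assumption that happens to give 23 for 100 input frames:

    import sentencepiece as spm

    # Sketch: exclude cuts whose BPE token count exceeds the post-subsampling
    # frame count (24 tokens > 23 frames in the WARNING above).
    # The length formula approximates the conv front end; the model defines it.
    def keep_cut(num_frames: int, text: str,
                 sp: spm.SentencePieceProcessor) -> bool:
        frames_after = (num_frames - 7) // 4  # assumed: 100 -> 23, matching the log
        num_tokens = len(sp.encode(text, out_type=str))
        return num_tokens <= frames_after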
2023-11-19 12:30:05,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=736400.0, ans=0.1
2023-11-19 12:30:11,110 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=736466.6666666666, ans=0.125
2023-11-19 12:30:17,400 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=736533.3333333334, ans=0.0
2023-11-19 12:30:19,065 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=736533.3333333334, ans=0.0
2023-11-19 12:30:25,908 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=736533.3333333334, ans=0.125
2023-11-19 12:30:44,640 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 12:30:51,280 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0
2023-11-19 12:30:51,734 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2300, loss[loss=0.09692, simple_loss=0.1109, pruned_loss=0.03125, audio_tagging_loss=0.01021, over 14308.00 frames. ], tot_loss[loss=0.08629, simple_loss=0.1052, pruned_loss=0.02324, audio_tagging_loss=0.01044, over 3046367.57 frames. ], batch size: 54, lr: 7.05e-03, grad_scale: 8.0
2023-11-19 12:31:03,994 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.615e+01 8.238e+01 9.177e+01 1.028e+02 1.469e+02, threshold=1.835e+02, percent-clipped=0.0
2023-11-19 12:31:13,156 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=736866.6666666666, ans=0.125
2023-11-19 12:31:15,409 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 12:31:19,989 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=736866.6666666666, ans=0.125
2023-11-19 12:31:23,904 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=15.0
2023-11-19 12:31:29,098 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=736933.3333333334, ans=0.125
2023-11-19 12:31:38,104 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0
2023-11-19 12:31:40,611 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 12:31:48,023 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2350, loss[loss=0.1163, simple_loss=0.1358, pruned_loss=0.03696, audio_tagging_loss=0.01146, over 16310.00 frames. ], tot_loss[loss=0.08716, simple_loss=0.1062, pruned_loss=0.02356, audio_tagging_loss=0.01049, over 3045866.84 frames. ], batch size: 62, lr: 7.05e-03, grad_scale: 8.0
2023-11-19 12:31:50,583 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=15.0
2023-11-19 12:31:56,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=737066.6666666666, ans=0.125
2023-11-19 12:31:59,847 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=737133.3333333334, ans=0.0
2023-11-19 12:32:05,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=737133.3333333334, ans=0.125
2023-11-19 12:32:14,567 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=737200.0, ans=0.1
2023-11-19 12:32:28,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=737266.6666666666, ans=0.1
2023-11-19 12:32:37,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=737333.3333333334, ans=0.125
2023-11-19 12:32:39,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=737333.3333333334, ans=0.2
2023-11-19 12:32:43,225 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2400, loss[loss=0.08546, simple_loss=0.0994, pruned_loss=0.02403, audio_tagging_loss=0.01172, over 14388.00 frames. ], tot_loss[loss=0.08639, simple_loss=0.1051, pruned_loss=0.02316, audio_tagging_loss=0.01069, over 3048197.35 frames. ], batch size: 56, lr: 7.05e-03, grad_scale: 16.0
2023-11-19 12:32:52,159 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.23 vs. limit=22.5
2023-11-19 12:32:55,921 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.415e+01 9.190e+01 1.007e+02 1.395e+02, threshold=1.838e+02, percent-clipped=0.0
2023-11-19 12:33:10,370 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=737533.3333333334, ans=0.125
2023-11-19 12:33:18,716 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.65 vs. limit=22.5
2023-11-19 12:33:38,939 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2450, loss[loss=0.0994, simple_loss=0.1305, pruned_loss=0.02346, audio_tagging_loss=0.0107, over 15556.00 frames. ], tot_loss[loss=0.08611, simple_loss=0.1049, pruned_loss=0.02297, audio_tagging_loss=0.01067, over 3050714.66 frames. ], batch size: 57, lr: 7.04e-03, grad_scale: 16.0
2023-11-19 12:34:15,598 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=737933.3333333334, ans=0.0
2023-11-19 12:34:15,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=737933.3333333334, ans=0.125
2023-11-19 12:34:20,314 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0
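The grad_scale field in the batch headers is the fp16 loss-scaling factor, and it moves over this stretch: 32.0 earlier, 16.0 by batch 2200, 8.0 by batch 2300, back to 16.0 at batch 2400 (and higher again further on). That pattern is consistent with standard dynamic loss scaling: back off when scaled gradients overflow, grow back after a run of clean steps. A sketch of that policy; the halving/doubling factors and growth interval are assumptions, since the trainer's exact rule is not in the log:

    # Sketch: dynamic fp16 loss scaling consistent with the grad_scale column.
    # The exact policy (factor 2, growth interval) is assumed.
    class GradScalerSketch:
        def __init__(self, scale: float = 32.0, growth_interval: int = 2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self._good_steps = 0

        def update(self, found_inf: bool) -> None:
            if found_inf:
                self.scale /= 2.0        # 32.0 -> 16.0 -> 8.0, as in the log
                self._good_steps = 0
            else:
                self._good_steps += 1
                if self._good_steps % self.growth_interval == 0:
                    self.scale *= 2.0    # later recovery: 8.0 -> 16.0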
limit=15.0 2023-11-19 12:34:33,836 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2500, loss[loss=0.08936, simple_loss=0.109, pruned_loss=0.0234, audio_tagging_loss=0.01149, over 15330.00 frames. ], tot_loss[loss=0.08611, simple_loss=0.1047, pruned_loss=0.02304, audio_tagging_loss=0.01071, over 3048676.28 frames. ], batch size: 59, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:34:35,690 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=738066.6666666666, ans=0.125 2023-11-19 12:34:40,132 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.02 vs. limit=6.0 2023-11-19 12:34:45,794 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.644e+01 9.382e+01 1.016e+02 1.260e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-19 12:34:52,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=738133.3333333334, ans=0.125 2023-11-19 12:35:03,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=738200.0, ans=0.125 2023-11-19 12:35:29,254 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2550, loss[loss=0.07672, simple_loss=0.09924, pruned_loss=0.02015, audio_tagging_loss=0.006951, over 16121.00 frames. ], tot_loss[loss=0.08576, simple_loss=0.1043, pruned_loss=0.023, audio_tagging_loss=0.01061, over 3043369.46 frames. ], batch size: 62, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:35:35,569 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.87 vs. limit=22.5 2023-11-19 12:35:47,847 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.42 vs. limit=15.0 2023-11-19 12:35:49,952 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2023-11-19 12:35:58,808 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=738533.3333333334, ans=0.07 2023-11-19 12:36:03,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=738600.0, ans=0.2 2023-11-19 12:36:10,567 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0 2023-11-19 12:36:24,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=738666.6666666666, ans=0.125 2023-11-19 12:36:26,093 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2600, loss[loss=0.08853, simple_loss=0.1015, pruned_loss=0.02673, audio_tagging_loss=0.01102, over 15053.00 frames. ], tot_loss[loss=0.08598, simple_loss=0.1045, pruned_loss=0.02315, audio_tagging_loss=0.0106, over 3047326.18 frames. 
], batch size: 59, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:36:37,711 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.428e+01 9.293e+01 1.021e+02 1.415e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 12:36:54,322 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=738866.6666666666, ans=15.0 2023-11-19 12:36:59,624 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=738933.3333333334, ans=0.0 2023-11-19 12:37:06,582 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=738933.3333333334, ans=0.2 2023-11-19 12:37:08,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=738933.3333333334, ans=0.125 2023-11-19 12:37:17,019 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=739000.0, ans=0.125 2023-11-19 12:37:20,404 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2023-11-19 12:37:21,570 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2650, loss[loss=0.07483, simple_loss=0.09702, pruned_loss=0.01832, audio_tagging_loss=0.008, over 14627.00 frames. ], tot_loss[loss=0.08622, simple_loss=0.1051, pruned_loss=0.0233, audio_tagging_loss=0.01039, over 3052006.51 frames. ], batch size: 54, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:37:33,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=739133.3333333334, ans=0.0 2023-11-19 12:37:35,969 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=739133.3333333334, ans=0.0 2023-11-19 12:37:36,935 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=739133.3333333334, ans=0.125 2023-11-19 12:37:55,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=739266.6666666666, ans=0.1 2023-11-19 12:37:57,096 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739266.6666666666, ans=0.1 2023-11-19 12:38:10,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=739333.3333333334, ans=0.0 2023-11-19 12:38:14,268 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0 2023-11-19 12:38:16,958 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2700, loss[loss=0.07475, simple_loss=0.08803, pruned_loss=0.01917, audio_tagging_loss=0.01156, over 15119.00 frames. ], tot_loss[loss=0.08564, simple_loss=0.1044, pruned_loss=0.02314, audio_tagging_loss=0.01032, over 3055920.62 frames. 
2023-11-19 12:38:18,294 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=739400.0, ans=0.125
2023-11-19 12:38:18,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=739400.0, ans=0.125
2023-11-19 12:38:25,292 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=15.0
2023-11-19 12:38:26,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=739400.0, ans=0.05
2023-11-19 12:38:29,020 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.415e+01 9.342e+01 1.060e+02 2.991e+02, threshold=1.868e+02, percent-clipped=1.0
2023-11-19 12:38:34,079 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=739466.6666666666, ans=0.0
2023-11-19 12:38:36,522 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. limit=10.0
2023-11-19 12:38:50,367 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.02 vs. limit=22.5
2023-11-19 12:38:53,066 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=739600.0, ans=0.125
2023-11-19 12:39:01,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=739666.6666666666, ans=0.125
2023-11-19 12:39:08,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=739666.6666666666, ans=0.125
2023-11-19 12:39:08,592 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.11 vs. limit=15.0
2023-11-19 12:39:10,792 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.51 vs. limit=15.0
2023-11-19 12:39:12,474 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2750, loss[loss=0.08339, simple_loss=0.1012, pruned_loss=0.02309, audio_tagging_loss=0.009699, over 14997.00 frames. ], tot_loss[loss=0.08547, simple_loss=0.1043, pruned_loss=0.02304, audio_tagging_loss=0.01029, over 3052406.51 frames. ], batch size: 55, lr: 7.04e-03, grad_scale: 16.0
2023-11-19 12:39:33,558 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=12.0
2023-11-19 12:39:45,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=739933.3333333334, ans=0.0
2023-11-19 12:39:46,649 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0
2023-11-19 12:39:59,484 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 12:40:06,879 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0
2023-11-19 12:40:08,440 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2800, loss[loss=0.089, simple_loss=0.1099, pruned_loss=0.02378, audio_tagging_loss=0.01026, over 15521.00 frames. ], tot_loss[loss=0.08529, simple_loss=0.1038, pruned_loss=0.02303, audio_tagging_loss=0.01035, over 3050305.73 frames. ], batch size: 58, lr: 7.03e-03, grad_scale: 32.0
2023-11-19 12:40:20,944 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=15.0
2023-11-19 12:40:21,271 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.368e+01 8.840e+01 9.465e+01 1.289e+02, threshold=1.768e+02, percent-clipped=0.0
2023-11-19 12:40:30,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=740200.0, ans=0.02
2023-11-19 12:40:46,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=740266.6666666666, ans=0.125
2023-11-19 12:41:00,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=740333.3333333334, ans=0.04949747468305833
2023-11-19 12:41:02,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=740333.3333333334, ans=0.125
2023-11-19 12:41:04,776 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2850, loss[loss=0.07171, simple_loss=0.08686, pruned_loss=0.0169, audio_tagging_loss=0.01138, over 14974.00 frames. ], tot_loss[loss=0.08487, simple_loss=0.1035, pruned_loss=0.02278, audio_tagging_loss=0.01034, over 3056093.28 frames. ], batch size: 61, lr: 7.03e-03, grad_scale: 32.0
2023-11-19 12:41:08,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=740400.0, ans=10.0
2023-11-19 12:41:09,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=740400.0, ans=0.0
2023-11-19 12:41:25,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=740533.3333333334, ans=0.0
2023-11-19 12:41:46,814 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.38 vs. limit=10.0
2023-11-19 12:41:47,691 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0
2023-11-19 12:41:47,934 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.10 vs. limit=22.5
2023-11-19 12:41:49,578 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=740666.6666666666, ans=0.125
2023-11-19 12:42:00,347 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2900, loss[loss=0.08919, simple_loss=0.1143, pruned_loss=0.02488, audio_tagging_loss=0.00715, over 15154.00 frames. ], tot_loss[loss=0.08493, simple_loss=0.104, pruned_loss=0.02264, audio_tagging_loss=0.01031, over 3056369.08 frames. ], batch size: 56, lr: 7.03e-03, grad_scale: 32.0
2023-11-19 12:42:12,379 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.633e+01 9.349e+01 9.901e+01 1.327e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-19 12:42:17,623 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=15.0
2023-11-19 12:42:46,332 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.50 vs. limit=15.0
2023-11-19 12:42:50,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741000.0, ans=0.1
2023-11-19 12:42:53,431 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=741000.0, ans=0.125
2023-11-19 12:42:55,889 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 2950, loss[loss=0.09713, simple_loss=0.1264, pruned_loss=0.02674, audio_tagging_loss=0.007214, over 14810.00 frames. ], tot_loss[loss=0.08452, simple_loss=0.1035, pruned_loss=0.02235, audio_tagging_loss=0.01041, over 3050593.93 frames. ], batch size: 54, lr: 7.03e-03, grad_scale: 32.0
2023-11-19 12:43:02,462 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=741066.6666666666, ans=0.125
2023-11-19 12:43:07,715 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=741133.3333333334, ans=0.0
2023-11-19 12:43:09,029 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0
2023-11-19 12:43:10,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=741133.3333333334, ans=0.125
2023-11-19 12:43:24,007 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=741200.0, ans=0.125
2023-11-19 12:43:37,795 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=741266.6666666666, ans=0.2
2023-11-19 12:43:50,422 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=741333.3333333334, ans=0.0
2023-11-19 12:43:50,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=741333.3333333334, ans=0.0
2023-11-19 12:43:52,233 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3000, loss[loss=0.08423, simple_loss=0.09239, pruned_loss=0.02731, audio_tagging_loss=0.01072, over 15197.00 frames. ], tot_loss[loss=0.08522, simple_loss=0.104, pruned_loss=0.02273, audio_tagging_loss=0.01052, over 3048331.93 frames. ], batch size: 56, lr: 7.03e-03, grad_scale: 32.0
2023-11-19 12:43:52,233 INFO [train_asr.py:1138] (1/4) Computing validation loss
2023-11-19 12:44:08,656 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.7801, 4.2205, 3.7761, 3.3990], device='cuda:1')
2023-11-19 12:44:20,803 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5314, 2.6195, 3.6899, 3.2252], device='cuda:1')
2023-11-19 12:44:24,096 INFO [train_asr.py:1147] (1/4) Epoch 10, validation: loss=0.06403, simple_loss=0.05543, pruned_loss=0.006395, audio_tagging_loss=0.02992, over 4681554.00 frames.
2023-11-19 12:44:24,097 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB
2023-11-19 12:44:35,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0
2023-11-19 12:44:35,740 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.397e+01 8.400e+01 9.242e+01 1.017e+02 1.416e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-19 12:44:55,091 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 12:44:57,208 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.123e-02
2023-11-19 12:44:58,548 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.19 vs. limit=22.5
2023-11-19 12:45:03,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=741600.0, ans=0.125
2023-11-19 12:45:14,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=741666.6666666666, ans=0.0
2023-11-19 12:45:17,949 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.008e-01
2023-11-19 12:45:19,200 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3050, loss[loss=0.08853, simple_loss=0.1205, pruned_loss=0.01926, audio_tagging_loss=0.00901, over 14667.00 frames. ], tot_loss[loss=0.0853, simple_loss=0.1042, pruned_loss=0.02267, audio_tagging_loss=0.01054, over 3050465.05 frames. ], batch size: 54, lr: 7.03e-03, grad_scale: 32.0
2023-11-19 12:45:22,956 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0
2023-11-19 12:45:25,763 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=741733.3333333334, ans=0.125
2023-11-19 12:45:50,491 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=741866.6666666666, ans=0.125
2023-11-19 12:45:51,388 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 12:45:52,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=741933.3333333334, ans=0.0
2023-11-19 12:45:52,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=741933.3333333334, ans=0.125
2023-11-19 12:45:55,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=741933.3333333334, ans=0.125
2023-11-19 12:45:58,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.41 vs. limit=6.0
2023-11-19 12:46:07,554 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=742000.0, ans=0.025
2023-11-19 12:46:12,786 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=742000.0, ans=0.0
2023-11-19 12:46:13,841 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=742066.6666666666, ans=0.125
2023-11-19 12:46:14,658 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3100, loss[loss=0.09513, simple_loss=0.1224, pruned_loss=0.02376, audio_tagging_loss=0.01018, over 16097.00 frames. ], tot_loss[loss=0.08514, simple_loss=0.104, pruned_loss=0.02253, audio_tagging_loss=0.0106, over 3049794.51 frames. ], batch size: 61, lr: 7.02e-03, grad_scale: 32.0
2023-11-19 12:46:16,133 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=22.5
2023-11-19 12:46:16,944 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=742066.6666666666, ans=0.125
2023-11-19 12:46:26,934 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.581e+01 8.703e+01 9.646e+01 1.063e+02 1.410e+02, threshold=1.929e+02, percent-clipped=0.0
2023-11-19 12:47:00,545 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=742333.3333333334, ans=0.2
2023-11-19 12:47:00,634 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=742333.3333333334, ans=0.05
2023-11-19 12:47:10,466 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3150, loss[loss=0.08674, simple_loss=0.1174, pruned_loss=0.01845, audio_tagging_loss=0.009609, over 15254.00 frames. ], tot_loss[loss=0.08584, simple_loss=0.1046, pruned_loss=0.02287, audio_tagging_loss=0.01066, over 3049528.55 frames. ], batch size: 56, lr: 7.02e-03, grad_scale: 16.0
2023-11-19 12:47:10,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=742400.0, ans=0.125
2023-11-19 12:47:17,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=742400.0, ans=0.0
2023-11-19 12:47:20,886 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.28 vs. limit=15.0
2023-11-19 12:47:22,407 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=742466.6666666666, ans=0.125
2023-11-19 12:47:26,209 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=742466.6666666666, ans=0.0
2023-11-19 12:47:36,785 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=742533.3333333334, ans=0.2
2023-11-19 12:47:36,854 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 12:47:41,472 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0
2023-11-19 12:47:51,448 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. limit=6.0
2023-11-19 12:47:53,466 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5
2023-11-19 12:47:55,676 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=742666.6666666666, ans=0.1
2023-11-19 12:48:01,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=742666.6666666666, ans=0.125
2023-11-19 12:48:01,979 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=742666.6666666666, ans=0.0
2023-11-19 12:48:05,202 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=742733.3333333334, ans=0.125
2023-11-19 12:48:06,082 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3200, loss[loss=0.07718, simple_loss=0.0947, pruned_loss=0.02051, audio_tagging_loss=0.00932, over 15200.00 frames. ], tot_loss[loss=0.08596, simple_loss=0.1046, pruned_loss=0.023, audio_tagging_loss=0.01065, over 3040932.86 frames. ], batch size: 59, lr: 7.02e-03, grad_scale: 32.0
2023-11-19 12:48:20,449 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.457e+01 8.995e+01 9.869e+01 1.157e+02, threshold=1.799e+02, percent-clipped=0.0
2023-11-19 12:48:38,101 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=742866.6666666666, ans=0.1
2023-11-19 12:48:44,699 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5
2023-11-19 12:48:54,961 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=743000.0, ans=0.2
2023-11-19 12:48:58,572 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0
2023-11-19 12:49:02,764 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3250, loss[loss=0.06225, simple_loss=0.07464, pruned_loss=0.01311, audio_tagging_loss=0.01182, over 16142.00 frames. ], tot_loss[loss=0.08581, simple_loss=0.1045, pruned_loss=0.0228, audio_tagging_loss=0.01075, over 3045660.41 frames. ], batch size: 63, lr: 7.02e-03, grad_scale: 32.0
2023-11-19 12:49:07,422 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0
2023-11-19 12:49:18,589 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0
2023-11-19 12:49:21,908 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0
2023-11-19 12:49:31,728 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=743200.0, ans=0.125
2023-11-19 12:49:33,753 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.01 vs. limit=22.5
2023-11-19 12:49:45,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=743266.6666666666, ans=0.05
2023-11-19 12:49:57,964 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3300, loss[loss=0.1033, simple_loss=0.1242, pruned_loss=0.03253, audio_tagging_loss=0.008628, over 15604.00 frames. ], tot_loss[loss=0.08515, simple_loss=0.1035, pruned_loss=0.02241, audio_tagging_loss=0.01098, over 3042799.95 frames. ], batch size: 58, lr: 7.02e-03, grad_scale: 32.0
2023-11-19 12:49:58,262 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=743400.0, ans=0.125
2023-11-19 12:50:07,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=743466.6666666666, ans=0.125
2023-11-19 12:50:10,524 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.235e+01 8.992e+01 9.663e+01 1.572e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-19 12:50:19,295 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=743533.3333333334, ans=0.2
2023-11-19 12:50:25,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=743533.3333333334, ans=0.125
2023-11-19 12:50:39,343 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=743600.0, ans=0.0
2023-11-19 12:50:42,492 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=743666.6666666666, ans=0.1
2023-11-19 12:50:52,826 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3350, loss[loss=0.07306, simple_loss=0.08454, pruned_loss=0.0221, audio_tagging_loss=0.008684, over 15240.00 frames. ], tot_loss[loss=0.08569, simple_loss=0.1041, pruned_loss=0.0228, audio_tagging_loss=0.01082, over 3040976.64 frames. ], batch size: 58, lr: 7.02e-03, grad_scale: 16.0
2023-11-19 12:50:56,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=743733.3333333334, ans=0.2
2023-11-19 12:51:15,745 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=743866.6666666666, ans=0.95
2023-11-19 12:51:16,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.21 vs. limit=15.0
2023-11-19 12:51:17,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=743866.6666666666, ans=0.125
2023-11-19 12:51:18,189 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=12.0
2023-11-19 12:51:24,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=743866.6666666666, ans=22.5
2023-11-19 12:51:24,076 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.30 vs. limit=22.5
2023-11-19 12:51:26,875 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=743933.3333333334, ans=0.125
2023-11-19 12:51:29,897 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=743933.3333333334, ans=0.2
2023-11-19 12:51:30,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0
2023-11-19 12:51:37,670 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=744000.0, ans=0.025
2023-11-19 12:51:40,383 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=744000.0, ans=0.125
2023-11-19 12:51:44,123 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.90 vs. limit=15.0
2023-11-19 12:51:49,084 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3400, loss[loss=0.09542, simple_loss=0.114, pruned_loss=0.03039, audio_tagging_loss=0.008008, over 15239.00 frames. ], tot_loss[loss=0.08531, simple_loss=0.1039, pruned_loss=0.02274, audio_tagging_loss=0.01064, over 3041078.14 frames. ], batch size: 58, lr: 7.01e-03, grad_scale: 16.0
2023-11-19 12:52:04,017 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.485e+01 9.102e+01 1.010e+02 1.792e+02, threshold=1.820e+02, percent-clipped=0.0
2023-11-19 12:52:05,310 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744133.3333333334, ans=0.1
2023-11-19 12:52:19,137 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=744200.0, ans=0.125
2023-11-19 12:52:25,723 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.19 vs. limit=12.0
2023-11-19 12:52:40,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=744333.3333333334, ans=0.125
2023-11-19 12:52:45,405 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3450, loss[loss=0.05039, simple_loss=0.06209, pruned_loss=0.009354, audio_tagging_loss=0.009991, over 14243.00 frames. ], tot_loss[loss=0.08522, simple_loss=0.104, pruned_loss=0.02266, audio_tagging_loss=0.01057, over 3046355.10 frames. ], batch size: 53, lr: 7.01e-03, grad_scale: 16.0
2023-11-19 12:52:47,777 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=744400.0, ans=0.0
2023-11-19 12:52:48,778 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=744400.0, ans=0.125
2023-11-19 12:52:49,357 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0
2023-11-19 12:53:05,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=744466.6666666666, ans=0.125
2023-11-19 12:53:34,435 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=744666.6666666666, ans=0.0
2023-11-19 12:53:38,587 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=744666.6666666666, ans=0.0
2023-11-19 12:53:40,479 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3500, loss[loss=0.09908, simple_loss=0.1299, pruned_loss=0.02546, audio_tagging_loss=0.008686, over 15128.00 frames. ], tot_loss[loss=0.08542, simple_loss=0.1044, pruned_loss=0.02266, audio_tagging_loss=0.01055, over 3048711.30 frames. ], batch size: 56, lr: 7.01e-03, grad_scale: 16.0
2023-11-19 12:53:51,552 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.26 vs. limit=15.0
2023-11-19 12:53:55,412 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.325e+01 9.039e+01 9.791e+01 1.416e+02, threshold=1.808e+02, percent-clipped=0.0
2023-11-19 12:53:56,645 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=744800.0, ans=0.0
2023-11-19 12:54:08,665 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 12:54:10,531 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=744866.6666666666, ans=0.2
2023-11-19 12:54:18,103 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=744933.3333333334, ans=0.125
2023-11-19 12:54:31,754 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=745000.0, ans=0.0
2023-11-19 12:54:36,801 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3550, loss[loss=0.08963, simple_loss=0.1149, pruned_loss=0.02309, audio_tagging_loss=0.009065, over 14719.00 frames. ], tot_loss[loss=0.0851, simple_loss=0.1038, pruned_loss=0.02261, audio_tagging_loss=0.01058, over 3038500.32 frames. ], batch size: 56, lr: 7.01e-03, grad_scale: 16.0
2023-11-19 12:54:41,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=745066.6666666666, ans=0.1
2023-11-19 12:55:07,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=745200.0, ans=0.125
2023-11-19 12:55:21,684 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=745333.3333333334, ans=0.0
2023-11-19 12:55:27,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=745333.3333333334, ans=0.125
2023-11-19 12:55:31,988 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3600, loss[loss=0.09353, simple_loss=0.1184, pruned_loss=0.02417, audio_tagging_loss=0.01013, over 15325.00 frames. ], tot_loss[loss=0.08467, simple_loss=0.1035, pruned_loss=0.02248, audio_tagging_loss=0.01047, over 3043333.68 frames. ], batch size: 59, lr: 7.01e-03, grad_scale: 32.0
2023-11-19 12:55:33,806 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=745400.0, ans=0.0
2023-11-19 12:55:45,987 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=745466.6666666666, ans=0.125
2023-11-19 12:55:46,777 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.231e+01 8.451e+01 8.878e+01 9.732e+01 1.759e+02, threshold=1.776e+02, percent-clipped=0.0
2023-11-19 12:55:49,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=745466.6666666666, ans=0.125
2023-11-19 12:55:50,195 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=745466.6666666666, ans=0.09899494936611666
2023-11-19 12:56:10,706 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0
limit=15.0 2023-11-19 12:56:12,519 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=745600.0, ans=0.125 2023-11-19 12:56:13,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=745600.0, ans=0.0 2023-11-19 12:56:13,549 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=745600.0, ans=0.0 2023-11-19 12:56:26,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=745733.3333333334, ans=0.0 2023-11-19 12:56:27,492 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3650, loss[loss=0.07604, simple_loss=0.08548, pruned_loss=0.02443, audio_tagging_loss=0.008865, over 14113.00 frames. ], tot_loss[loss=0.08545, simple_loss=0.1045, pruned_loss=0.02291, audio_tagging_loss=0.01029, over 3047818.44 frames. ], batch size: 53, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 12:56:29,859 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=745733.3333333334, ans=0.09899494936611666 2023-11-19 12:56:30,339 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.40 vs. limit=15.0 2023-11-19 12:56:33,018 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=745733.3333333334, ans=0.07 2023-11-19 12:56:41,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=745800.0, ans=0.2 2023-11-19 12:56:51,172 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2023-11-19 12:56:54,585 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.24 vs. limit=15.0 2023-11-19 12:57:21,739 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=746000.0, ans=0.125 2023-11-19 12:57:23,625 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3700, loss[loss=0.1313, simple_loss=0.172, pruned_loss=0.03778, audio_tagging_loss=0.007523, over 15381.00 frames. ], tot_loss[loss=0.08576, simple_loss=0.1052, pruned_loss=0.0229, audio_tagging_loss=0.01026, over 3052688.34 frames. 
], batch size: 56, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 12:57:30,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=746066.6666666666, ans=0.04949747468305833 2023-11-19 12:57:31,385 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=746066.6666666666, ans=0.0 2023-11-19 12:57:34,717 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=746133.3333333334, ans=0.2 2023-11-19 12:57:37,612 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.774e+01 9.543e+01 1.094e+02 1.492e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-19 12:57:45,288 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746200.0, ans=0.1 2023-11-19 12:57:47,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=746200.0, ans=0.1 2023-11-19 12:57:47,480 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=746200.0, ans=0.125 2023-11-19 12:57:50,819 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.52 vs. limit=22.5 2023-11-19 12:58:05,815 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.17 vs. limit=22.5 2023-11-19 12:58:11,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=746333.3333333334, ans=0.125 2023-11-19 12:58:19,028 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3750, loss[loss=0.1117, simple_loss=0.1422, pruned_loss=0.0331, audio_tagging_loss=0.007475, over 15531.00 frames. ], tot_loss[loss=0.08635, simple_loss=0.1058, pruned_loss=0.02313, audio_tagging_loss=0.01031, over 3047215.39 frames. ], batch size: 54, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 12:58:24,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=746400.0, ans=0.2 2023-11-19 12:58:34,082 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=746466.6666666666, ans=0.125 2023-11-19 12:58:39,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=746466.6666666666, ans=0.125 2023-11-19 12:58:40,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=746533.3333333334, ans=0.125 2023-11-19 12:58:49,855 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=746533.3333333334, ans=0.125 2023-11-19 12:58:56,952 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 12:59:09,179 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.35 vs. limit=15.0 2023-11-19 12:59:16,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=746733.3333333334, ans=0.0 2023-11-19 12:59:17,447 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3800, loss[loss=0.1001, simple_loss=0.1233, pruned_loss=0.02753, audio_tagging_loss=0.01096, over 15977.00 frames. ], tot_loss[loss=0.08541, simple_loss=0.1044, pruned_loss=0.02277, audio_tagging_loss=0.01045, over 3050664.77 frames. ], batch size: 59, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 12:59:20,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746733.3333333334, ans=0.1 2023-11-19 12:59:28,394 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=746800.0, ans=0.125 2023-11-19 12:59:31,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=746800.0, ans=0.2 2023-11-19 12:59:32,188 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.389e+01 8.999e+01 1.009e+02 1.684e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 12:59:35,625 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=746800.0, ans=0.09899494936611666 2023-11-19 13:00:13,362 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3850, loss[loss=0.07321, simple_loss=0.08857, pruned_loss=0.01639, audio_tagging_loss=0.01253, over 15340.00 frames. ], tot_loss[loss=0.08583, simple_loss=0.105, pruned_loss=0.02282, audio_tagging_loss=0.01051, over 3051242.81 frames. ], batch size: 60, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:00:17,844 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:01:08,402 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3900, loss[loss=0.08706, simple_loss=0.1002, pruned_loss=0.02146, audio_tagging_loss=0.01553, over 14468.00 frames. ], tot_loss[loss=0.08652, simple_loss=0.1061, pruned_loss=0.02299, audio_tagging_loss=0.01048, over 3050381.21 frames. ], batch size: 53, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:01:09,716 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=747400.0, ans=0.02 2023-11-19 13:01:20,843 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=747466.6666666666, ans=0.125 2023-11-19 13:01:23,252 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.643e+01 8.736e+01 9.302e+01 1.013e+02 3.038e+02, threshold=1.860e+02, percent-clipped=1.0 2023-11-19 13:01:25,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.34 vs. 
limit=15.0 2023-11-19 13:01:27,720 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=747466.6666666666, ans=0.1 2023-11-19 13:01:39,298 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=747533.3333333334, ans=0.2 2023-11-19 13:01:52,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=747666.6666666666, ans=0.09899494936611666 2023-11-19 13:01:55,406 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=747666.6666666666, ans=0.2 2023-11-19 13:01:55,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=747666.6666666666, ans=0.0 2023-11-19 13:02:04,250 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 3950, loss[loss=0.1115, simple_loss=0.1385, pruned_loss=0.03334, audio_tagging_loss=0.008856, over 15852.00 frames. ], tot_loss[loss=0.08661, simple_loss=0.106, pruned_loss=0.02304, audio_tagging_loss=0.01058, over 3052196.60 frames. ], batch size: 57, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:02:04,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=747733.3333333334, ans=0.125 2023-11-19 13:02:10,359 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=747733.3333333334, ans=0.0 2023-11-19 13:02:12,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=747733.3333333334, ans=0.125 2023-11-19 13:02:25,217 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=22.5 2023-11-19 13:02:29,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=747866.6666666666, ans=0.0 2023-11-19 13:02:44,791 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747933.3333333334, ans=0.1 2023-11-19 13:02:46,872 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=747933.3333333334, ans=0.125 2023-11-19 13:02:54,039 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=748000.0, ans=0.125 2023-11-19 13:03:01,254 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4000, loss[loss=0.07365, simple_loss=0.08469, pruned_loss=0.0187, audio_tagging_loss=0.01261, over 15340.00 frames. ], tot_loss[loss=0.08626, simple_loss=0.1055, pruned_loss=0.02285, audio_tagging_loss=0.01068, over 3045204.18 frames. ], batch size: 58, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:03:14,911 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.731e+01 8.515e+01 9.235e+01 1.023e+02 1.834e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 13:03:26,635 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.97 vs. 
limit=22.5 2023-11-19 13:03:42,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=748266.6666666666, ans=0.125 2023-11-19 13:03:50,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=748333.3333333334, ans=0.025 2023-11-19 13:03:53,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=748333.3333333334, ans=0.125 2023-11-19 13:03:56,340 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4050, loss[loss=0.08343, simple_loss=0.1023, pruned_loss=0.02218, audio_tagging_loss=0.0101, over 14735.00 frames. ], tot_loss[loss=0.08657, simple_loss=0.1057, pruned_loss=0.02304, audio_tagging_loss=0.01067, over 3043945.20 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:03:58,492 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:04:02,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=748400.0, ans=0.125 2023-11-19 13:04:04,494 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=22.5 2023-11-19 13:04:16,106 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=748466.6666666666, ans=0.125 2023-11-19 13:04:35,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=748600.0, ans=0.125 2023-11-19 13:04:39,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=748666.6666666666, ans=0.1 2023-11-19 13:04:51,375 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4100, loss[loss=0.07433, simple_loss=0.08244, pruned_loss=0.02048, audio_tagging_loss=0.01263, over 15157.00 frames. ], tot_loss[loss=0.08642, simple_loss=0.1055, pruned_loss=0.02297, audio_tagging_loss=0.01069, over 3046812.97 frames. ], batch size: 58, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:04:56,977 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=748733.3333333334, ans=0.2 2023-11-19 13:05:00,310 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.84 vs. 
limit=12.0 2023-11-19 13:05:05,419 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=748800.0, ans=0.125 2023-11-19 13:05:06,275 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.287e+01 8.382e+01 9.039e+01 9.605e+01 1.210e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 13:05:08,661 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=748800.0, ans=0.95 2023-11-19 13:05:19,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=748866.6666666666, ans=0.0 2023-11-19 13:05:21,986 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.23 vs. limit=15.0 2023-11-19 13:05:29,575 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=748933.3333333334, ans=0.2 2023-11-19 13:05:42,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=15.0 2023-11-19 13:05:46,725 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4150, loss[loss=0.0706, simple_loss=0.08825, pruned_loss=0.01636, audio_tagging_loss=0.01011, over 16145.00 frames. ], tot_loss[loss=0.08647, simple_loss=0.1058, pruned_loss=0.02307, audio_tagging_loss=0.01051, over 3045990.25 frames. ], batch size: 60, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:05:51,126 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=749066.6666666666, ans=0.1 2023-11-19 13:05:52,097 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=749066.6666666666, ans=0.125 2023-11-19 13:06:25,582 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:06:33,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.27 vs. limit=10.0 2023-11-19 13:06:41,690 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4200, loss[loss=0.08769, simple_loss=0.1116, pruned_loss=0.02225, audio_tagging_loss=0.009637, over 15641.00 frames. ], tot_loss[loss=0.08683, simple_loss=0.1066, pruned_loss=0.02323, audio_tagging_loss=0.0103, over 3049095.97 frames. ], batch size: 59, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:06:50,607 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=15.0 2023-11-19 13:06:55,964 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.337e+01 9.071e+01 1.010e+02 1.242e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-19 13:06:57,707 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.82 vs. 
limit=15.0 2023-11-19 13:07:10,497 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=749533.3333333334, ans=0.1 2023-11-19 13:07:35,520 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=749666.6666666666, ans=0.0 2023-11-19 13:07:37,278 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4250, loss[loss=0.05777, simple_loss=0.06709, pruned_loss=0.01077, audio_tagging_loss=0.01346, over 15103.00 frames. ], tot_loss[loss=0.08643, simple_loss=0.1058, pruned_loss=0.0232, audio_tagging_loss=0.01032, over 3041756.13 frames. ], batch size: 58, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:07:38,489 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=749733.3333333334, ans=0.125 2023-11-19 13:08:01,268 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=749866.6666666666, ans=0.2 2023-11-19 13:08:01,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2023-11-19 13:08:32,532 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4300, loss[loss=0.08889, simple_loss=0.102, pruned_loss=0.02614, audio_tagging_loss=0.01175, over 14243.00 frames. ], tot_loss[loss=0.08672, simple_loss=0.1062, pruned_loss=0.02336, audio_tagging_loss=0.01026, over 3043286.58 frames. ], batch size: 56, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:08:45,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=750133.3333333334, ans=10.0 2023-11-19 13:08:47,738 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.411e+01 9.252e+01 1.004e+02 1.296e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-19 13:08:51,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=750133.3333333334, ans=0.0 2023-11-19 13:08:52,624 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=12.0 2023-11-19 13:09:03,347 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=750200.0, ans=0.0 2023-11-19 13:09:23,794 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=750333.3333333334, ans=0.04949747468305833 2023-11-19 13:09:28,157 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.510e-03 2023-11-19 13:09:28,921 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4350, loss[loss=0.09469, simple_loss=0.1144, pruned_loss=0.02975, audio_tagging_loss=0.007729, over 14875.00 frames. ], tot_loss[loss=0.08694, simple_loss=0.1069, pruned_loss=0.02335, audio_tagging_loss=0.01014, over 3044532.40 frames. 
], batch size: 55, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 13:09:31,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=750400.0, ans=0.2 2023-11-19 13:09:34,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=750400.0, ans=0.0 2023-11-19 13:09:44,353 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=750466.6666666666, ans=0.5 2023-11-19 13:09:45,024 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=15.0 2023-11-19 13:09:57,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=750533.3333333334, ans=0.125 2023-11-19 13:10:03,456 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=12.0 2023-11-19 13:10:12,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=750666.6666666666, ans=0.125 2023-11-19 13:10:14,719 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=750666.6666666666, ans=0.125 2023-11-19 13:10:24,885 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4400, loss[loss=0.1069, simple_loss=0.137, pruned_loss=0.02642, audio_tagging_loss=0.01195, over 15650.00 frames. ], tot_loss[loss=0.08746, simple_loss=0.1076, pruned_loss=0.0235, audio_tagging_loss=0.01016, over 3039953.23 frames. ], batch size: 56, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:10:34,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=750800.0, ans=0.0 2023-11-19 13:10:39,851 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.801e+01 8.194e+01 9.026e+01 9.942e+01 1.275e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 13:11:20,612 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4450, loss[loss=0.08402, simple_loss=0.105, pruned_loss=0.01979, audio_tagging_loss=0.01173, over 16066.00 frames. ], tot_loss[loss=0.08786, simple_loss=0.1083, pruned_loss=0.02364, audio_tagging_loss=0.0101, over 3047979.31 frames. ], batch size: 59, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:11:27,346 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.85 vs. limit=22.5 2023-11-19 13:11:31,307 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=751133.3333333334, ans=0.125 2023-11-19 13:11:51,015 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.30 vs. limit=22.5 2023-11-19 13:12:16,441 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4500, loss[loss=0.08211, simple_loss=0.101, pruned_loss=0.01873, audio_tagging_loss=0.0129, over 16054.00 frames. ], tot_loss[loss=0.08811, simple_loss=0.1086, pruned_loss=0.02382, audio_tagging_loss=0.009972, over 3055731.93 frames. 
], batch size: 58, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:12:19,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=751400.0, ans=0.125 2023-11-19 13:12:28,261 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=12.0 2023-11-19 13:12:31,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=751466.6666666666, ans=0.0 2023-11-19 13:12:32,373 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.486e+01 8.202e+01 9.163e+01 9.982e+01 1.315e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-19 13:12:36,194 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0 2023-11-19 13:12:43,095 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=751533.3333333334, ans=0.1 2023-11-19 13:12:49,033 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=751600.0, ans=0.0 2023-11-19 13:12:49,076 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=751600.0, ans=0.1 2023-11-19 13:13:00,077 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=751666.6666666666, ans=0.125 2023-11-19 13:13:12,551 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4550, loss[loss=0.087, simple_loss=0.1016, pruned_loss=0.02382, audio_tagging_loss=0.01238, over 14378.00 frames. ], tot_loss[loss=0.08729, simple_loss=0.1075, pruned_loss=0.02348, audio_tagging_loss=0.01005, over 3050628.94 frames. ], batch size: 55, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:13:14,900 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=751733.3333333334, ans=0.0 2023-11-19 13:13:30,113 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=751800.0, ans=0.125 2023-11-19 13:13:51,327 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=751933.3333333334, ans=0.0 2023-11-19 13:13:54,725 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:14:00,490 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=752000.0, ans=0.05 2023-11-19 13:14:02,646 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.123e-01 2023-11-19 13:14:03,930 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.00 vs. 
limit=22.5 2023-11-19 13:14:07,799 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4600, loss[loss=0.08812, simple_loss=0.1155, pruned_loss=0.02039, audio_tagging_loss=0.009954, over 14937.00 frames. ], tot_loss[loss=0.08592, simple_loss=0.1055, pruned_loss=0.02292, audio_tagging_loss=0.01025, over 3052337.81 frames. ], batch size: 56, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:14:22,830 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=752133.3333333334, ans=0.125 2023-11-19 13:14:23,753 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.499e+01 8.092e+01 8.852e+01 9.685e+01 1.325e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 13:14:29,737 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=22.5 2023-11-19 13:14:33,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=752200.0, ans=0.1 2023-11-19 13:14:38,807 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=752200.0, ans=0.2 2023-11-19 13:14:39,881 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=752200.0, ans=0.2 2023-11-19 13:14:48,315 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=752266.6666666666, ans=0.0 2023-11-19 13:14:53,704 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=752333.3333333334, ans=10.0 2023-11-19 13:15:03,427 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2023-11-19 13:15:04,122 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4650, loss[loss=0.06394, simple_loss=0.07593, pruned_loss=0.01598, audio_tagging_loss=0.009997, over 14943.00 frames. ], tot_loss[loss=0.08598, simple_loss=0.1056, pruned_loss=0.02287, audio_tagging_loss=0.0103, over 3048503.56 frames. ], batch size: 58, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:15:47,579 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=752666.6666666666, ans=0.0 2023-11-19 13:15:59,523 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4700, loss[loss=0.09082, simple_loss=0.1146, pruned_loss=0.02469, audio_tagging_loss=0.00883, over 14916.00 frames. ], tot_loss[loss=0.08594, simple_loss=0.1051, pruned_loss=0.02288, audio_tagging_loss=0.01051, over 3046017.31 frames. 
], batch size: 54, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:16:04,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=752733.3333333334, ans=0.0 2023-11-19 13:16:14,821 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.706e+01 9.585e+01 1.066e+02 1.440e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-19 13:16:25,475 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=752866.6666666666, ans=0.025 2023-11-19 13:16:27,698 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=752866.6666666666, ans=0.125 2023-11-19 13:16:39,360 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=752933.3333333334, ans=0.125 2023-11-19 13:16:54,896 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4750, loss[loss=0.1039, simple_loss=0.134, pruned_loss=0.02529, audio_tagging_loss=0.01165, over 15326.00 frames. ], tot_loss[loss=0.0862, simple_loss=0.1057, pruned_loss=0.02278, audio_tagging_loss=0.01058, over 3055259.51 frames. ], batch size: 55, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:17:15,331 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=753133.3333333334, ans=0.1 2023-11-19 13:17:21,942 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.83 vs. limit=22.5 2023-11-19 13:17:24,287 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=753200.0, ans=0.125 2023-11-19 13:17:32,155 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=753266.6666666666, ans=0.125 2023-11-19 13:17:51,341 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4800, loss[loss=0.0786, simple_loss=0.09752, pruned_loss=0.01868, audio_tagging_loss=0.01116, over 14893.00 frames. ], tot_loss[loss=0.08686, simple_loss=0.1059, pruned_loss=0.02317, audio_tagging_loss=0.01075, over 3060176.52 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:17:54,267 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.11 vs. limit=8.0 2023-11-19 13:18:05,361 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=753466.6666666666, ans=10.0 2023-11-19 13:18:06,144 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.271e+01 9.115e+01 1.017e+02 1.442e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-19 13:18:35,990 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=22.5 2023-11-19 13:18:37,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=753666.6666666666, ans=0.1 2023-11-19 13:18:45,782 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4850, loss[loss=0.1111, simple_loss=0.1373, pruned_loss=0.03351, audio_tagging_loss=0.008945, over 15361.00 frames. ], tot_loss[loss=0.0859, simple_loss=0.1048, pruned_loss=0.02266, audio_tagging_loss=0.01082, over 3052164.00 frames. 
], batch size: 57, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:18:59,916 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=753800.0, ans=0.0 2023-11-19 13:19:08,672 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=753866.6666666666, ans=0.125 2023-11-19 13:19:16,788 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=753866.6666666666, ans=0.125 2023-11-19 13:19:38,020 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.80 vs. limit=15.0 2023-11-19 13:19:41,351 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-11-19 13:19:41,817 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4900, loss[loss=0.09401, simple_loss=0.1147, pruned_loss=0.02785, audio_tagging_loss=0.00882, over 15500.00 frames. ], tot_loss[loss=0.08653, simple_loss=0.1058, pruned_loss=0.0229, audio_tagging_loss=0.01071, over 3055020.56 frames. ], batch size: 58, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:19:43,115 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=754066.6666666666, ans=0.0 2023-11-19 13:19:57,513 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.439e+01 8.307e+01 9.002e+01 9.755e+01 1.261e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 13:20:04,099 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=754200.0, ans=0.125 2023-11-19 13:20:09,403 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=754200.0, ans=0.125 2023-11-19 13:20:11,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=754200.0, ans=0.125 2023-11-19 13:20:20,851 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=754266.6666666666, ans=0.125 2023-11-19 13:20:37,487 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 4950, loss[loss=0.07705, simple_loss=0.08787, pruned_loss=0.02107, audio_tagging_loss=0.01205, over 15047.00 frames. ], tot_loss[loss=0.08518, simple_loss=0.1043, pruned_loss=0.02253, audio_tagging_loss=0.01053, over 3044821.42 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:20:46,125 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=754400.0, ans=0.125 2023-11-19 13:21:10,156 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.69 vs. limit=15.0 2023-11-19 13:21:12,766 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=754600.0, ans=0.09899494936611666 2023-11-19 13:21:14,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=754600.0, ans=0.0 2023-11-19 13:21:32,749 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5000, loss[loss=0.09767, simple_loss=0.1154, pruned_loss=0.02439, audio_tagging_loss=0.01561, over 15541.00 frames. 
], tot_loss[loss=0.0849, simple_loss=0.104, pruned_loss=0.02245, audio_tagging_loss=0.01044, over 3044766.79 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:21:48,527 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.750e+01 8.260e+01 8.986e+01 1.009e+02 1.320e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-19 13:21:49,857 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=754800.0, ans=0.125 2023-11-19 13:21:50,149 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.91 vs. limit=22.5 2023-11-19 13:21:51,471 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.09 vs. limit=15.0 2023-11-19 13:21:55,737 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=754866.6666666666, ans=0.125 2023-11-19 13:21:55,771 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=754866.6666666666, ans=0.125 2023-11-19 13:22:12,962 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=754933.3333333334, ans=0.2 2023-11-19 13:22:25,151 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=755000.0, ans=0.125 2023-11-19 13:22:28,173 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5050, loss[loss=0.1098, simple_loss=0.1384, pruned_loss=0.03047, audio_tagging_loss=0.01017, over 15557.00 frames. ], tot_loss[loss=0.08401, simple_loss=0.1028, pruned_loss=0.02221, audio_tagging_loss=0.01039, over 3045781.39 frames. ], batch size: 57, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:22:50,063 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:22:51,181 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=755200.0, ans=0.125 2023-11-19 13:22:56,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.45 vs. limit=22.5 2023-11-19 13:22:58,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=755200.0, ans=0.09899494936611666 2023-11-19 13:23:08,691 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=755266.6666666666, ans=0.0 2023-11-19 13:23:13,647 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755333.3333333334, ans=0.1 2023-11-19 13:23:15,715 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:23:22,724 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=755333.3333333334, ans=0.125 2023-11-19 13:23:24,488 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5100, loss[loss=0.09129, simple_loss=0.1003, pruned_loss=0.02918, audio_tagging_loss=0.01199, over 14824.00 frames. 
], tot_loss[loss=0.08295, simple_loss=0.1017, pruned_loss=0.02174, audio_tagging_loss=0.01038, over 3039710.61 frames. ], batch size: 55, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:23:24,770 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=755400.0, ans=0.0 2023-11-19 13:23:26,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2023-11-19 13:23:31,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=755400.0, ans=0.125 2023-11-19 13:23:34,729 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=755466.6666666666, ans=0.1 2023-11-19 13:23:37,838 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:23:38,953 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=755466.6666666666, ans=0.125 2023-11-19 13:23:39,699 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.354e+01 8.218e+01 8.903e+01 1.035e+02 1.339e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-19 13:23:40,946 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=755466.6666666666, ans=0.0 2023-11-19 13:23:45,870 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=755533.3333333334, ans=10.0 2023-11-19 13:23:52,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=755533.3333333334, ans=0.125 2023-11-19 13:24:00,156 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:24:17,074 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=755666.6666666666, ans=0.0 2023-11-19 13:24:20,065 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5150, loss[loss=0.08567, simple_loss=0.1058, pruned_loss=0.02171, audio_tagging_loss=0.01107, over 14884.00 frames. ], tot_loss[loss=0.08344, simple_loss=0.1021, pruned_loss=0.02196, audio_tagging_loss=0.01042, over 3034566.25 frames. ], batch size: 58, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:24:28,496 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. 
limit=6.0 2023-11-19 13:24:32,615 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=755800.0, ans=0.1 2023-11-19 13:24:40,371 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=755800.0, ans=0.0 2023-11-19 13:24:48,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=755866.6666666666, ans=0.125 2023-11-19 13:24:52,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=755933.3333333334, ans=0.125 2023-11-19 13:24:59,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=755933.3333333334, ans=0.1 2023-11-19 13:25:15,941 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5200, loss[loss=0.1025, simple_loss=0.1292, pruned_loss=0.02923, audio_tagging_loss=0.008691, over 14354.00 frames. ], tot_loss[loss=0.08455, simple_loss=0.1039, pruned_loss=0.02233, audio_tagging_loss=0.01029, over 3031128.33 frames. ], batch size: 52, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:25:18,228 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=756066.6666666666, ans=0.125 2023-11-19 13:25:31,605 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.473e+01 9.085e+01 1.039e+02 1.273e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 13:25:37,683 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=756200.0, ans=0.125 2023-11-19 13:26:09,222 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0 2023-11-19 13:26:11,821 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5250, loss[loss=0.1024, simple_loss=0.1291, pruned_loss=0.02712, audio_tagging_loss=0.01068, over 15974.00 frames. ], tot_loss[loss=0.08569, simple_loss=0.1053, pruned_loss=0.02286, audio_tagging_loss=0.01015, over 3032278.42 frames. ], batch size: 58, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:26:15,837 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756400.0, ans=0.1 2023-11-19 13:26:21,030 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=756400.0, ans=0.1 2023-11-19 13:26:24,229 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=756466.6666666666, ans=0.0 2023-11-19 13:26:27,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756466.6666666666, ans=0.1 2023-11-19 13:26:34,199 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:26:39,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.37 vs. 
limit=10.0 2023-11-19 13:26:43,718 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=756600.0, ans=0.125 2023-11-19 13:26:49,983 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=756600.0, ans=0.0 2023-11-19 13:26:54,535 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0 2023-11-19 13:27:03,061 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=756666.6666666666, ans=0.0 2023-11-19 13:27:07,158 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5300, loss[loss=0.06856, simple_loss=0.08445, pruned_loss=0.01753, audio_tagging_loss=0.008805, over 15059.00 frames. ], tot_loss[loss=0.08555, simple_loss=0.1051, pruned_loss=0.0228, audio_tagging_loss=0.0102, over 3035518.46 frames. ], batch size: 59, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:27:11,126 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0 2023-11-19 13:27:15,235 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2023-11-19 13:27:22,573 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.251e+01 9.151e+01 1.020e+02 1.250e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 13:27:26,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=756800.0, ans=0.125 2023-11-19 13:27:28,194 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=756866.6666666666, ans=0.125 2023-11-19 13:27:43,485 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=756933.3333333334, ans=0.07 2023-11-19 13:27:59,846 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=757000.0, ans=0.125 2023-11-19 13:28:02,882 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5350, loss[loss=0.06598, simple_loss=0.0845, pruned_loss=0.01307, audio_tagging_loss=0.01066, over 14839.00 frames. ], tot_loss[loss=0.08622, simple_loss=0.1059, pruned_loss=0.02308, audio_tagging_loss=0.01019, over 3037829.04 frames. 
], batch size: 54, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:28:05,173 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=757066.6666666666, ans=0.125 2023-11-19 13:28:09,367 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=757066.6666666666, ans=0.2 2023-11-19 13:28:10,598 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:28:12,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=757133.3333333334, ans=0.2 2023-11-19 13:28:18,332 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=757133.3333333334, ans=0.0 2023-11-19 13:28:54,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=757333.3333333334, ans=0.0 2023-11-19 13:28:57,745 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5400, loss[loss=0.07852, simple_loss=0.09501, pruned_loss=0.01951, audio_tagging_loss=0.01151, over 16266.00 frames. ], tot_loss[loss=0.08635, simple_loss=0.106, pruned_loss=0.02307, audio_tagging_loss=0.01026, over 3040267.17 frames. ], batch size: 61, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:29:01,781 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=757400.0, ans=0.125 2023-11-19 13:29:03,365 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=757400.0, ans=0.125 2023-11-19 13:29:12,212 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=757466.6666666666, ans=0.125 2023-11-19 13:29:14,080 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.379e+01 8.876e+01 9.570e+01 1.112e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-19 13:29:19,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=757533.3333333334, ans=0.125 2023-11-19 13:29:20,757 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=757533.3333333334, ans=0.2 2023-11-19 13:29:23,516 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=757533.3333333334, ans=0.0 2023-11-19 13:29:36,190 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757600.0, ans=0.1 2023-11-19 13:29:44,154 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=757666.6666666666, ans=0.2 2023-11-19 13:29:50,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=757666.6666666666, ans=0.1 2023-11-19 13:29:52,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=757666.6666666666, ans=0.125 2023-11-19 13:29:54,349 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5450, loss[loss=0.09462, simple_loss=0.1174, pruned_loss=0.0228, audio_tagging_loss=0.01312, over 14686.00 frames. 
], tot_loss[loss=0.08697, simple_loss=0.1065, pruned_loss=0.02334, audio_tagging_loss=0.01035, over 3032299.12 frames. ], batch size: 54, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:29:55,865 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0 2023-11-19 13:29:56,707 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=757733.3333333334, ans=0.2 2023-11-19 13:30:06,865 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=757800.0, ans=0.0 2023-11-19 13:30:30,008 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=757933.3333333334, ans=0.0 2023-11-19 13:30:33,648 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=757933.3333333334, ans=0.0 2023-11-19 13:30:45,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=758000.0, ans=0.04949747468305833 2023-11-19 13:30:45,245 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=758000.0, ans=0.2 2023-11-19 13:30:49,738 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5500, loss[loss=0.07071, simple_loss=0.08658, pruned_loss=0.01567, audio_tagging_loss=0.01175, over 14478.00 frames. ], tot_loss[loss=0.08676, simple_loss=0.1065, pruned_loss=0.02324, audio_tagging_loss=0.01028, over 3040348.18 frames. ], batch size: 54, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:31:02,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=758133.3333333334, ans=0.1 2023-11-19 13:31:04,522 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.323e+01 9.025e+01 9.961e+01 1.664e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 13:31:26,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=758266.6666666666, ans=0.1 2023-11-19 13:31:44,671 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5550, loss[loss=0.09682, simple_loss=0.1082, pruned_loss=0.03166, audio_tagging_loss=0.01108, over 16035.00 frames. ], tot_loss[loss=0.08733, simple_loss=0.1071, pruned_loss=0.02342, audio_tagging_loss=0.01035, over 3042940.65 frames. ], batch size: 63, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:31:46,152 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.49 vs. limit=10.0 2023-11-19 13:32:14,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=758533.3333333334, ans=0.125 2023-11-19 13:32:21,417 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.48 vs. limit=15.0 2023-11-19 13:32:40,815 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5600, loss[loss=0.07424, simple_loss=0.08848, pruned_loss=0.0208, audio_tagging_loss=0.009202, over 14635.00 frames. ], tot_loss[loss=0.08677, simple_loss=0.1062, pruned_loss=0.02313, audio_tagging_loss=0.01053, over 3044690.08 frames. 
], batch size: 56, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:32:43,639 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=758733.3333333334, ans=0.5 2023-11-19 13:32:47,177 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2023-11-19 13:32:54,569 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=758800.0, ans=0.0 2023-11-19 13:32:56,429 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.309e+01 8.349e+01 9.140e+01 1.023e+02 1.369e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 13:33:14,072 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=758933.3333333334, ans=0.04949747468305833 2023-11-19 13:33:17,758 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=758933.3333333334, ans=0.1 2023-11-19 13:33:18,571 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:33:22,957 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=758933.3333333334, ans=0.02 2023-11-19 13:33:32,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=759000.0, ans=0.125 2023-11-19 13:33:36,687 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5650, loss[loss=0.06663, simple_loss=0.07403, pruned_loss=0.01807, audio_tagging_loss=0.01154, over 14624.00 frames. ], tot_loss[loss=0.08563, simple_loss=0.1046, pruned_loss=0.02272, audio_tagging_loss=0.01063, over 3049105.93 frames. ], batch size: 58, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:33:39,123 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=759066.6666666666, ans=0.125 2023-11-19 13:33:46,304 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=759133.3333333334, ans=0.09899494936611666 2023-11-19 13:34:00,016 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=759200.0, ans=0.0 2023-11-19 13:34:06,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=759200.0, ans=0.0 2023-11-19 13:34:31,777 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5700, loss[loss=0.1014, simple_loss=0.1269, pruned_loss=0.02796, audio_tagging_loss=0.009958, over 15011.00 frames. ], tot_loss[loss=0.08543, simple_loss=0.1045, pruned_loss=0.02257, audio_tagging_loss=0.01062, over 3051711.36 frames. 
], batch size: 54, lr: 6.94e-03, grad_scale: 32.0 2023-11-19 13:34:36,174 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=759400.0, ans=0.125 2023-11-19 13:34:36,812 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=759400.0, ans=0.125 2023-11-19 13:34:47,535 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.640e+01 8.102e+01 8.841e+01 9.614e+01 1.155e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-19 13:34:54,601 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=759533.3333333334, ans=0.2 2023-11-19 13:34:55,699 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=759533.3333333334, ans=0.125 2023-11-19 13:35:00,432 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=759533.3333333334, ans=0.125 2023-11-19 13:35:00,572 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=759533.3333333334, ans=0.2 2023-11-19 13:35:15,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=759666.6666666666, ans=0.0 2023-11-19 13:35:20,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759666.6666666666, ans=0.1 2023-11-19 13:35:21,241 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.77 vs. limit=22.5 2023-11-19 13:35:27,571 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5750, loss[loss=0.1025, simple_loss=0.1268, pruned_loss=0.02855, audio_tagging_loss=0.01056, over 15842.00 frames. ], tot_loss[loss=0.08507, simple_loss=0.104, pruned_loss=0.02248, audio_tagging_loss=0.01057, over 3047475.96 frames. ], batch size: 61, lr: 6.94e-03, grad_scale: 32.0 2023-11-19 13:35:52,348 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.20 vs. limit=22.5 2023-11-19 13:35:54,313 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=759866.6666666666, ans=0.125 2023-11-19 13:35:55,269 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759866.6666666666, ans=0.1 2023-11-19 13:36:03,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=759933.3333333334, ans=0.125 2023-11-19 13:36:23,407 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5800, loss[loss=0.09076, simple_loss=0.1093, pruned_loss=0.02638, audio_tagging_loss=0.009747, over 15374.00 frames. ], tot_loss[loss=0.08474, simple_loss=0.1038, pruned_loss=0.02234, audio_tagging_loss=0.01049, over 3046728.90 frames. 
], batch size: 56, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 13:36:27,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=760066.6666666666, ans=0.125 2023-11-19 13:36:39,823 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.949e+01 8.575e+01 9.143e+01 9.990e+01 1.422e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-19 13:36:52,760 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=760200.0, ans=0.0 2023-11-19 13:36:57,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=760266.6666666666, ans=0.125 2023-11-19 13:37:07,207 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=760333.3333333334, ans=0.0 2023-11-19 13:37:09,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=760333.3333333334, ans=0.0 2023-11-19 13:37:10,913 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=760333.3333333334, ans=0.0 2023-11-19 13:37:10,939 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=760333.3333333334, ans=0.125 2023-11-19 13:37:18,258 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=760400.0, ans=0.035 2023-11-19 13:37:19,231 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5850, loss[loss=0.09077, simple_loss=0.1173, pruned_loss=0.02407, audio_tagging_loss=0.008032, over 14411.00 frames. ], tot_loss[loss=0.08512, simple_loss=0.1042, pruned_loss=0.02262, audio_tagging_loss=0.01037, over 3054987.96 frames. ], batch size: 57, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 13:37:51,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=760533.3333333334, ans=0.125 2023-11-19 13:37:57,479 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=760600.0, ans=0.125 2023-11-19 13:38:15,433 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5900, loss[loss=0.06391, simple_loss=0.07699, pruned_loss=0.01347, audio_tagging_loss=0.01194, over 14580.00 frames. ], tot_loss[loss=0.08442, simple_loss=0.1033, pruned_loss=0.02233, audio_tagging_loss=0.01045, over 3053954.84 frames. ], batch size: 60, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 13:38:17,325 INFO [scaling.py:1022] (1/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. 
limit=8.0 2023-11-19 13:38:17,914 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=760733.3333333334, ans=0.125 2023-11-19 13:38:32,424 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.639e+01 8.376e+01 9.188e+01 1.002e+02 1.553e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 13:38:35,790 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=760800.0, ans=0.2 2023-11-19 13:38:50,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=760933.3333333334, ans=0.04949747468305833 2023-11-19 13:39:01,081 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=761000.0, ans=0.125 2023-11-19 13:39:02,109 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=761000.0, ans=0.125 2023-11-19 13:39:03,189 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=761000.0, ans=0.2 2023-11-19 13:39:08,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=761000.0, ans=0.1 2023-11-19 13:39:10,372 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 5950, loss[loss=0.09644, simple_loss=0.1225, pruned_loss=0.02765, audio_tagging_loss=0.007552, over 15481.00 frames. ], tot_loss[loss=0.08441, simple_loss=0.1034, pruned_loss=0.02233, audio_tagging_loss=0.01036, over 3061868.06 frames. ], batch size: 56, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 13:39:10,630 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=761066.6666666666, ans=0.2 2023-11-19 13:39:10,695 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=761066.6666666666, ans=0.2 2023-11-19 13:39:14,399 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=761066.6666666666, ans=0.125 2023-11-19 13:39:22,888 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=761133.3333333334, ans=0.125 2023-11-19 13:39:29,246 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=761133.3333333334, ans=0.2 2023-11-19 13:39:46,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=761266.6666666666, ans=0.0 2023-11-19 13:40:06,249 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6000, loss[loss=0.08415, simple_loss=0.1049, pruned_loss=0.02205, audio_tagging_loss=0.009665, over 15737.00 frames. ], tot_loss[loss=0.08458, simple_loss=0.1039, pruned_loss=0.02239, audio_tagging_loss=0.01024, over 3065177.70 frames. ], batch size: 60, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:40:06,250 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 13:40:38,520 INFO [train_asr.py:1147] (1/4) Epoch 10, validation: loss=0.06367, simple_loss=0.05534, pruned_loss=0.00639, audio_tagging_loss=0.02961, over 4681554.00 frames. 
2023-11-19 13:40:38,520 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 13:40:46,263 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=761400.0, ans=0.0 2023-11-19 13:40:55,647 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.484e+01 8.242e+01 8.869e+01 9.811e+01 1.293e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-19 13:41:17,846 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:41:20,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=761600.0, ans=0.0 2023-11-19 13:41:21,556 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.98 vs. limit=15.0 2023-11-19 13:41:27,583 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=761666.6666666666, ans=0.0 2023-11-19 13:41:34,258 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6050, loss[loss=0.0883, simple_loss=0.106, pruned_loss=0.02679, audio_tagging_loss=0.008495, over 15245.00 frames. ], tot_loss[loss=0.08459, simple_loss=0.1039, pruned_loss=0.02241, audio_tagging_loss=0.01025, over 3064444.74 frames. ], batch size: 56, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:41:39,285 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=761733.3333333334, ans=0.2 2023-11-19 13:41:39,328 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=761733.3333333334, ans=0.0 2023-11-19 13:41:56,818 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=761866.6666666666, ans=0.1 2023-11-19 13:42:13,169 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=761933.3333333334, ans=0.125 2023-11-19 13:42:28,014 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762000.0, ans=0.1 2023-11-19 13:42:30,042 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6100, loss[loss=0.06647, simple_loss=0.06427, pruned_loss=0.01902, audio_tagging_loss=0.01531, over 14185.00 frames. ], tot_loss[loss=0.08489, simple_loss=0.1041, pruned_loss=0.02253, audio_tagging_loss=0.01029, over 3057134.77 frames. 
], batch size: 55, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:42:46,878 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.906e+01 8.561e+01 9.384e+01 1.050e+02 1.586e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-19 13:42:49,127 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=762133.3333333334, ans=0.125 2023-11-19 13:42:57,138 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=762200.0, ans=0.0 2023-11-19 13:43:17,839 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=15.0 2023-11-19 13:43:20,453 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=762333.3333333334, ans=0.035 2023-11-19 13:43:25,153 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762400.0, ans=0.1 2023-11-19 13:43:25,976 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6150, loss[loss=0.09339, simple_loss=0.1149, pruned_loss=0.0264, audio_tagging_loss=0.009523, over 15404.00 frames. ], tot_loss[loss=0.08578, simple_loss=0.105, pruned_loss=0.02292, audio_tagging_loss=0.01034, over 3050178.65 frames. ], batch size: 58, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:43:31,446 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762400.0, ans=0.1 2023-11-19 13:43:37,829 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=762466.6666666666, ans=0.1 2023-11-19 13:43:38,127 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2023-11-19 13:43:39,442 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=762466.6666666666, ans=0.07 2023-11-19 13:43:40,354 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=762466.6666666666, ans=0.125 2023-11-19 13:43:54,430 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762533.3333333334, ans=0.1 2023-11-19 13:44:00,344 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=762600.0, ans=0.125 2023-11-19 13:44:17,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=762666.6666666666, ans=0.0 2023-11-19 13:44:20,060 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=762733.3333333334, ans=0.125 2023-11-19 13:44:21,445 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6200, loss[loss=0.08868, simple_loss=0.1031, pruned_loss=0.02396, audio_tagging_loss=0.01319, over 16050.00 frames. ], tot_loss[loss=0.0859, simple_loss=0.1052, pruned_loss=0.02288, audio_tagging_loss=0.01041, over 3046215.14 frames. 
], batch size: 62, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:44:31,117 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=762800.0, ans=0.0 2023-11-19 13:44:36,369 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2023-11-19 13:44:36,971 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=762800.0, ans=0.05 2023-11-19 13:44:37,805 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.950e+01 8.474e+01 9.020e+01 9.808e+01 1.345e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 13:44:50,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=762866.6666666666, ans=0.09899494936611666 2023-11-19 13:45:02,784 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0 2023-11-19 13:45:10,547 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=22.5 2023-11-19 13:45:11,445 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=763000.0, ans=0.125 2023-11-19 13:45:17,018 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6250, loss[loss=0.08491, simple_loss=0.1041, pruned_loss=0.02231, audio_tagging_loss=0.01053, over 14427.00 frames. ], tot_loss[loss=0.08633, simple_loss=0.1058, pruned_loss=0.02301, audio_tagging_loss=0.01041, over 3054027.43 frames. ], batch size: 56, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:45:17,443 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.61 vs. limit=15.0 2023-11-19 13:45:19,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=763066.6666666666, ans=0.125 2023-11-19 13:45:37,086 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=763133.3333333334, ans=0.2 2023-11-19 13:45:52,679 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=763266.6666666666, ans=0.125 2023-11-19 13:46:12,449 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6300, loss[loss=0.08565, simple_loss=0.1035, pruned_loss=0.02131, audio_tagging_loss=0.01258, over 14449.00 frames. ], tot_loss[loss=0.08678, simple_loss=0.1061, pruned_loss=0.02321, audio_tagging_loss=0.01055, over 3055418.99 frames. ], batch size: 54, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:46:16,796 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.88 vs. 
limit=15.0 2023-11-19 13:46:19,650 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=763400.0, ans=0.1 2023-11-19 13:46:29,494 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.302e+01 8.988e+01 1.019e+02 1.261e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 13:47:02,208 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=763666.6666666666, ans=0.125 2023-11-19 13:47:06,509 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:47:08,310 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6350, loss[loss=0.09424, simple_loss=0.11, pruned_loss=0.02909, audio_tagging_loss=0.01013, over 14898.00 frames. ], tot_loss[loss=0.08662, simple_loss=0.1061, pruned_loss=0.02298, audio_tagging_loss=0.0106, over 3052064.09 frames. ], batch size: 56, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 13:47:12,515 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2023-11-19 13:47:32,593 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.79 vs. limit=15.0 2023-11-19 13:47:46,746 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=763933.3333333334, ans=0.0 2023-11-19 13:47:52,309 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.68 vs. limit=15.0 2023-11-19 13:47:56,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=764000.0, ans=0.125 2023-11-19 13:47:57,933 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=764000.0, ans=0.1 2023-11-19 13:48:03,970 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6400, loss[loss=0.09103, simple_loss=0.1083, pruned_loss=0.02802, audio_tagging_loss=0.008859, over 14869.00 frames. ], tot_loss[loss=0.08559, simple_loss=0.1045, pruned_loss=0.02262, audio_tagging_loss=0.01071, over 3048603.18 frames. ], batch size: 56, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 13:48:15,727 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=764133.3333333334, ans=0.0 2023-11-19 13:48:17,891 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=764133.3333333334, ans=0.125 2023-11-19 13:48:22,448 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 7.945e+01 8.669e+01 9.496e+01 1.172e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-19 13:48:40,023 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=764266.6666666666, ans=0.125 2023-11-19 13:48:44,788 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.46 vs. 
limit=15.0 2023-11-19 13:48:51,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=764333.3333333334, ans=0.125 2023-11-19 13:48:54,019 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=15.0 2023-11-19 13:48:59,539 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6450, loss[loss=0.07773, simple_loss=0.1009, pruned_loss=0.01727, audio_tagging_loss=0.009991, over 14559.00 frames. ], tot_loss[loss=0.08631, simple_loss=0.1053, pruned_loss=0.02294, audio_tagging_loss=0.01071, over 3049209.40 frames. ], batch size: 55, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 13:49:05,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=764400.0, ans=0.2 2023-11-19 13:49:30,214 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=764533.3333333334, ans=0.2 2023-11-19 13:49:40,782 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=764600.0, ans=0.125 2023-11-19 13:49:54,754 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6500, loss[loss=0.08154, simple_loss=0.1022, pruned_loss=0.02174, audio_tagging_loss=0.008699, over 13758.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.1051, pruned_loss=0.02264, audio_tagging_loss=0.01065, over 3045208.09 frames. ], batch size: 53, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 13:49:56,231 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.33 vs. limit=15.0 2023-11-19 13:50:06,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=764800.0, ans=0.2 2023-11-19 13:50:13,156 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.395e+01 9.151e+01 9.992e+01 1.336e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 13:50:50,144 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6550, loss[loss=0.1018, simple_loss=0.1325, pruned_loss=0.02866, audio_tagging_loss=0.006867, over 16992.00 frames. ], tot_loss[loss=0.08619, simple_loss=0.1059, pruned_loss=0.02279, audio_tagging_loss=0.01047, over 3048719.96 frames. ], batch size: 61, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 13:51:41,504 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=765333.3333333334, ans=0.125 2023-11-19 13:51:45,104 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6600, loss[loss=0.06848, simple_loss=0.07823, pruned_loss=0.01634, audio_tagging_loss=0.01303, over 16600.00 frames. ], tot_loss[loss=0.0858, simple_loss=0.1051, pruned_loss=0.02282, audio_tagging_loss=0.01042, over 3049469.49 frames. 
], batch size: 62, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 13:51:50,085 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=765400.0, ans=0.0 2023-11-19 13:51:54,257 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=765400.0, ans=0.125 2023-11-19 13:52:02,561 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=765466.6666666666, ans=0.125 2023-11-19 13:52:03,381 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 7.948e+01 8.762e+01 9.685e+01 1.504e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-19 13:52:18,965 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=765600.0, ans=0.125 2023-11-19 13:52:26,110 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2023-11-19 13:52:35,357 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=765666.6666666666, ans=0.125 2023-11-19 13:52:40,469 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6650, loss[loss=0.09876, simple_loss=0.1182, pruned_loss=0.03114, audio_tagging_loss=0.008537, over 16291.00 frames. ], tot_loss[loss=0.08566, simple_loss=0.1046, pruned_loss=0.02296, audio_tagging_loss=0.01039, over 3045021.01 frames. ], batch size: 61, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 13:52:46,157 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2023-11-19 13:52:47,560 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=765733.3333333334, ans=0.1 2023-11-19 13:53:09,087 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765866.6666666666, ans=0.1 2023-11-19 13:53:11,289 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=765866.6666666666, ans=0.2 2023-11-19 13:53:29,813 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=766000.0, ans=0.125 2023-11-19 13:53:33,028 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=766000.0, ans=0.1 2023-11-19 13:53:35,928 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6700, loss[loss=0.07902, simple_loss=0.09753, pruned_loss=0.02043, audio_tagging_loss=0.009825, over 13939.00 frames. ], tot_loss[loss=0.08513, simple_loss=0.1041, pruned_loss=0.02276, audio_tagging_loss=0.01032, over 3043363.63 frames. 
], batch size: 54, lr: 6.91e-03, grad_scale: 16.0 2023-11-19 13:53:41,530 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=766066.6666666666, ans=0.125 2023-11-19 13:53:51,427 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=766133.3333333334, ans=0.125 2023-11-19 13:53:55,023 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.241e+01 9.083e+01 1.005e+02 1.420e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 13:54:11,043 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.51 vs. limit=10.0 2023-11-19 13:54:13,259 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=766266.6666666666, ans=0.0 2023-11-19 13:54:31,096 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6750, loss[loss=0.05506, simple_loss=0.06036, pruned_loss=0.012, audio_tagging_loss=0.01289, over 14421.00 frames. ], tot_loss[loss=0.08462, simple_loss=0.1035, pruned_loss=0.02257, audio_tagging_loss=0.01031, over 3039147.33 frames. ], batch size: 59, lr: 6.91e-03, grad_scale: 16.0 2023-11-19 13:54:44,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=766466.6666666666, ans=0.125 2023-11-19 13:55:20,168 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=766666.6666666666, ans=0.0 2023-11-19 13:55:28,010 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6800, loss[loss=0.08662, simple_loss=0.1023, pruned_loss=0.02294, audio_tagging_loss=0.01252, over 16809.00 frames. ], tot_loss[loss=0.08616, simple_loss=0.1057, pruned_loss=0.02319, audio_tagging_loss=0.01014, over 3041082.25 frames. ], batch size: 64, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 13:55:28,193 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=766733.3333333334, ans=0.125 2023-11-19 13:55:43,709 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=22.5 2023-11-19 13:55:46,472 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.286e+01 8.985e+01 9.839e+01 1.456e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-19 13:56:03,736 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=766933.3333333334, ans=0.0 2023-11-19 13:56:11,730 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=767000.0, ans=0.0 2023-11-19 13:56:23,561 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6850, loss[loss=0.06476, simple_loss=0.07814, pruned_loss=0.01561, audio_tagging_loss=0.01008, over 14494.00 frames. ], tot_loss[loss=0.08552, simple_loss=0.1048, pruned_loss=0.02291, audio_tagging_loss=0.01019, over 3036683.32 frames. 
], batch size: 55, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 13:56:26,991 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=767066.6666666666, ans=0.125 2023-11-19 13:56:43,579 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.74 vs. limit=10.0 2023-11-19 13:56:47,094 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=767200.0, ans=0.125 2023-11-19 13:56:56,092 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=767266.6666666666, ans=0.125 2023-11-19 13:56:58,303 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=767266.6666666666, ans=0.125 2023-11-19 13:57:18,754 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6900, loss[loss=0.09534, simple_loss=0.1117, pruned_loss=0.02695, audio_tagging_loss=0.01255, over 14539.00 frames. ], tot_loss[loss=0.08504, simple_loss=0.1042, pruned_loss=0.02269, audio_tagging_loss=0.01025, over 3044953.70 frames. ], batch size: 54, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 13:57:35,002 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2023-11-19 13:57:37,634 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.298e+01 8.115e+01 8.697e+01 9.342e+01 1.240e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-19 13:57:40,536 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=767533.3333333334, ans=0.2 2023-11-19 13:58:00,771 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:58:05,709 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=767666.6666666666, ans=0.05 2023-11-19 13:58:14,592 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 6950, loss[loss=0.0722, simple_loss=0.08739, pruned_loss=0.01802, audio_tagging_loss=0.01048, over 15492.00 frames. ], tot_loss[loss=0.08567, simple_loss=0.1053, pruned_loss=0.02288, audio_tagging_loss=0.01015, over 3052305.39 frames. ], batch size: 59, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 13:58:31,102 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.77 vs. limit=22.5 2023-11-19 13:58:42,568 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. 
limit=15.0 2023-11-19 13:58:44,372 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=767866.6666666666, ans=0.125 2023-11-19 13:58:49,353 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5 2023-11-19 13:59:07,333 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=768000.0, ans=0.125 2023-11-19 13:59:10,789 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7000, loss[loss=0.08151, simple_loss=0.1023, pruned_loss=0.02034, audio_tagging_loss=0.01001, over 15044.00 frames. ], tot_loss[loss=0.08558, simple_loss=0.1052, pruned_loss=0.02279, audio_tagging_loss=0.01021, over 3045874.23 frames. ], batch size: 56, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 13:59:22,416 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=768133.3333333334, ans=0.125 2023-11-19 13:59:29,027 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.706e+01 8.254e+01 9.022e+01 1.015e+02 1.308e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 13:59:43,668 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=768266.6666666666, ans=0.125 2023-11-19 14:00:04,865 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7050, loss[loss=0.08541, simple_loss=0.104, pruned_loss=0.02263, audio_tagging_loss=0.01076, over 16064.00 frames. ], tot_loss[loss=0.08532, simple_loss=0.1049, pruned_loss=0.02259, audio_tagging_loss=0.01031, over 3046785.02 frames. ], batch size: 60, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:00:29,517 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2023-11-19 14:00:31,869 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=768533.3333333334, ans=0.05 2023-11-19 14:00:50,510 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=768666.6666666666, ans=0.05 2023-11-19 14:01:00,281 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7100, loss[loss=0.08595, simple_loss=0.1131, pruned_loss=0.0189, audio_tagging_loss=0.01052, over 14297.00 frames. ], tot_loss[loss=0.08551, simple_loss=0.1052, pruned_loss=0.02268, audio_tagging_loss=0.01024, over 3050402.58 frames. 
], batch size: 56, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:01:06,740 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=768733.3333333334, ans=0.1 2023-11-19 14:01:19,245 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.436e+01 9.096e+01 9.960e+01 1.200e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-19 14:01:21,544 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=768866.6666666666, ans=0.0 2023-11-19 14:01:27,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=768866.6666666666, ans=0.125 2023-11-19 14:01:41,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=768933.3333333334, ans=0.125 2023-11-19 14:01:56,307 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7150, loss[loss=0.08957, simple_loss=0.112, pruned_loss=0.02023, audio_tagging_loss=0.01333, over 15900.00 frames. ], tot_loss[loss=0.08612, simple_loss=0.1058, pruned_loss=0.02297, audio_tagging_loss=0.01024, over 3050265.67 frames. ], batch size: 59, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:02:22,401 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=769200.0, ans=0.2 2023-11-19 14:02:25,808 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.55 vs. limit=10.0 2023-11-19 14:02:28,704 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:02:31,216 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.12 vs. limit=15.0 2023-11-19 14:02:44,278 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.84 vs. limit=22.5 2023-11-19 14:02:52,146 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7200, loss[loss=0.07125, simple_loss=0.08413, pruned_loss=0.01787, audio_tagging_loss=0.01131, over 15374.00 frames. ], tot_loss[loss=0.08593, simple_loss=0.1055, pruned_loss=0.02279, audio_tagging_loss=0.0104, over 3047777.71 frames. 
], batch size: 57, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:02:54,428 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=769400.0, ans=0.1 2023-11-19 14:02:58,255 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=769400.0, ans=0.125 2023-11-19 14:03:00,369 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=769400.0, ans=0.125 2023-11-19 14:03:11,329 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.379e+01 9.176e+01 1.015e+02 1.604e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-19 14:03:21,708 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:03:43,915 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=769666.6666666666, ans=0.125 2023-11-19 14:03:46,913 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.79 vs. limit=15.0 2023-11-19 14:03:48,454 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7250, loss[loss=0.08573, simple_loss=0.1059, pruned_loss=0.02015, audio_tagging_loss=0.01265, over 14896.00 frames. ], tot_loss[loss=0.08602, simple_loss=0.1053, pruned_loss=0.0229, audio_tagging_loss=0.01047, over 3046380.82 frames. ], batch size: 55, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:04:05,258 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2023-11-19 14:04:19,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=769866.6666666666, ans=0.0 2023-11-19 14:04:19,460 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=769866.6666666666, ans=0.125 2023-11-19 14:04:41,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=770000.0, ans=0.1 2023-11-19 14:04:43,484 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7300, loss[loss=0.08727, simple_loss=0.1044, pruned_loss=0.02532, audio_tagging_loss=0.009741, over 14572.00 frames. ], tot_loss[loss=0.08529, simple_loss=0.1046, pruned_loss=0.02261, audio_tagging_loss=0.01036, over 3040834.57 frames. ], batch size: 56, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:04:49,425 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=770066.6666666666, ans=0.0 2023-11-19 14:04:59,570 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=770133.3333333334, ans=0.125 2023-11-19 14:04:59,574 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=770133.3333333334, ans=0.125 2023-11-19 14:05:02,641 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.433e+01 8.339e+01 9.048e+01 1.014e+02 1.411e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 14:05:11,771 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.03 vs. 
limit=12.0 2023-11-19 14:05:13,627 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=770200.0, ans=0.0 2023-11-19 14:05:22,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=770266.6666666666, ans=0.07 2023-11-19 14:05:33,152 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=770333.3333333334, ans=0.125 2023-11-19 14:05:35,241 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=770333.3333333334, ans=0.5 2023-11-19 14:05:36,885 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.92 vs. limit=15.0 2023-11-19 14:05:39,783 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7350, loss[loss=0.05544, simple_loss=0.0536, pruned_loss=0.01022, audio_tagging_loss=0.01842, over 16105.00 frames. ], tot_loss[loss=0.08457, simple_loss=0.1037, pruned_loss=0.02246, audio_tagging_loss=0.01026, over 3046875.42 frames. ], batch size: 64, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:05:58,114 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=770466.6666666666, ans=0.125 2023-11-19 14:06:15,641 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=770600.0, ans=0.125 2023-11-19 14:06:36,273 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7400, loss[loss=0.08163, simple_loss=0.1042, pruned_loss=0.02104, audio_tagging_loss=0.008484, over 15565.00 frames. ], tot_loss[loss=0.08471, simple_loss=0.1042, pruned_loss=0.02247, audio_tagging_loss=0.01016, over 3042302.64 frames. ], batch size: 59, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:06:48,035 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=770800.0, ans=0.0 2023-11-19 14:06:54,605 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.661e+01 9.537e+01 1.042e+02 1.641e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-19 14:06:59,450 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=12.0 2023-11-19 14:07:03,402 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=770866.6666666666, ans=0.125 2023-11-19 14:07:11,013 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.91 vs. limit=22.5 2023-11-19 14:07:31,159 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7450, loss[loss=0.08413, simple_loss=0.1131, pruned_loss=0.01866, audio_tagging_loss=0.008923, over 15689.00 frames. ], tot_loss[loss=0.08495, simple_loss=0.1045, pruned_loss=0.02259, audio_tagging_loss=0.01012, over 3039433.34 frames. 
], batch size: 57, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:07:41,556 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=771133.3333333334, ans=0.125 2023-11-19 14:07:42,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=771133.3333333334, ans=0.0 2023-11-19 14:07:46,762 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=771133.3333333334, ans=0.0 2023-11-19 14:07:59,930 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=771200.0, ans=0.0 2023-11-19 14:08:12,602 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=771266.6666666666, ans=0.125 2023-11-19 14:08:19,522 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=771333.3333333334, ans=0.0 2023-11-19 14:08:26,734 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7500, loss[loss=0.08522, simple_loss=0.1145, pruned_loss=0.02116, audio_tagging_loss=0.006817, over 14689.00 frames. ], tot_loss[loss=0.0847, simple_loss=0.1043, pruned_loss=0.0224, audio_tagging_loss=0.01017, over 3045366.07 frames. ], batch size: 54, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:08:29,105 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=771400.0, ans=0.125 2023-11-19 14:08:30,081 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:08:46,250 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.223e+01 8.899e+01 9.673e+01 1.181e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-19 14:08:55,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=771533.3333333334, ans=0.0 2023-11-19 14:08:57,029 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=771533.3333333334, ans=0.0 2023-11-19 14:09:12,281 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=771666.6666666666, ans=0.125 2023-11-19 14:09:13,872 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-19 14:09:23,261 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7550, loss[loss=0.09331, simple_loss=0.114, pruned_loss=0.02436, audio_tagging_loss=0.01195, over 15132.00 frames. ], tot_loss[loss=0.08502, simple_loss=0.1047, pruned_loss=0.02247, audio_tagging_loss=0.01019, over 3047298.43 frames. ], batch size: 55, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:09:30,191 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.37 vs. 
limit=15.0 2023-11-19 14:09:30,932 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=771733.3333333334, ans=0.09899494936611666 2023-11-19 14:09:32,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=771800.0, ans=0.125 2023-11-19 14:09:36,250 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=771800.0, ans=0.125 2023-11-19 14:10:13,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=772000.0, ans=0.125 2023-11-19 14:10:14,853 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=772000.0, ans=0.1 2023-11-19 14:10:17,843 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7600, loss[loss=0.07732, simple_loss=0.08875, pruned_loss=0.01937, audio_tagging_loss=0.01358, over 15004.00 frames. ], tot_loss[loss=0.0846, simple_loss=0.1042, pruned_loss=0.0223, audio_tagging_loss=0.01022, over 3045001.90 frames. ], batch size: 56, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:10:26,188 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0 2023-11-19 14:10:36,186 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.387e+01 9.154e+01 1.016e+02 1.447e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 14:10:37,420 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=772133.3333333334, ans=0.125 2023-11-19 14:11:03,540 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=772333.3333333334, ans=0.1 2023-11-19 14:11:13,460 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7650, loss[loss=0.059, simple_loss=0.0812, pruned_loss=0.01089, audio_tagging_loss=0.007508, over 15862.00 frames. ], tot_loss[loss=0.08432, simple_loss=0.1037, pruned_loss=0.02221, audio_tagging_loss=0.01024, over 3047894.19 frames. ], batch size: 63, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 14:11:18,246 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.01 vs. limit=15.0 2023-11-19 14:11:30,577 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-19 14:11:41,033 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-19 14:12:02,321 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=772666.6666666666, ans=0.0 2023-11-19 14:12:04,380 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=772666.6666666666, ans=0.0 2023-11-19 14:12:08,892 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7700, loss[loss=0.09372, simple_loss=0.113, pruned_loss=0.02511, audio_tagging_loss=0.01209, over 15603.00 frames. ], tot_loss[loss=0.08498, simple_loss=0.1042, pruned_loss=0.02255, audio_tagging_loss=0.0103, over 3049713.21 frames. 
], batch size: 61, lr: 6.88e-03, grad_scale: 16.0 2023-11-19 14:12:24,842 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=772800.0, ans=0.0 2023-11-19 14:12:27,993 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=772800.0, ans=0.0 2023-11-19 14:12:29,297 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.117e+01 8.040e+01 8.605e+01 9.398e+01 1.279e+02, threshold=1.721e+02, percent-clipped=0.0 2023-11-19 14:12:56,067 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=773000.0, ans=0.0 2023-11-19 14:13:03,478 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=773066.6666666666, ans=0.0 2023-11-19 14:13:04,316 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7750, loss[loss=0.09867, simple_loss=0.1257, pruned_loss=0.02498, audio_tagging_loss=0.01082, over 14909.00 frames. ], tot_loss[loss=0.08553, simple_loss=0.1051, pruned_loss=0.02271, audio_tagging_loss=0.01029, over 3041225.22 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 16.0 2023-11-19 14:13:06,662 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=773066.6666666666, ans=0.1 2023-11-19 14:13:15,991 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.61 vs. limit=10.0 2023-11-19 14:13:18,936 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=773133.3333333334, ans=0.125 2023-11-19 14:13:39,797 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2023-11-19 14:13:52,159 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=773333.3333333334, ans=0.0 2023-11-19 14:14:01,957 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7800, loss[loss=0.0698, simple_loss=0.07973, pruned_loss=0.01665, audio_tagging_loss=0.01329, over 14350.00 frames. ], tot_loss[loss=0.08586, simple_loss=0.1053, pruned_loss=0.02273, audio_tagging_loss=0.01046, over 3043765.30 frames. ], batch size: 55, lr: 6.88e-03, grad_scale: 16.0 2023-11-19 14:14:05,439 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=773400.0, ans=0.125 2023-11-19 14:14:10,664 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=773400.0, ans=0.0 2023-11-19 14:14:23,107 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.774e+01 8.739e+01 9.425e+01 1.048e+02 2.167e+02, threshold=1.885e+02, percent-clipped=1.0 2023-11-19 14:14:23,362 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773533.3333333334, ans=0.1 2023-11-19 14:14:40,090 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.16 vs. limit=15.0 2023-11-19 14:14:57,131 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7850, loss[loss=0.07856, simple_loss=0.104, pruned_loss=0.01785, audio_tagging_loss=0.00873, over 14377.00 frames. 
], tot_loss[loss=0.08527, simple_loss=0.1044, pruned_loss=0.0226, audio_tagging_loss=0.01045, over 3036035.98 frames. ], batch size: 55, lr: 6.88e-03, grad_scale: 8.0 2023-11-19 14:15:00,466 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=773733.3333333334, ans=0.125 2023-11-19 14:15:02,024 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=773733.3333333334, ans=0.0 2023-11-19 14:15:06,822 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=773733.3333333334, ans=0.0 2023-11-19 14:15:30,557 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=773933.3333333334, ans=0.2 2023-11-19 14:15:53,098 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7900, loss[loss=0.07671, simple_loss=0.09214, pruned_loss=0.0158, audio_tagging_loss=0.01484, over 16260.00 frames. ], tot_loss[loss=0.08555, simple_loss=0.1046, pruned_loss=0.02277, audio_tagging_loss=0.01049, over 3038983.19 frames. ], batch size: 62, lr: 6.88e-03, grad_scale: 8.0 2023-11-19 14:16:00,725 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=774066.6666666666, ans=0.0 2023-11-19 14:16:02,236 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=774066.6666666666, ans=0.0 2023-11-19 14:16:13,450 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.211e+01 8.938e+01 9.745e+01 1.285e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-19 14:16:26,234 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=774266.6666666666, ans=0.125 2023-11-19 14:16:28,382 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=774266.6666666666, ans=0.125 2023-11-19 14:16:37,798 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2023-11-19 14:16:48,192 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 7950, loss[loss=0.07785, simple_loss=0.09535, pruned_loss=0.01898, audio_tagging_loss=0.0112, over 16783.00 frames. ], tot_loss[loss=0.08549, simple_loss=0.1044, pruned_loss=0.02271, audio_tagging_loss=0.0106, over 3043608.49 frames. ], batch size: 63, lr: 6.88e-03, grad_scale: 8.0 2023-11-19 14:16:49,590 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=774400.0, ans=0.125 2023-11-19 14:16:51,616 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774400.0, ans=0.1 2023-11-19 14:16:52,764 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=774400.0, ans=0.125 2023-11-19 14:16:59,890 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 14:17:01,140 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=774466.6666666666, ans=0.0 2023-11-19 14:17:04,906 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=774466.6666666666, ans=0.125 2023-11-19 14:17:05,967 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=774466.6666666666, ans=0.1 2023-11-19 14:17:07,064 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=774466.6666666666, ans=0.1 2023-11-19 14:17:09,644 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=774533.3333333334, ans=0.125 2023-11-19 14:17:16,565 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=774533.3333333334, ans=0.0 2023-11-19 14:17:33,356 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=774666.6666666666, ans=0.0 2023-11-19 14:17:43,696 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8000, loss[loss=0.09606, simple_loss=0.1209, pruned_loss=0.02631, audio_tagging_loss=0.009286, over 15330.00 frames. ], tot_loss[loss=0.08468, simple_loss=0.1031, pruned_loss=0.02239, audio_tagging_loss=0.01072, over 3048459.91 frames. ], batch size: 58, lr: 6.88e-03, grad_scale: 16.0 2023-11-19 14:17:46,652 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=774733.3333333334, ans=0.0 2023-11-19 14:17:53,450 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=774733.3333333334, ans=0.1 2023-11-19 14:17:56,965 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.36 vs. limit=10.0 2023-11-19 14:18:05,287 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.487e+01 8.348e+01 9.114e+01 9.901e+01 1.524e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-19 14:18:15,603 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=774866.6666666666, ans=0.125 2023-11-19 14:18:33,779 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=775000.0, ans=0.04949747468305833 2023-11-19 14:18:39,864 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8050, loss[loss=0.07258, simple_loss=0.07857, pruned_loss=0.01975, audio_tagging_loss=0.01355, over 14738.00 frames. ], tot_loss[loss=0.08493, simple_loss=0.1033, pruned_loss=0.02244, audio_tagging_loss=0.01085, over 3041527.68 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 16.0 2023-11-19 14:18:40,034 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=775066.6666666666, ans=0.125 2023-11-19 14:19:16,425 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.44 vs. 
limit=15.0 2023-11-19 14:19:23,409 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=775333.3333333334, ans=0.2 2023-11-19 14:19:25,185 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775333.3333333334, ans=0.1 2023-11-19 14:19:36,033 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8100, loss[loss=0.08264, simple_loss=0.1057, pruned_loss=0.02104, audio_tagging_loss=0.008744, over 14910.00 frames. ], tot_loss[loss=0.08444, simple_loss=0.1028, pruned_loss=0.02229, audio_tagging_loss=0.01075, over 3047415.36 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 16.0 2023-11-19 14:19:40,306 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=775400.0, ans=0.125 2023-11-19 14:19:44,580 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=775400.0, ans=0.125 2023-11-19 14:19:51,919 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=775466.6666666666, ans=0.125 2023-11-19 14:19:56,602 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.298e+01 8.945e+01 9.680e+01 1.266e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-19 14:20:27,150 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=775666.6666666666, ans=0.125 2023-11-19 14:20:31,235 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8150, loss[loss=0.111, simple_loss=0.1378, pruned_loss=0.03431, audio_tagging_loss=0.007774, over 14954.00 frames. ], tot_loss[loss=0.08372, simple_loss=0.1019, pruned_loss=0.02217, audio_tagging_loss=0.01061, over 3039886.24 frames. ], batch size: 54, lr: 6.87e-03, grad_scale: 8.0 2023-11-19 14:20:41,071 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=775733.3333333334, ans=0.07 2023-11-19 14:20:46,256 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=775800.0, ans=0.125 2023-11-19 14:20:59,179 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=775866.6666666666, ans=0.125 2023-11-19 14:21:04,456 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=775933.3333333334, ans=0.125 2023-11-19 14:21:18,925 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=776000.0, ans=0.0 2023-11-19 14:21:25,810 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=776000.0, ans=0.125 2023-11-19 14:21:26,667 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 14:21:27,677 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8200, loss[loss=0.1079, simple_loss=0.1139, pruned_loss=0.03972, audio_tagging_loss=0.01128, over 14988.00 frames. ], tot_loss[loss=0.08386, simple_loss=0.1022, pruned_loss=0.02225, audio_tagging_loss=0.01053, over 3041399.36 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 8.0 2023-11-19 14:21:49,753 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.524e+01 9.249e+01 1.061e+02 1.477e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-19 14:21:55,163 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=776200.0, ans=0.125 2023-11-19 14:22:18,412 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=776333.3333333334, ans=0.0 2023-11-19 14:22:23,481 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8250, loss[loss=0.08127, simple_loss=0.101, pruned_loss=0.02185, audio_tagging_loss=0.008927, over 14209.00 frames. ], tot_loss[loss=0.0842, simple_loss=0.103, pruned_loss=0.0224, audio_tagging_loss=0.01031, over 3039535.77 frames. ], batch size: 54, lr: 6.87e-03, grad_scale: 8.0 2023-11-19 14:22:33,147 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=776466.6666666666, ans=0.0 2023-11-19 14:22:38,813 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=15.0 2023-11-19 14:22:49,145 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=776533.3333333334, ans=0.1 2023-11-19 14:22:51,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=776533.3333333334, ans=0.125 2023-11-19 14:22:53,529 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.15 vs. limit=22.5 2023-11-19 14:23:08,013 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=776666.6666666666, ans=0.0 2023-11-19 14:23:12,177 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=776666.6666666666, ans=0.125 2023-11-19 14:23:18,348 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8300, loss[loss=0.06803, simple_loss=0.08145, pruned_loss=0.01623, audio_tagging_loss=0.01107, over 15440.00 frames. ], tot_loss[loss=0.08348, simple_loss=0.1021, pruned_loss=0.02215, audio_tagging_loss=0.01029, over 3043896.86 frames. ], batch size: 59, lr: 6.87e-03, grad_scale: 8.0 2023-11-19 14:23:30,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.10 vs. limit=22.5 2023-11-19 14:23:40,362 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.122e+01 8.080e+01 8.886e+01 9.811e+01 1.458e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-19 14:24:03,938 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777000.0, ans=0.1 2023-11-19 14:24:08,160 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. 
limit=6.0 2023-11-19 14:24:10,134 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.30 vs. limit=22.5 2023-11-19 14:24:12,457 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=777000.0, ans=0.1 2023-11-19 14:24:14,327 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8350, loss[loss=0.07708, simple_loss=0.09118, pruned_loss=0.02018, audio_tagging_loss=0.01131, over 15774.00 frames. ], tot_loss[loss=0.08368, simple_loss=0.1026, pruned_loss=0.02215, audio_tagging_loss=0.01026, over 3052679.19 frames. ], batch size: 60, lr: 6.86e-03, grad_scale: 8.0 2023-11-19 14:24:14,512 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=777066.6666666666, ans=0.95 2023-11-19 14:24:31,948 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.82 vs. limit=22.5 2023-11-19 14:24:34,842 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.97 vs. limit=10.0 2023-11-19 14:24:47,722 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=777266.6666666666, ans=0.125 2023-11-19 14:24:51,261 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=777266.6666666666, ans=15.0 2023-11-19 14:25:09,774 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8400, loss[loss=0.1003, simple_loss=0.1169, pruned_loss=0.03154, audio_tagging_loss=0.01026, over 14922.00 frames. ], tot_loss[loss=0.08382, simple_loss=0.1024, pruned_loss=0.02224, audio_tagging_loss=0.01036, over 3049733.85 frames. ], batch size: 58, lr: 6.86e-03, grad_scale: 16.0 2023-11-19 14:25:30,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=777466.6666666666, ans=0.2 2023-11-19 14:25:32,116 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.696e+01 8.445e+01 9.359e+01 1.034e+02 1.708e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 14:25:53,538 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=777666.6666666666, ans=0.0 2023-11-19 14:25:55,038 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2023-11-19 14:26:01,377 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=777666.6666666666, ans=0.0 2023-11-19 14:26:05,400 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8450, loss[loss=0.1016, simple_loss=0.1268, pruned_loss=0.02531, audio_tagging_loss=0.01294, over 14577.00 frames. ], tot_loss[loss=0.08445, simple_loss=0.1031, pruned_loss=0.02254, audio_tagging_loss=0.01034, over 3038392.90 frames. 
], batch size: 54, lr: 6.86e-03, grad_scale: 16.0 2023-11-19 14:26:26,831 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=777866.6666666666, ans=0.0 2023-11-19 14:26:27,942 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=777866.6666666666, ans=0.0 2023-11-19 14:26:30,973 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=777866.6666666666, ans=0.125 2023-11-19 14:26:50,524 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.48 vs. limit=15.0 2023-11-19 14:26:53,266 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=778000.0, ans=0.025 2023-11-19 14:26:54,251 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=778000.0, ans=0.125 2023-11-19 14:26:57,950 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=778000.0, ans=0.05 2023-11-19 14:26:58,883 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=778000.0, ans=0.125 2023-11-19 14:27:01,299 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8500, loss[loss=0.1012, simple_loss=0.125, pruned_loss=0.02952, audio_tagging_loss=0.009164, over 14741.00 frames. ], tot_loss[loss=0.08478, simple_loss=0.1036, pruned_loss=0.0226, audio_tagging_loss=0.01036, over 3034754.71 frames. ], batch size: 56, lr: 6.86e-03, grad_scale: 16.0 2023-11-19 14:27:04,850 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0 2023-11-19 14:27:13,951 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=778133.3333333334, ans=0.125 2023-11-19 14:27:24,285 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.947e+01 8.482e+01 9.526e+01 1.059e+02 1.313e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-19 14:27:25,655 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=778200.0, ans=0.0 2023-11-19 14:27:29,174 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.17 vs. limit=15.0 2023-11-19 14:27:38,820 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=778266.6666666666, ans=0.125 2023-11-19 14:27:44,553 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=778333.3333333334, ans=0.125 2023-11-19 14:27:45,838 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.25 vs. 
limit=10.0 2023-11-19 14:27:47,738 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=778333.3333333334, ans=0.2 2023-11-19 14:27:56,066 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8550, loss[loss=0.0673, simple_loss=0.07501, pruned_loss=0.01687, audio_tagging_loss=0.01292, over 15592.00 frames. ], tot_loss[loss=0.08443, simple_loss=0.1033, pruned_loss=0.02237, audio_tagging_loss=0.0104, over 3033752.02 frames. ], batch size: 61, lr: 6.86e-03, grad_scale: 8.0 2023-11-19 14:27:59,912 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=778400.0, ans=0.1 2023-11-19 14:28:13,437 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2023-11-19 14:28:21,309 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=778533.3333333334, ans=0.125 2023-11-19 14:28:51,693 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=778733.3333333334, ans=0.125 2023-11-19 14:28:52,521 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8600, loss[loss=0.07221, simple_loss=0.08705, pruned_loss=0.01861, audio_tagging_loss=0.01008, over 14151.00 frames. ], tot_loss[loss=0.08484, simple_loss=0.1038, pruned_loss=0.02255, audio_tagging_loss=0.0104, over 3037793.09 frames. ], batch size: 58, lr: 6.86e-03, grad_scale: 8.0 2023-11-19 14:28:52,773 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=778733.3333333334, ans=0.125 2023-11-19 14:29:00,023 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:29:05,296 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.14 vs. limit=15.0 2023-11-19 14:29:14,933 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:29:15,656 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 5.831e+01 8.133e+01 8.812e+01 9.832e+01 1.513e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-19 14:29:16,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=778866.6666666666, ans=0.1 2023-11-19 14:29:17,012 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=778866.6666666666, ans=0.2 2023-11-19 14:29:24,756 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=778933.3333333334, ans=0.0 2023-11-19 14:29:27,490 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.79 vs. limit=15.0 2023-11-19 14:29:31,089 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=778933.3333333334, ans=0.125 2023-11-19 14:29:38,251 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.45 vs. 
limit=12.0 2023-11-19 14:29:47,747 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8650, loss[loss=0.08892, simple_loss=0.1031, pruned_loss=0.02508, audio_tagging_loss=0.01228, over 15250.00 frames. ], tot_loss[loss=0.08534, simple_loss=0.1044, pruned_loss=0.02263, audio_tagging_loss=0.01051, over 3043397.81 frames. ], batch size: 58, lr: 6.86e-03, grad_scale: 8.0 2023-11-19 14:29:59,042 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=779133.3333333334, ans=0.2 2023-11-19 14:30:00,009 INFO [scaling.py:1118] (1/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:30:16,588 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=779200.0, ans=0.0 2023-11-19 14:30:24,493 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=779266.6666666666, ans=0.0 2023-11-19 14:30:43,883 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8700, loss[loss=0.08175, simple_loss=0.09971, pruned_loss=0.02146, audio_tagging_loss=0.01043, over 16393.00 frames. ], tot_loss[loss=0.08568, simple_loss=0.1048, pruned_loss=0.02281, audio_tagging_loss=0.01048, over 3041289.57 frames. ], batch size: 63, lr: 6.85e-03, grad_scale: 8.0 2023-11-19 14:30:46,305 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=779400.0, ans=0.0 2023-11-19 14:30:59,481 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=779466.6666666666, ans=0.125 2023-11-19 14:31:04,555 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=779466.6666666666, ans=0.2 2023-11-19 14:31:07,600 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.174e+01 9.057e+01 1.021e+02 2.200e+02, threshold=1.811e+02, percent-clipped=1.0 2023-11-19 14:31:17,391 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=779600.0, ans=0.0 2023-11-19 14:31:39,901 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8750, loss[loss=0.06877, simple_loss=0.08974, pruned_loss=0.01461, audio_tagging_loss=0.009284, over 15485.00 frames. ], tot_loss[loss=0.08614, simple_loss=0.1056, pruned_loss=0.02287, audio_tagging_loss=0.01046, over 3045998.57 frames. ], batch size: 57, lr: 6.85e-03, grad_scale: 8.0 2023-11-19 14:31:54,080 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=779800.0, ans=0.2 2023-11-19 14:32:07,197 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=779866.6666666666, ans=0.0 2023-11-19 14:32:18,135 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=779933.3333333334, ans=0.125 2023-11-19 14:32:27,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=780000.0, ans=0.125 2023-11-19 14:32:36,132 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8800, loss[loss=0.1089, simple_loss=0.1407, pruned_loss=0.02975, audio_tagging_loss=0.00877, over 15716.00 frames. ], tot_loss[loss=0.08649, simple_loss=0.1057, pruned_loss=0.02295, audio_tagging_loss=0.01069, over 3049568.19 frames. 
], batch size: 56, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 14:32:43,330 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=780066.6666666666, ans=0.125 2023-11-19 14:32:53,873 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=780133.3333333334, ans=0.0 2023-11-19 14:32:59,412 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.658e+01 9.293e+01 1.017e+02 2.957e+02, threshold=1.859e+02, percent-clipped=2.0 2023-11-19 14:33:27,573 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=780333.3333333334, ans=0.125 2023-11-19 14:33:31,585 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8850, loss[loss=0.0781, simple_loss=0.08941, pruned_loss=0.02029, audio_tagging_loss=0.0131, over 15170.00 frames. ], tot_loss[loss=0.08705, simple_loss=0.1061, pruned_loss=0.02322, audio_tagging_loss=0.01078, over 3051927.90 frames. ], batch size: 60, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 14:33:40,484 WARNING [train_asr.py:1319] (1/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 14:33:40,700 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=780400.0, ans=0.05 2023-11-19 14:34:19,535 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=780666.6666666666, ans=0.125 2023-11-19 14:34:20,576 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=780666.6666666666, ans=0.0 2023-11-19 14:34:20,665 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=780666.6666666666, ans=0.1 2023-11-19 14:34:26,784 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8900, loss[loss=0.07248, simple_loss=0.07833, pruned_loss=0.02067, audio_tagging_loss=0.01265, over 14168.00 frames. ], tot_loss[loss=0.08737, simple_loss=0.1071, pruned_loss=0.02337, audio_tagging_loss=0.01047, over 3053044.04 frames. ], batch size: 58, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 14:34:32,217 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=780733.3333333334, ans=0.125 2023-11-19 14:34:44,904 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=780800.0, ans=0.05 2023-11-19 14:34:50,416 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 6.845e+01 8.412e+01 9.119e+01 1.005e+02 1.451e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 14:35:01,956 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=780933.3333333334, ans=0.2 2023-11-19 14:35:08,661 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.32 vs. 
limit=12.0 2023-11-19 14:35:16,714 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=781000.0, ans=0.2 2023-11-19 14:35:22,294 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 8950, loss[loss=0.1044, simple_loss=0.1333, pruned_loss=0.03199, audio_tagging_loss=0.005758, over 15018.00 frames. ], tot_loss[loss=0.08656, simple_loss=0.1063, pruned_loss=0.02321, audio_tagging_loss=0.0102, over 3054709.52 frames. ], batch size: 55, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 14:35:42,606 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=781133.3333333334, ans=0.1 2023-11-19 14:35:49,880 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=781200.0, ans=0.125 2023-11-19 14:35:50,970 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=781200.0, ans=0.0 2023-11-19 14:35:51,966 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=781200.0, ans=0.125 2023-11-19 14:35:53,104 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=781200.0, ans=0.0 2023-11-19 14:35:58,668 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=12.0 2023-11-19 14:36:00,413 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=781266.6666666666, ans=0.035 2023-11-19 14:36:05,968 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=781333.3333333334, ans=0.0 2023-11-19 14:36:07,326 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=781333.3333333334, ans=0.125 2023-11-19 14:36:08,791 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=22.5 2023-11-19 14:36:13,182 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=781333.3333333334, ans=0.0 2023-11-19 14:36:14,260 INFO [scaling.py:213] (1/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=781333.3333333334, ans=0.125 2023-11-19 14:36:18,254 INFO [train_asr.py:1115] (1/4) Epoch 10, batch 9000, loss[loss=0.07883, simple_loss=0.09757, pruned_loss=0.01929, audio_tagging_loss=0.01075, over 15895.00 frames. ], tot_loss[loss=0.08604, simple_loss=0.1057, pruned_loss=0.02299, audio_tagging_loss=0.01019, over 3059323.54 frames. ], batch size: 59, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 14:36:18,255 INFO [train_asr.py:1138] (1/4) Computing validation loss 2023-11-19 14:36:45,420 INFO [zipformer.py:1873] (1/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6513, 3.4811, 3.6870, 3.3544], device='cuda:1') 2023-11-19 14:36:58,298 INFO [train_asr.py:1147] (1/4) Epoch 10, validation: loss=0.06535, simple_loss=0.05527, pruned_loss=0.006386, audio_tagging_loss=0.03133, over 4681554.00 frames. 
2023-11-19 14:36:58,299 INFO [train_asr.py:1148] (1/4) Maximum memory allocated so far is 25225MB 2023-11-19 14:36:59,004 INFO [scaling.py:1022] (1/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2023-11-19 14:37:29,689 INFO [optim.py:476] (1/4) Clipping_scale=2.0, grad-norm quartiles 7.012e+01 8.251e+01 8.814e+01 9.719e+01 1.451e+02, threshold=1.763e+02, percent-clipped=0.0
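[Editor's note, not part of the original log] The recurring optim.py records above ("Clipping_scale=2.0, grad-norm quartiles ..., threshold=..., percent-clipped=...") all satisfy threshold = clipping_scale * median of the five listed grad-norm statistics; in the final record, 2.0 * 8.814e+01 ~= 1.763e+02. A minimal sketch of that relationship follows, assuming the five listed values are the min/25%/50%/75%/max of recently observed gradient norms; the helper name and signature are illustrative, not icefall's optimizer API.

    import statistics

    def grad_clip_stats(recent_grad_norms: list[float],
                        clipping_scale: float = 2.0) -> tuple[float, float]:
        # The clipping threshold tracks the median of recently observed
        # gradient norms, scaled by `clipping_scale`; percent-clipped is the
        # share of steps whose norm exceeded that threshold.
        threshold = clipping_scale * statistics.median(recent_grad_norms)
        percent_clipped = 100.0 * (
            sum(g > threshold for g in recent_grad_norms)
            / len(recent_grad_norms)
        )
        return threshold, percent_clipped

Read this way, "percent-clipped=0.0" in most records above indicates that no gradient norm in the reporting window exceeded twice the running median, while the occasional "percent-clipped=1.0" or "2.0" lines coincide with a max quartile well above the threshold (e.g. 2.957e+02 vs. 1.859e+02).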