2023-11-27 11:49:48,604 INFO [train_asr.py:1303] (2/4) Training started
2023-11-27 11:49:48,604 INFO [train_asr.py:1313] (2/4) Device: cuda:2
2023-11-27 11:49:48,606 INFO [train_asr.py:1325] (2/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': 'a9ea720f-dirty', 'icefall-git-date': 'Wed Nov 22 17:48:49 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-2-0423201334-6587bbc68d-tn554', 'IP address': '10.177.74.211'}, 'world_size': 4, 'master_port': 13490, 'tensorboard': True, 'num_epochs': 60, 'start_epoch': 39, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'stop_early': False, 'do_finetune': False, 'init_modules': None, 'freeze_modules': None, 'finetune_ckpt': None, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'use_encoder_projection': False, 'encoder_projection_dim': -1, 'freeze_encoder': False, 'freezing_encoder_layer_index': '-1', 'freeze_encoder_steps': -1, 'encoder_lr_scale': 1.0, 'beats_label': False, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
2023-11-27 11:49:48,607 INFO [train_asr.py:1334] (2/4) About to create model
2023-11-27 11:49:49,325 INFO [train_asr.py:1338] (2/4) Number of model parameters: 65819362
2023-11-27 11:49:49,325 INFO [train_asr.py:1362] (2/4) Using CED labels!
2023-11-27 11:49:49,325 INFO [checkpoint.py:112] (2/4) Loading checkpoint from multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-38.pt
2023-11-27 11:49:52,701 INFO [train_asr.py:1370] (2/4) Setting the lr scale of parameters in encoder and encoder_embed to 1.0
2023-11-27 11:49:55,363 INFO [train_asr.py:1379] (2/4) Using DDP
2023-11-27 11:49:55,805 INFO [train_asr.py:1402] (2/4) Loading optimizer state dict
2023-11-27 11:49:56,431 INFO [train_asr.py:1410] (2/4) Loading scheduler state dict
2023-11-27 11:49:56,440 INFO [train_asr.py:1432] (2/4) Getting audioset cuts
2023-11-27 11:49:56,440 INFO [kd_datamodule.py:784] (2/4) About to get the audioset cuts.
2023-11-27 11:49:56,444 INFO [train_asr.py:1438] (2/4) Using mux to combine Librispeech with audioset
2023-11-27 11:49:56,445 INFO [train_asr.py:1449] (2/4) CutSet(len=2748469) [underlying data type: ]
2023-11-27 11:50:05,336 INFO [kd_datamodule.py:396] (2/4) Enable MUSAN
2023-11-27 11:50:05,336 INFO [kd_datamodule.py:397] (2/4) About to get Musan cuts
2023-11-27 11:50:08,204 INFO [kd_datamodule.py:427] (2/4) Enable SpecAugment
2023-11-27 11:50:08,204 INFO [kd_datamodule.py:428] (2/4) Time warp factor: 80
2023-11-27 11:50:08,205 INFO [kd_datamodule.py:438] (2/4) Num frame mask: 10
2023-11-27 11:50:08,205 INFO [kd_datamodule.py:451] (2/4) About to create train dataset
2023-11-27 11:50:08,206 INFO [kd_datamodule.py:487] (2/4) Using SimpleCutSampler
2023-11-27 11:50:08,206 INFO [kd_datamodule.py:495] (2/4) About to create train dataloader
2023-11-27 11:50:08,208 INFO [kd_datamodule.py:802] (2/4) About to get the audioset eval cuts.
2023-11-27 11:50:08,209 INFO [train_asr.py:1513] (2/4) CutSet(len=20681) [underlying data type: ]
2023-11-27 11:50:08,263 INFO [kd_datamodule.py:529] (2/4) About to create dev dataset
2023-11-27 11:50:08,706 INFO [kd_datamodule.py:550] (2/4) About to create dev dataloader
2023-11-27 11:50:08,706 INFO [train_asr.py:1527] (2/4) Loading grad scaler state dict
2023-11-27 11:50:28,612 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 0, loss[loss=0.06886, simple_loss=0.07771, pruned_loss=0.008435, audio_tagging_loss=0.02158, over 15299.00 frames. ], tot_loss[loss=0.06886, simple_loss=0.07771, pruned_loss=0.008435, audio_tagging_loss=0.02158, over 15299.00 frames. ], batch size: 56, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:50:28,613 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-27 11:51:02,910 INFO [train_asr.py:1267] (2/4) Epoch 39, validation: loss=0.0578, simple_loss=0.05083, pruned_loss=0.005245, audio_tagging_loss=0.02714, over 4681554.00 frames.
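The "Using mux to combine Librispeech with audioset" step above most likely corresponds to lhotse's CutSet.mux, which lazily interleaves several cut sets into one stream. A minimal sketch follows; the manifest paths and the size-proportional weights are illustrative assumptions, not values taken from this log.

```python
# Sketch of combining two corpora the way the log describes.
from lhotse import CutSet

libri_cuts = CutSet.from_file("data/fbank/librispeech_cuts_train.jsonl.gz")   # hypothetical path
audioset_cuts = CutSet.from_file("data/fbank/audioset_cuts_unbalanced.jsonl.gz")  # hypothetical path

# CutSet.mux interleaves the sources without materializing them in memory;
# weighting by size yields one mixed stream (len=2748469 in the log above).
train_cuts = CutSet.mux(
    libri_cuts,
    audioset_cuts,
    weights=[len(libri_cuts), len(audioset_cuts)],
)
```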
2023-11-27 11:51:02,911 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB
2023-11-27 11:51:11,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3046020.0, ans=0.0
2023-11-27 11:51:24,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3046086.6666666665, ans=0.0
2023-11-27 11:51:35,323 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=12.0
2023-11-27 11:51:45,338 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.50 vs. limit=15.0
2023-11-27 11:51:55,804 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 456950
2023-11-27 11:52:01,316 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 50, loss[loss=0.1012, simple_loss=0.1353, pruned_loss=0.02151, audio_tagging_loss=0.01209, over 15310.00 frames. ], tot_loss[loss=0.0757, simple_loss=0.09225, pruned_loss=0.013, audio_tagging_loss=0.01658, over 685981.29 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:52:01,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3046353.3333333335, ans=0.125
2023-11-27 11:52:25,570 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.249e+01 9.464e+01 1.034e+02 1.107e+02 1.312e+02, threshold=2.068e+02, percent-clipped=0.0
2023-11-27 11:52:39,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3046553.3333333335, ans=0.1
2023-11-27 11:52:47,072 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=12.0
2023-11-27 11:52:48,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3046620.0, ans=0.1
2023-11-27 11:52:54,176 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457000
2023-11-27 11:53:00,047 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 100, loss[loss=0.07443, simple_loss=0.102, pruned_loss=0.009596, audio_tagging_loss=0.01385, over 16062.00 frames. ], tot_loss[loss=0.07368, simple_loss=0.09006, pruned_loss=0.01239, audio_tagging_loss=0.01625, over 1208058.76 frames. ], batch size: 61, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:53:23,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3046820.0, ans=0.2
2023-11-27 11:53:32,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3046886.6666666665, ans=0.0
2023-11-27 11:53:34,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3046886.6666666665, ans=0.2
2023-11-27 11:53:51,124 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457050
2023-11-27 11:53:56,568 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 150, loss[loss=0.07248, simple_loss=0.0955, pruned_loss=0.01181, audio_tagging_loss=0.01292, over 14670.00 frames. ], tot_loss[loss=0.07171, simple_loss=0.08982, pruned_loss=0.01221, audio_tagging_loss=0.01459, over 1614530.96 frames. ], batch size: 54, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:54:02,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3047020.0, ans=0.0
2023-11-27 11:54:06,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3047086.6666666665, ans=0.125
2023-11-27 11:54:19,300 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.988e+01 9.589e+01 1.001e+02 1.163e+02, threshold=1.918e+02, percent-clipped=0.0
2023-11-27 11:54:47,776 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457100
2023-11-27 11:54:53,335 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 200, loss[loss=0.06765, simple_loss=0.09552, pruned_loss=0.01125, audio_tagging_loss=0.008641, over 16290.00 frames. ], tot_loss[loss=0.07071, simple_loss=0.09047, pruned_loss=0.01261, audio_tagging_loss=0.01286, over 1935536.91 frames. ], batch size: 59, lr: 1.75e-03, grad_scale: 32.0
2023-11-27 11:55:02,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3047353.3333333335, ans=0.125
2023-11-27 11:55:13,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3047420.0, ans=0.0
2023-11-27 11:55:31,803 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.59 vs. limit=15.0
2023-11-27 11:55:33,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2023-11-27 11:55:37,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3047553.3333333335, ans=0.0
2023-11-27 11:55:45,177 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457150
2023-11-27 11:55:47,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3047620.0, ans=0.125
2023-11-27 11:55:51,229 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 250, loss[loss=0.06374, simple_loss=0.08833, pruned_loss=0.01237, audio_tagging_loss=0.00721, over 14672.00 frames. ], tot_loss[loss=0.06927, simple_loss=0.09017, pruned_loss=0.01252, audio_tagging_loss=0.01167, over 2180284.07 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 32.0
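The recurring "ScheduledFloat: name=..., batch_count=..., ans=..." lines report the current value of hyperparameters (skip rates, dropout probabilities, balancer targets) that are scheduled against the global batch count. A minimal sketch of the piecewise-linear scheduling this implies is below; it mirrors icefall's ScheduledFloat only in spirit, and the breakpoints are made up for illustration.

```python
# Piecewise-linear schedule: map batch_count to a value by interpolating
# between (batch_count, value) breakpoints, clamping outside the range.
import numpy as np

def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    xs, ys = zip(*points)
    return float(np.interp(batch_count, xs, ys))

# e.g. a skip rate decaying from 0.2 to 0.0 over the first 20k batches:
conv_skip_rate = scheduled_float(3046020.0, [(0.0, 0.2), (20000.0, 0.0)])
print(conv_skip_rate)  # 0.0 for large batch_count, matching ans=0.0 above
```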
2023-11-27 11:55:54,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3047686.6666666665, ans=0.1
2023-11-27 11:55:58,002 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 11:56:07,904 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 11:56:14,333 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.936e+01 9.538e+01 1.043e+02 1.286e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-27 11:56:19,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=3047820.0, ans=0.02
2023-11-27 11:56:24,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3047886.6666666665, ans=0.125
2023-11-27 11:56:28,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3047886.6666666665, ans=0.125
2023-11-27 11:56:40,458 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0
2023-11-27 11:56:42,228 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457200
2023-11-27 11:56:47,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3048020.0, ans=0.0
2023-11-27 11:56:48,611 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 300, loss[loss=0.04855, simple_loss=0.05985, pruned_loss=0.009173, audio_tagging_loss=0.009455, over 15012.00 frames. ], tot_loss[loss=0.0683, simple_loss=0.08983, pruned_loss=0.0126, audio_tagging_loss=0.01079, over 2374051.94 frames. ], batch size: 58, lr: 1.75e-03, grad_scale: 16.0
2023-11-27 11:57:13,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3048153.3333333335, ans=0.0
2023-11-27 11:57:22,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3048220.0, ans=0.0
2023-11-27 11:57:23,550 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.51 vs. limit=15.0
2023-11-27 11:57:27,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3048220.0, ans=0.015
2023-11-27 11:57:27,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3048220.0, ans=0.125
2023-11-27 11:57:28,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3048220.0, ans=0.1
2023-11-27 11:57:39,464 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457250
2023-11-27 11:57:44,918 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 350, loss[loss=0.05401, simple_loss=0.07104, pruned_loss=0.00749, audio_tagging_loss=0.011, over 14794.00 frames. ], tot_loss[loss=0.06791, simple_loss=0.09015, pruned_loss=0.01251, audio_tagging_loss=0.01033, over 2520952.46 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 8.0
2023-11-27 11:57:51,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3048353.3333333335, ans=0.0
2023-11-27 11:57:51,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3048353.3333333335, ans=0.0
2023-11-27 11:58:00,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3048420.0, ans=0.125
2023-11-27 11:58:10,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 8.638e+01 9.224e+01 9.880e+01 1.297e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-27 11:58:14,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3048486.6666666665, ans=0.07
2023-11-27 11:58:20,455 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.58 vs. limit=15.0
2023-11-27 11:58:21,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3048553.3333333335, ans=0.0
2023-11-27 11:58:23,479 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0
2023-11-27 11:58:36,112 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457300
2023-11-27 11:58:42,171 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 400, loss[loss=0.04516, simple_loss=0.05329, pruned_loss=0.006483, audio_tagging_loss=0.01203, over 14786.00 frames. ], tot_loss[loss=0.06859, simple_loss=0.09177, pruned_loss=0.01283, audio_tagging_loss=0.009869, over 2641419.22 frames. ], batch size: 60, lr: 1.75e-03, grad_scale: 16.0
2023-11-27 11:59:21,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3048886.6666666665, ans=0.125
2023-11-27 11:59:25,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3048886.6666666665, ans=0.09899494936611666
2023-11-27 11:59:31,166 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=22.5
2023-11-27 11:59:32,736 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457350
2023-11-27 11:59:38,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3049020.0, ans=0.125
2023-11-27 11:59:38,807 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 450, loss[loss=0.06944, simple_loss=0.09547, pruned_loss=0.01559, audio_tagging_loss=0.006112, over 15044.00 frames. ], tot_loss[loss=0.0679, simple_loss=0.09098, pruned_loss=0.01286, audio_tagging_loss=0.009543, over 2723781.80 frames. ], batch size: 55, lr: 1.75e-03, grad_scale: 16.0
2023-11-27 11:59:39,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3049020.0, ans=0.0
2023-11-27 11:59:51,593 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.76 vs. limit=15.0
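The "Clipping_scale=2.0, grad-norm quartiles ... threshold=... percent-clipped=..." lines from optim.py summarize recent gradient norms and the clipping threshold derived from them. A hedged sketch of this bookkeeping: the exact rule is an assumption, but the logged thresholds are consistently about twice the logged median, suggesting a median-based threshold over a window of recent norms.

```python
# Report quartiles of recent grad norms and clip at clipping_scale * median.
# Only the logged quantities (quartiles, threshold, percent-clipped) come
# from the log; the windowing rule itself is assumed.
import torch

def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    quartiles = [torch.quantile(grad_norms, q).item()
                 for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
    threshold = clipping_scale * quartiles[2]  # 2 x median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean().item()
    return quartiles, threshold, percent_clipped

norms = torch.tensor([82.49, 94.64, 103.4, 110.7, 131.2])  # first optim.py line above
print(clipping_stats(norms))  # threshold ~= 2.068e+02, percent-clipped 0.0
```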
2023-11-27 11:59:52,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0
2023-11-27 11:59:57,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.13 vs. limit=15.0
2023-11-27 12:00:00,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3049153.3333333335, ans=0.0
2023-11-27 12:00:02,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.975e+01 8.448e+01 9.046e+01 9.688e+01 1.234e+02, threshold=1.809e+02, percent-clipped=0.0
2023-11-27 12:00:05,241 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=22.5
2023-11-27 12:00:13,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3049220.0, ans=0.1
2023-11-27 12:00:29,376 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457400
2023-11-27 12:00:34,830 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0
2023-11-27 12:00:35,329 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 500, loss[loss=0.06975, simple_loss=0.09777, pruned_loss=0.01124, audio_tagging_loss=0.009624, over 13470.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.0902, pruned_loss=0.01277, audio_tagging_loss=0.009399, over 2800505.92 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:00:42,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3049353.3333333335, ans=0.125
2023-11-27 12:00:42,351 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.70 vs. limit=22.5
2023-11-27 12:00:51,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3049420.0, ans=0.0
2023-11-27 12:00:51,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3049420.0, ans=0.125
2023-11-27 12:01:22,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3049620.0, ans=0.2
2023-11-27 12:01:25,883 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457450
2023-11-27 12:01:32,370 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 550, loss[loss=0.05533, simple_loss=0.07206, pruned_loss=0.007568, audio_tagging_loss=0.01173, over 16123.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.08949, pruned_loss=0.01264, audio_tagging_loss=0.009373, over 2854282.35 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:01:32,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3049686.6666666665, ans=0.125
2023-11-27 12:01:40,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3049686.6666666665, ans=0.0
2023-11-27 12:01:53,166 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.87 vs. limit=6.0
2023-11-27 12:01:54,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3049820.0, ans=0.125
2023-11-27 12:01:56,927 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 8.636e+01 9.349e+01 1.005e+02 1.321e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-27 12:02:02,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3049820.0, ans=0.2
2023-11-27 12:02:02,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3049820.0, ans=0.1
2023-11-27 12:02:15,309 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=12.0
2023-11-27 12:02:23,311 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457500
2023-11-27 12:02:26,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0
2023-11-27 12:02:28,715 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 600, loss[loss=0.05583, simple_loss=0.07059, pruned_loss=0.009912, audio_tagging_loss=0.01062, over 15105.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08861, pruned_loss=0.01241, audio_tagging_loss=0.009293, over 2900703.91 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:02:31,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3050020.0, ans=0.2
2023-11-27 12:02:38,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3050020.0, ans=0.2
2023-11-27 12:02:52,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3050153.3333333335, ans=0.125
2023-11-27 12:03:10,451 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5
2023-11-27 12:03:12,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3050220.0, ans=0.0
2023-11-27 12:03:18,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3050286.6666666665, ans=0.2
2023-11-27 12:03:20,417 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457550
2023-11-27 12:03:25,683 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 650, loss[loss=0.07708, simple_loss=0.09599, pruned_loss=0.01717, audio_tagging_loss=0.01192, over 15733.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.08998, pruned_loss=0.01268, audio_tagging_loss=0.009247, over 2936293.31 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 8.0
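The grad_scale value in the batch lines (32.0, then 16.0, then 8.0, later back to 16.0 and 32.0) shows the fp16 loss-scale dynamics of a dynamic grad scaler, consistent with use_fp16=True in the config: the scale halves when infs/NaNs are detected and periodically doubles back. A generic reference step using torch.cuda.amp is sketched below; model, optimizer, and loss_fn are placeholders, not this recipe's code.

```python
# Standard dynamic loss scaling for fp16 training.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped if infs/NaNs were found in grads
    scaler.update()          # halves or grows the scale; this is the
                             # grad_scale figure reported in the log
    return loss.detach()
```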
2023-11-27 12:03:28,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3050353.3333333335, ans=0.2
2023-11-27 12:03:32,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3050353.3333333335, ans=0.125
2023-11-27 12:03:38,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3050420.0, ans=0.0
2023-11-27 12:03:44,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3050420.0, ans=0.1
2023-11-27 12:03:46,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3050420.0, ans=0.1
2023-11-27 12:03:50,716 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.22 vs. limit=10.0
2023-11-27 12:03:52,313 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.673e+01 9.304e+01 1.013e+02 1.216e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-27 12:03:53,970 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=12.0
2023-11-27 12:04:05,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3050553.3333333335, ans=0.0
2023-11-27 12:04:12,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3050620.0, ans=0.125
2023-11-27 12:04:17,008 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457600
2023-11-27 12:04:22,964 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 700, loss[loss=0.05863, simple_loss=0.08468, pruned_loss=0.008958, audio_tagging_loss=0.007332, over 16839.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.08975, pruned_loss=0.01254, audio_tagging_loss=0.009193, over 2967223.13 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:04:32,326 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0
2023-11-27 12:04:35,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3050753.3333333335, ans=0.0
2023-11-27 12:04:46,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3050820.0, ans=0.125
2023-11-27 12:04:53,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3050820.0, ans=0.0
2023-11-27 12:05:00,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3050886.6666666665, ans=0.2
2023-11-27 12:05:06,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3050886.6666666665, ans=0.125
2023-11-27 12:05:15,139 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457650
2023-11-27 12:05:20,550 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 750, loss[loss=0.06119, simple_loss=0.07082, pruned_loss=0.01345, audio_tagging_loss=0.01233, over 14600.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.08961, pruned_loss=0.01246, audio_tagging_loss=0.009192, over 2986450.03 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:05:33,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3051086.6666666665, ans=0.125
2023-11-27 12:05:34,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3051086.6666666665, ans=0.125
2023-11-27 12:05:34,903 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.76 vs. limit=6.0
2023-11-27 12:05:46,326 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 8.728e+01 9.485e+01 1.039e+02 1.301e+02, threshold=1.897e+02, percent-clipped=0.0
2023-11-27 12:06:01,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3051220.0, ans=0.125
2023-11-27 12:06:11,673 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457700
2023-11-27 12:06:18,042 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 800, loss[loss=0.05834, simple_loss=0.08608, pruned_loss=0.005332, audio_tagging_loss=0.009968, over 16595.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09086, pruned_loss=0.01267, audio_tagging_loss=0.009139, over 2998583.11 frames. ], batch size: 61, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:06:22,904 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.75 vs. limit=15.0
2023-11-27 12:06:25,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3051353.3333333335, ans=0.0
2023-11-27 12:06:32,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3051420.0, ans=0.09899494936611666
2023-11-27 12:06:58,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3051553.3333333335, ans=0.125
2023-11-27 12:07:03,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3051620.0, ans=0.2
2023-11-27 12:07:09,785 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457750
2023-11-27 12:07:15,110 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 850, loss[loss=0.05267, simple_loss=0.06625, pruned_loss=0.008698, audio_tagging_loss=0.01085, over 14347.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09118, pruned_loss=0.01266, audio_tagging_loss=0.009167, over 3011797.53 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:07:41,536 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.800e+01 8.486e+01 9.072e+01 9.874e+01 1.616e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-27 12:07:44,168 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.19 vs. limit=15.0
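The "Whitening: name=..., metric=X vs. limit=Y" lines from scaling.py compare a per-module whiteness statistic against the limit beyond which a whitening penalty activates. A plausible reconstruction of such a metric (an assumption, not copied from scaling.py) is the dispersion of the channel covariance's eigenvalues, mean(eig^2) / mean(eig)^2, which equals 1.0 exactly when the covariance is proportional to the identity and grows as the activations become less white.

```python
# Eigenvalue-dispersion whiteness statistic; >= 1.0, with equality iff
# the covariance is a multiple of the identity (Cauchy-Schwarz).
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations for one group."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]                 # channel covariance C
    d = cov.shape[0]
    mean_eig_sq = (cov * cov).sum().item() / d   # trace(C^2)/d = mean eig^2
    mean_eig = torch.diagonal(cov).mean().item() # trace(C)/d  = mean eig
    return mean_eig_sq / (mean_eig ** 2)

x = torch.randn(1000, 384)       # nearly white input
print(whitening_metric(x))       # close to 1.0; structured activations score higher
```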
2023-11-27 12:07:46,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3051820.0, ans=0.035
2023-11-27 12:07:50,713 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0
2023-11-27 12:08:07,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3051953.3333333335, ans=0.125
2023-11-27 12:08:08,043 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457800
2023-11-27 12:08:14,014 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 900, loss[loss=0.07318, simple_loss=0.101, pruned_loss=0.01291, audio_tagging_loss=0.009749, over 15051.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.0903, pruned_loss=0.01258, audio_tagging_loss=0.009206, over 3017978.99 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:08:35,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3052153.3333333335, ans=10.0
2023-11-27 12:08:40,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3052153.3333333335, ans=0.125
2023-11-27 12:08:50,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3052220.0, ans=0.125
2023-11-27 12:08:58,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3052220.0, ans=0.125
2023-11-27 12:09:02,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3052286.6666666665, ans=0.05
2023-11-27 12:09:05,802 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457850
2023-11-27 12:09:11,248 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 950, loss[loss=0.0471, simple_loss=0.0617, pruned_loss=0.00695, audio_tagging_loss=0.009302, over 15517.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09043, pruned_loss=0.0126, audio_tagging_loss=0.009076, over 3018939.77 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:09:19,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3052353.3333333335, ans=0.1
2023-11-27 12:09:20,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3052353.3333333335, ans=0.0
2023-11-27 12:09:28,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3052420.0, ans=0.2
2023-11-27 12:09:38,365 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 8.734e+01 9.231e+01 1.017e+02 1.316e+02, threshold=1.846e+02, percent-clipped=0.0
2023-11-27 12:09:43,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3052486.6666666665, ans=0.1
2023-11-27 12:09:47,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3052553.3333333335, ans=0.0
2023-11-27 12:09:48,757 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0
2023-11-27 12:09:57,289 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.08 vs. limit=22.5
2023-11-27 12:09:59,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3052620.0, ans=0.125
2023-11-27 12:10:02,992 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457900
2023-11-27 12:10:03,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3052620.0, ans=0.125
2023-11-27 12:10:05,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3052620.0, ans=0.0
2023-11-27 12:10:08,363 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1000, loss[loss=0.05842, simple_loss=0.08187, pruned_loss=0.009733, audio_tagging_loss=0.007749, over 15651.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.0896, pruned_loss=0.01252, audio_tagging_loss=0.00899, over 3021697.26 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:10:19,335 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0
2023-11-27 12:10:34,297 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:10:38,922 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:11:00,103 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 457950
2023-11-27 12:11:05,653 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1050, loss[loss=0.05945, simple_loss=0.07596, pruned_loss=0.01238, audio_tagging_loss=0.009091, over 14486.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08918, pruned_loss=0.01238, audio_tagging_loss=0.008918, over 3021647.50 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:11:08,277 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0
2023-11-27 12:11:12,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3053020.0, ans=0.2
2023-11-27 12:11:32,619 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.119e+01 8.482e+01 9.051e+01 9.992e+01 1.169e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-27 12:11:45,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3053220.0, ans=0.5
2023-11-27 12:11:51,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3053286.6666666665, ans=0.125
2023-11-27 12:11:54,860 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.58 vs. limit=15.0
2023-11-27 12:11:56,562 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458000
2023-11-27 12:11:57,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3053286.6666666665, ans=0.125
2023-11-27 12:12:01,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3053353.3333333335, ans=0.1
2023-11-27 12:12:02,646 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1100, loss[loss=0.06524, simple_loss=0.09356, pruned_loss=0.01271, audio_tagging_loss=0.005754, over 14938.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08905, pruned_loss=0.01259, audio_tagging_loss=0.008859, over 3023542.22 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:12:02,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3053353.3333333335, ans=0.125
2023-11-27 12:12:07,033 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
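These "Exclude cut" warnings drop AudioSet placeholder cuts whose BPE token count (24) exceeds the frame count left after 4x subsampling (23): a transducer loss needs at least one encoder frame per emitted token. A sketch of the implied validity check is below; the exact front-end shrinkage formula is an assumption, chosen so that 100 input frames yield the 23 subsampled frames the log reports.

```python
# Validity predicate behind the "Exclude cut ..." warnings (assumed form).
def is_valid_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    # assumed conv front-end shrinkage: (T - 7) // 4, giving (100-7)//4 = 23
    frames_after_subsampling = (num_frames - 7) // subsampling_factor
    return frames_after_subsampling >= num_tokens

print(is_valid_cut(num_frames=100, num_tokens=24))  # False -> cut excluded
```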
2023-11-27 12:12:20,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3053420.0, ans=0.0
2023-11-27 12:12:21,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3053420.0, ans=0.0
2023-11-27 12:12:24,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3053486.6666666665, ans=0.125
2023-11-27 12:12:28,271 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0
2023-11-27 12:12:33,755 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0
2023-11-27 12:12:34,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3053486.6666666665, ans=0.125
2023-11-27 12:12:36,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3053553.3333333335, ans=0.2
2023-11-27 12:12:37,158 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.65 vs. limit=22.5
2023-11-27 12:12:51,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3053620.0, ans=0.07
2023-11-27 12:12:54,099 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458050
2023-11-27 12:12:59,556 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1150, loss[loss=0.05734, simple_loss=0.07597, pruned_loss=0.01188, audio_tagging_loss=0.007477, over 14098.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08907, pruned_loss=0.01269, audio_tagging_loss=0.008833, over 3023715.92 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0
2023-11-27 12:13:04,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3053686.6666666665, ans=0.1
2023-11-27 12:13:25,599 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.53 vs. limit=22.5
2023-11-27 12:13:27,298 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.562e+01 9.025e+01 9.989e+01 1.460e+02, threshold=1.805e+02, percent-clipped=0.0
2023-11-27 12:13:28,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3053820.0, ans=0.1
2023-11-27 12:13:37,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3053886.6666666665, ans=0.125
2023-11-27 12:13:47,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3053953.3333333335, ans=0.125
2023-11-27 12:13:49,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3053953.3333333335, ans=0.125
2023-11-27 12:13:50,832 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458100
2023-11-27 12:13:52,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3053953.3333333335, ans=0.0
2023-11-27 12:13:57,001 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1200, loss[loss=0.07376, simple_loss=0.09978, pruned_loss=0.01597, audio_tagging_loss=0.007902, over 15916.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08937, pruned_loss=0.01278, audio_tagging_loss=0.008679, over 3025769.01 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:14:01,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3054020.0, ans=0.125
2023-11-27 12:14:47,900 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458150
2023-11-27 12:14:50,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3054286.6666666665, ans=0.1
2023-11-27 12:14:53,244 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1250, loss[loss=0.06306, simple_loss=0.07964, pruned_loss=0.01258, audio_tagging_loss=0.01066, over 15299.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08853, pruned_loss=0.01254, audio_tagging_loss=0.008726, over 3027016.40 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:15:00,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3054353.3333333335, ans=0.2
2023-11-27 12:15:03,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3054420.0, ans=0.125
2023-11-27 12:15:14,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3054486.6666666665, ans=0.0
2023-11-27 12:15:17,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3054486.6666666665, ans=0.0
2023-11-27 12:15:20,999 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.636e+01 9.164e+01 9.922e+01 1.354e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-27 12:15:37,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3054620.0, ans=0.2
2023-11-27 12:15:44,071 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458200
2023-11-27 12:15:49,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3054686.6666666665, ans=0.035
2023-11-27 12:15:50,098 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1300, loss[loss=0.06713, simple_loss=0.08703, pruned_loss=0.0167, audio_tagging_loss=0.006916, over 14812.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08842, pruned_loss=0.01258, audio_tagging_loss=0.008754, over 3019838.64 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:16:02,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3054753.3333333335, ans=0.125
2023-11-27 12:16:24,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3054886.6666666665, ans=0.2
2023-11-27 12:16:25,735 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.68 vs. limit=15.0
2023-11-27 12:16:28,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3054886.6666666665, ans=0.125
2023-11-27 12:16:32,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3054886.6666666665, ans=0.0
2023-11-27 12:16:40,676 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458250
2023-11-27 12:16:46,971 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1350, loss[loss=0.07228, simple_loss=0.1015, pruned_loss=0.0139, audio_tagging_loss=0.007646, over 15435.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08931, pruned_loss=0.01251, audio_tagging_loss=0.008691, over 3024376.44 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
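The loss columns in these batch lines are mutually consistent with the scales in the config dump at the top of the log (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0): loss = simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss_scale * audio_tagging_loss. A quick check against the batch 1350 tot_loss just above:

```python
# Verify the loss decomposition using the logged scales and values.
simple_loss_scale = 0.5        # from the config dump
audio_tagging_loss_scale = 1.0  # from the config dump

tot = (simple_loss_scale * 0.08931            # simple_loss
       + 0.01251                              # pruned_loss
       + audio_tagging_loss_scale * 0.008691) # audio_tagging_loss
print(round(tot, 5))  # 0.06586, matching loss=0.06586 in the line above
```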
2023-11-27 12:16:47,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3055020.0, ans=0.2
2023-11-27 12:16:53,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3055020.0, ans=0.0
2023-11-27 12:17:10,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3055153.3333333335, ans=0.125
2023-11-27 12:17:13,426 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.626e+01 9.162e+01 9.999e+01 1.247e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-27 12:17:31,786 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:17:34,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3055286.6666666665, ans=0.015
2023-11-27 12:17:38,507 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458300
2023-11-27 12:17:43,923 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1400, loss[loss=0.06953, simple_loss=0.09053, pruned_loss=0.01443, audio_tagging_loss=0.009835, over 15463.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08996, pruned_loss=0.0126, audio_tagging_loss=0.008713, over 3029591.66 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:18:23,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3055553.3333333335, ans=0.025
2023-11-27 12:18:35,145 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458350
2023-11-27 12:18:40,525 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1450, loss[loss=0.06235, simple_loss=0.08138, pruned_loss=0.01105, audio_tagging_loss=0.01061, over 15736.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09012, pruned_loss=0.01267, audio_tagging_loss=0.008764, over 3035114.47 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:18:47,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3055686.6666666665, ans=0.125
2023-11-27 12:18:55,545 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0
2023-11-27 12:19:03,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3055820.0, ans=0.0
2023-11-27 12:19:08,517 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.606e+01 9.355e+01 1.005e+02 1.686e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-27 12:19:17,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3055886.6666666665, ans=0.125
2023-11-27 12:19:25,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3055953.3333333335, ans=0.125
2023-11-27 12:19:29,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3055953.3333333335, ans=0.125
2023-11-27 12:19:31,290 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458400
2023-11-27 12:19:37,787 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1500, loss[loss=0.06366, simple_loss=0.08195, pruned_loss=0.01313, audio_tagging_loss=0.009557, over 15455.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08974, pruned_loss=0.01269, audio_tagging_loss=0.008916, over 3039811.51 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:20:05,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3056153.3333333335, ans=0.0
2023-11-27 12:20:08,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3056153.3333333335, ans=0.125
2023-11-27 12:20:30,022 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458450
2023-11-27 12:20:31,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3056286.6666666665, ans=0.0
2023-11-27 12:20:35,425 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1550, loss[loss=0.0913, simple_loss=0.1305, pruned_loss=0.0183, audio_tagging_loss=0.007754, over 16000.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09017, pruned_loss=0.01282, audio_tagging_loss=0.008955, over 3038285.61 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:20:35,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3056353.3333333335, ans=0.125
2023-11-27 12:21:01,714 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.639e+01 9.120e+01 9.883e+01 1.538e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-27 12:21:01,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3056486.6666666665, ans=0.0
2023-11-27 12:21:23,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3056620.0, ans=0.125
2023-11-27 12:21:24,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3056620.0, ans=0.125
2023-11-27 12:21:26,515 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458500
2023-11-27 12:21:28,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3056620.0, ans=0.2
2023-11-27 12:21:31,974 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1600, loss[loss=0.04636, simple_loss=0.06715, pruned_loss=0.004301, audio_tagging_loss=0.008482, over 13830.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09061, pruned_loss=0.01274, audio_tagging_loss=0.009003, over 3048204.76 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:21:45,766 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=12.0
2023-11-27 12:21:45,964 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.11 vs. limit=15.0
2023-11-27 12:21:53,542 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=15.0
2023-11-27 12:22:13,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3056886.6666666665, ans=0.125
2023-11-27 12:22:23,396 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458550
2023-11-27 12:22:28,722 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1650, loss[loss=0.06403, simple_loss=0.08233, pruned_loss=0.01514, audio_tagging_loss=0.007722, over 16141.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08938, pruned_loss=0.01251, audio_tagging_loss=0.009145, over 3050292.54 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 32.0
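The tot_loss[...] figures are running averages of the per-batch losses weighted by the number of frames each batch contributes, which is why the "over N frames" count grows across the epoch. A minimal sketch of that bookkeeping follows; icefall's tracker additionally applies a decay tied to reset_interval=200 from the config, which this simplified version omits.

```python
# Frame-weighted running average behind the tot_loss[...] columns (simplified).
class RunningLoss:
    def __init__(self):
        self.frames = 0.0
        self.loss_sum = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        # accumulate frame-weighted loss and return the running average
        self.loss_sum += batch_loss * batch_frames
        self.frames += batch_frames
        return self.loss_sum / self.frames

tracker = RunningLoss()
print(tracker.update(0.06403, 16141))  # per-batch loss feeds the running tot_loss
```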
], batch size: 62, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:22:37,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3057020.0, ans=0.1 2023-11-27 12:22:39,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3057020.0, ans=0.2 2023-11-27 12:22:54,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3057153.3333333335, ans=0.1 2023-11-27 12:22:57,409 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 8.745e+01 9.326e+01 9.935e+01 1.288e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 12:23:21,645 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.48 vs. limit=22.5 2023-11-27 12:23:22,265 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458600 2023-11-27 12:23:29,012 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1700, loss[loss=0.05973, simple_loss=0.07966, pruned_loss=0.01288, audio_tagging_loss=0.007027, over 13873.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08926, pruned_loss=0.01266, audio_tagging_loss=0.009223, over 3050788.22 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 12:23:31,977 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.74 vs. limit=10.0 2023-11-27 12:23:55,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3057486.6666666665, ans=0.5 2023-11-27 12:24:03,972 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-27 12:24:16,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3057620.0, ans=0.125 2023-11-27 12:24:20,644 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458650 2023-11-27 12:24:26,125 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1750, loss[loss=0.09876, simple_loss=0.135, pruned_loss=0.02386, audio_tagging_loss=0.007387, over 15572.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08925, pruned_loss=0.01269, audio_tagging_loss=0.009046, over 3049944.68 frames. 
2023-11-27 12:24:26,125 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1750, loss[loss=0.09876, simple_loss=0.135, pruned_loss=0.02386, audio_tagging_loss=0.007387, over 15572.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.08925, pruned_loss=0.01269, audio_tagging_loss=0.009046, over 3049944.68 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:24:28,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3057686.6666666665, ans=0.125
2023-11-27 12:24:28,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3057686.6666666665, ans=0.125
2023-11-27 12:24:35,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3057686.6666666665, ans=0.125
2023-11-27 12:24:39,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3057753.3333333335, ans=0.1
2023-11-27 12:24:54,373 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.466e+01 9.121e+01 9.743e+01 1.211e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-27 12:24:57,166 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0
2023-11-27 12:25:12,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3057953.3333333335, ans=0.125
2023-11-27 12:25:17,840 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458700
2023-11-27 12:25:19,340 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=22.5
2023-11-27 12:25:23,180 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1800, loss[loss=0.08801, simple_loss=0.1208, pruned_loss=0.02003, audio_tagging_loss=0.007566, over 15291.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09009, pruned_loss=0.01291, audio_tagging_loss=0.008909, over 3044254.54 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:25:45,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3058086.6666666665, ans=0.125
2023-11-27 12:25:46,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3058153.3333333335, ans=0.1
2023-11-27 12:25:53,551 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=15.0
2023-11-27 12:26:06,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3058220.0, ans=0.125
2023-11-27 12:26:08,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3058286.6666666665, ans=0.0
2023-11-27 12:26:16,807 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458750
2023-11-27 12:26:22,256 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1850, loss[loss=0.0651, simple_loss=0.08411, pruned_loss=0.01258, audio_tagging_loss=0.01046, over 14995.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.08991, pruned_loss=0.01296, audio_tagging_loss=0.008871, over 3048494.86 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0
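The scaling.py:213 records report module hyper-parameters (dropout probabilities, skip rates, balancer limits, bypass scales) whose value is a function of batch_count. icefall wraps these in ScheduledFloat objects; a minimal re-implementation of the idea, assuming simple piecewise-linear interpolation over (batch_count, value) breakpoints, which is not the icefall class itself:

```python
class ScheduledFloat:
    """Piecewise-linear schedule over batch_count.

    Minimal sketch of the idea behind the scaling.py records above,
    e.g. a dropout p or skip rate that decays as training progresses.
    """
    def __init__(self, *points):
        # points: (batch_count, value) pairs.
        self.points = sorted(points)

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

# e.g. a skip rate annealed from 0.2 to 0.0 over the first 20k batches:
skip_rate = ScheduledFloat((0.0, 0.2), (20000.0, 0.0))
print(skip_rate.value_at(3058286.67))  # far past the last breakpoint -> 0.0
```

That final-value behaviour is why so many of the skip-rate records above report ans=0.0 at batch counts above three million: the schedules have long since reached their end points.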
2023-11-27 12:26:32,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0
2023-11-27 12:26:36,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3058420.0, ans=0.125
2023-11-27 12:26:47,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3058486.6666666665, ans=0.125
2023-11-27 12:26:50,671 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.344e+01 8.879e+01 9.441e+01 1.010e+02 1.422e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-27 12:27:00,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3058553.3333333335, ans=0.125
2023-11-27 12:27:02,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3058553.3333333335, ans=0.0
2023-11-27 12:27:12,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.92 vs. limit=15.0
2023-11-27 12:27:13,945 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458800
2023-11-27 12:27:20,578 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1900, loss[loss=0.05666, simple_loss=0.08334, pruned_loss=0.008162, audio_tagging_loss=0.006826, over 17430.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09019, pruned_loss=0.01273, audio_tagging_loss=0.008784, over 3055991.92 frames. ], batch size: 66, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:27:20,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3058686.6666666665, ans=0.125
2023-11-27 12:27:51,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3058820.0, ans=0.0
2023-11-27 12:27:51,563 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0
2023-11-27 12:28:05,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3058886.6666666665, ans=0.05
2023-11-27 12:28:06,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0
2023-11-27 12:28:10,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.53 vs. limit=6.0
2023-11-27 12:28:12,442 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458850
2023-11-27 12:28:17,893 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 1950, loss[loss=0.08431, simple_loss=0.1168, pruned_loss=0.01656, audio_tagging_loss=0.009367, over 15346.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.0889, pruned_loss=0.01246, audio_tagging_loss=0.008767, over 3059351.67 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:28:46,476 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 8.494e+01 8.981e+01 9.864e+01 1.306e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-27 12:28:57,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3059220.0, ans=10.0
2023-11-27 12:29:11,150 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458900
2023-11-27 12:29:12,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3059286.6666666665, ans=0.0
2023-11-27 12:29:16,528 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2000, loss[loss=0.06779, simple_loss=0.08867, pruned_loss=0.01357, audio_tagging_loss=0.00988, over 15862.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08878, pruned_loss=0.0125, audio_tagging_loss=0.008771, over 3051859.81 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:29:37,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3059420.0, ans=0.0
2023-11-27 12:29:58,002 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.74 vs. limit=22.5
2023-11-27 12:30:08,134 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 458950
2023-11-27 12:30:08,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=15.0
2023-11-27 12:30:12,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3059686.6666666665, ans=0.125
2023-11-27 12:30:13,675 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2050, loss[loss=0.05841, simple_loss=0.07255, pruned_loss=0.01148, audio_tagging_loss=0.01066, over 14942.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08883, pruned_loss=0.01261, audio_tagging_loss=0.008748, over 3036624.36 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:30:22,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=22.5
2023-11-27 12:30:39,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3059820.0, ans=0.1
2023-11-27 12:30:41,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3059820.0, ans=0.125
2023-11-27 12:30:43,235 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.669e+01 9.305e+01 9.867e+01 1.531e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-27 12:31:06,128 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459000
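The steady lr: 1.74e-03 is consistent with icefall's Eden schedule, which multiplies the base learning rate by two inverse-fourth-root decay terms, one in batches and one in epochs. A sketch of that formula; base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 are assumed typical recipe settings, and warmup plus the optional reference-duration rescaling are omitted:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Sketch of the Eden learning-rate decay used in icefall recipes.

    Assumed form: lr = base_lr * f(batch) * f(epoch), with
    f(x) = ((x^2 + c^2) / c^2) ** -0.25. The real scheduler also applies
    a warmup factor and an optional ref-duration rescaling.
    """
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Around the records above (epoch 39, roughly 459k training batches):
print(eden_lr(0.045, batch=459000, epoch=39.3))  # ~1.7e-03, close to the logged 1.74e-03
```

At this depth into training both factors change very slowly, which is why the logged learning rate is identical across the whole section.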
2023-11-27 12:31:12,187 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2100, loss[loss=0.06913, simple_loss=0.09165, pruned_loss=0.01297, audio_tagging_loss=0.01034, over 14598.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08863, pruned_loss=0.01262, audio_tagging_loss=0.008809, over 3040218.90 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:31:14,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3060020.0, ans=0.1
2023-11-27 12:31:21,277 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.88 vs. limit=15.0
2023-11-27 12:31:21,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3060020.0, ans=0.125
2023-11-27 12:31:32,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3060086.6666666665, ans=0.1
2023-11-27 12:31:36,130 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0
2023-11-27 12:31:39,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3060153.3333333335, ans=0.125
2023-11-27 12:32:04,703 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459050
2023-11-27 12:32:10,947 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2150, loss[loss=0.07784, simple_loss=0.1065, pruned_loss=0.01556, audio_tagging_loss=0.009037, over 15589.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08875, pruned_loss=0.01268, audio_tagging_loss=0.008922, over 3044301.99 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:32:13,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3060353.3333333335, ans=0.0
2023-11-27 12:32:17,023 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.22 vs. limit=15.0
2023-11-27 12:32:25,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3060420.0, ans=0.125
2023-11-27 12:32:39,024 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 8.486e+01 9.121e+01 9.969e+01 1.223e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-27 12:32:41,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3060486.6666666665, ans=0.125
2023-11-27 12:32:47,748 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:32:53,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3060553.3333333335, ans=0.05
2023-11-27 12:33:02,292 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459100
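The WARNING record is the recipe's length filter at work: a 1-second AudioSet clip yields 100 feature frames, only 23 frames after the 4x subsampling, which is fewer than its 24 BPE tokens, so the transducer alignment is undefined and the cut is dropped. A hedged sketch of such a filter; the subsampled-length formula is an assumption that reproduces the logged 100 -> 23 reduction, not a quote of the recipe:

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts whose encoder output is shorter than the token sequence.

    Sketch of the check behind the 'Exclude cut ...' warnings above: a
    transducer alignment needs at least one encoder frame per token.
    ((n - 7) // 2 + 1) // 2 is an assumed convolutional-subsampling
    formula that happens to map 100 frames to 23.
    """
    frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
    return frames_after_subsampling >= num_tokens

print(keep_cut(100, 24))  # False: 23 frames < 24 tokens, cut is excluded
```

Every excluded cut in this section is one of these 1-second unbalanced/ AudioSet clips, so the filter is costing a negligible amount of tagging data.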
2023-11-27 12:33:07,676 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2200, loss[loss=0.05079, simple_loss=0.06375, pruned_loss=0.009385, audio_tagging_loss=0.00953, over 14736.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08863, pruned_loss=0.0125, audio_tagging_loss=0.008865, over 3044073.64 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:33:09,522 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.57 vs. limit=15.0
2023-11-27 12:33:13,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3060686.6666666665, ans=0.125
2023-11-27 12:33:18,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3060753.3333333335, ans=0.125
2023-11-27 12:33:45,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3060886.6666666665, ans=0.0
2023-11-27 12:33:59,432 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459150
2023-11-27 12:34:01,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3060953.3333333335, ans=0.125
2023-11-27 12:34:02,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3060953.3333333335, ans=0.125
2023-11-27 12:34:05,524 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2250, loss[loss=0.06282, simple_loss=0.09191, pruned_loss=0.01015, audio_tagging_loss=0.006707, over 15152.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08798, pruned_loss=0.01236, audio_tagging_loss=0.008914, over 3044276.53 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:34:06,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3061020.0, ans=0.125
2023-11-27 12:34:09,483 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5
2023-11-27 12:34:21,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3061086.6666666665, ans=0.125
2023-11-27 12:34:36,083 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.598e+01 9.189e+01 9.920e+01 1.225e+02, threshold=1.838e+02, percent-clipped=0.0
2023-11-27 12:34:40,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3061220.0, ans=0.125
2023-11-27 12:34:51,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3061286.6666666665, ans=0.125
2023-11-27 12:34:55,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3061286.6666666665, ans=0.0
2023-11-27 12:34:55,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3061286.6666666665, ans=0.5
2023-11-27 12:34:57,841 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459200
2023-11-27 12:34:58,286 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=12.0
2023-11-27 12:35:05,897 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2300, loss[loss=0.08066, simple_loss=0.1121, pruned_loss=0.0156, audio_tagging_loss=0.009035, over 14070.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08816, pruned_loss=0.01249, audio_tagging_loss=0.009028, over 3035918.56 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:35:08,780 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0
2023-11-27 12:35:19,347 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:35:23,671 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:35:31,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3061486.6666666665, ans=0.0
2023-11-27 12:35:35,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3061486.6666666665, ans=0.0
2023-11-27 12:35:38,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3061553.3333333335, ans=0.125
2023-11-27 12:35:42,067 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.88 vs. limit=22.5
2023-11-27 12:35:56,968 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.10 vs. limit=22.5
2023-11-27 12:35:57,630 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459250
2023-11-27 12:35:59,774 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:36:03,040 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2350, loss[loss=0.07275, simple_loss=0.08948, pruned_loss=0.0201, audio_tagging_loss=0.007907, over 14461.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08862, pruned_loss=0.01271, audio_tagging_loss=0.009011, over 3029670.61 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:36:03,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3061686.6666666665, ans=0.125
2023-11-27 12:36:12,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3061686.6666666665, ans=0.0
2023-11-27 12:36:32,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3061820.0, ans=0.125
2023-11-27 12:36:34,192 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.777e+01 9.429e+01 1.022e+02 1.253e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-27 12:36:39,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3061886.6666666665, ans=0.2
2023-11-27 12:36:49,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3061953.3333333335, ans=0.1
2023-11-27 12:36:55,284 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459300
2023-11-27 12:37:00,897 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2400, loss[loss=0.08104, simple_loss=0.1126, pruned_loss=0.01716, audio_tagging_loss=0.007596, over 14934.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.08977, pruned_loss=0.01281, audio_tagging_loss=0.009086, over 3037232.21 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:37:13,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3062086.6666666665, ans=0.125
2023-11-27 12:37:27,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3062153.3333333335, ans=0.0
2023-11-27 12:37:32,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3062153.3333333335, ans=0.1
2023-11-27 12:37:36,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3062220.0, ans=0.1
2023-11-27 12:37:36,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3062220.0, ans=0.1
2023-11-27 12:37:37,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3062220.0, ans=0.1
2023-11-27 12:37:47,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3062286.6666666665, ans=0.2
2023-11-27 12:37:52,986 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459350
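grad_scale toggling between 16.0 and 32.0 across these records is characteristic of dynamic loss scaling under fp16: the scaler halves the scale when a step overflows and doubles it back after a run of clean steps. A generic sketch with torch.cuda.amp, not the recipe's exact loop; model, optimizer, and the toy data below are placeholders, and a CUDA device is assumed:

```python
import torch

def train_step(model, optimizer, scaler, features, targets) -> float:
    """One fp16 training step with dynamic loss scaling (generic sketch)."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=True):
        loss = torch.nn.functional.cross_entropy(model(features), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if gradients overflowed
    scaler.update()          # halves the scale on overflow, grows it later
    return loss.item()

if torch.cuda.is_available():
    model = torch.nn.Linear(80, 500).cuda()      # placeholder model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()
    x = torch.randn(8, 80, device="cuda")
    y = torch.randint(0, 500, (8,), device="cuda")
    train_step(model, optimizer, scaler, x, y)
    print(scaler.get_scale())  # the analogue of the logged grad_scale
```

A scale that keeps bouncing between two adjacent powers of two, as here, is the healthy steady state; a scale collapsing toward 1 would signal recurring fp16 overflows.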
2023-11-27 12:37:59,591 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2450, loss[loss=0.06485, simple_loss=0.08318, pruned_loss=0.01398, audio_tagging_loss=0.009282, over 14958.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09002, pruned_loss=0.01284, audio_tagging_loss=0.009113, over 3043449.77 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:38:28,622 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.819e+01 8.321e+01 9.410e+01 9.969e+01 1.274e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-27 12:38:31,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3062486.6666666665, ans=0.125
2023-11-27 12:38:51,234 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459400
2023-11-27 12:38:55,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3062620.0, ans=0.125
2023-11-27 12:38:56,774 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.58 vs. limit=15.0
2023-11-27 12:38:57,350 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2500, loss[loss=0.06481, simple_loss=0.09114, pruned_loss=0.01234, audio_tagging_loss=0.006901, over 14072.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09008, pruned_loss=0.01267, audio_tagging_loss=0.009021, over 3043409.87 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:39:08,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3062753.3333333335, ans=0.125
2023-11-27 12:39:17,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3062753.3333333335, ans=0.125
2023-11-27 12:39:17,581 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.21 vs. limit=10.0
2023-11-27 12:39:19,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3062820.0, ans=0.125
2023-11-27 12:39:45,499 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0
2023-11-27 12:39:48,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3062953.3333333335, ans=0.125
2023-11-27 12:39:49,332 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459450
2023-11-27 12:39:49,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3062953.3333333335, ans=0.2
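tot_loss is not a plain epoch average: the fractional frame counts (e.g. "over 3043449.77 frames") indicate a decayed running sum, where the accumulator is shrunk each batch before the new batch statistics are added. With an assumed decay of 1 - 1/200 and roughly 15k frames per batch, the steady-state frame count is about 200 x 15000 = 3M, matching the ~3.04M frames seen throughout this section. A sketch under those assumptions:

```python
class RunningLoss:
    """Decayed running sum of (loss, frames).

    Sketch consistent with the fractional frame counts in the tot_loss
    records; reset_interval=200 is an assumption. The steady-state frame
    count is roughly reset_interval x frames-per-batch.
    """
    def __init__(self, reset_interval: int = 200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: int) -> None:
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames

    @property
    def avg(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
for _ in range(2000):                      # long enough to reach steady state
    tracker.update(0.066 * 15000, 15000)   # a typical batch from the log
print(round(tracker.frames))               # ~3,000,000 frames, as in the records
```

This explains why tot_loss moves smoothly from record to record while the per-batch loss[...] values jump around: the tracker effectively averages over the last few hundred batches.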
2023-11-27 12:39:54,751 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2550, loss[loss=0.07333, simple_loss=0.1048, pruned_loss=0.01097, audio_tagging_loss=0.009939, over 16720.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09054, pruned_loss=0.01275, audio_tagging_loss=0.008929, over 3043949.33 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:40:02,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3063020.0, ans=0.2
2023-11-27 12:40:15,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3063086.6666666665, ans=0.2
2023-11-27 12:40:22,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3063153.3333333335, ans=0.125
2023-11-27 12:40:24,363 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.94 vs. limit=10.0
2023-11-27 12:40:26,721 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.678e+01 9.247e+01 1.003e+02 1.223e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-27 12:40:27,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3063153.3333333335, ans=0.0
2023-11-27 12:40:35,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3063220.0, ans=0.125
2023-11-27 12:40:46,500 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459500
2023-11-27 12:40:46,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3063286.6666666665, ans=0.0
2023-11-27 12:40:51,929 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2600, loss[loss=0.07351, simple_loss=0.09353, pruned_loss=0.01846, audio_tagging_loss=0.008288, over 14625.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09026, pruned_loss=0.0128, audio_tagging_loss=0.008809, over 3036831.51 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:40:53,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3063353.3333333335, ans=0.125
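The Whitening records compare a statistic of a layer's output covariance against a limit (metric=X vs. limit=Y); when the metric exceeds the limit, a penalty pushes the features back toward a whiter, more isotropic covariance. One way such a metric can be computed, as the ratio of the mean squared eigenvalue to the squared mean eigenvalue of the per-group covariance (1.0 for perfectly white features); this is an interpretation of the logged quantity, not the icefall code:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """How far feature covariance is from a multiple of the identity.

    x: (num_frames, num_channels). Channels are split into num_groups
    groups; for each group's covariance C we measure
    mean(eig(C)^2) / mean(eig(C))^2, which is 1.0 for white features and
    grows as variance concentrates in a few directions. Mirrors the
    'metric=... vs. limit=...' records in spirit only.
    """
    n, c = x.shape
    cpg = c // num_groups                                 # channels per group
    xg = x.reshape(n, num_groups, cpg).permute(1, 0, 2)   # (groups, frames, cpg)
    xg = xg - xg.mean(dim=1, keepdim=True)
    cov = xg.transpose(1, 2) @ xg / n                     # (groups, cpg, cpg)
    # trace(C @ C) = sum of squared eigenvalues for symmetric C
    mean_eig_sq = (cov * cov).sum(dim=(1, 2)) / cpg
    mean_eig = cov.diagonal(dim1=1, dim2=2).sum(dim=1) / cpg
    return (mean_eig_sq / mean_eig.clamp(min=1e-20) ** 2).mean()

x = torch.randn(1000, 384)
# Close to 1 for white noise (finite-sample bias adds roughly channels/frames):
print(float(whitening_metric(x, num_groups=1)))
```

Under this reading, a record like "metric=21.13 vs. limit=22.5" is a layer drifting close to its whitening constraint, while single-digit metrics are comfortably inside it.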
2023-11-27 12:40:58,782 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.60 vs. limit=22.5
2023-11-27 12:40:59,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3063353.3333333335, ans=0.125
2023-11-27 12:40:59,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3063353.3333333335, ans=0.1
2023-11-27 12:41:20,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3063486.6666666665, ans=0.1
2023-11-27 12:41:31,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3063553.3333333335, ans=0.125
2023-11-27 12:41:38,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3063620.0, ans=0.0
2023-11-27 12:41:45,361 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459550
2023-11-27 12:41:50,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3063686.6666666665, ans=0.025
2023-11-27 12:41:50,872 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2650, loss[loss=0.08123, simple_loss=0.1165, pruned_loss=0.01536, audio_tagging_loss=0.007612, over 15504.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08993, pruned_loss=0.01266, audio_tagging_loss=0.008772, over 3034515.74 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:41:53,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.71 vs. limit=15.0
2023-11-27 12:41:57,874 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=12.0
2023-11-27 12:42:03,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3063753.3333333335, ans=0.125
2023-11-27 12:42:03,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3063753.3333333335, ans=0.0
2023-11-27 12:42:20,392 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.348e+01 9.301e+01 1.026e+02 1.495e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-27 12:42:42,212 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459600
2023-11-27 12:42:48,205 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2700, loss[loss=0.07558, simple_loss=0.1006, pruned_loss=0.01804, audio_tagging_loss=0.00723, over 13469.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08942, pruned_loss=0.01239, audio_tagging_loss=0.008777, over 3039878.53 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:42:50,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3064020.0, ans=0.2
2023-11-27 12:42:51,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3064020.0, ans=0.035
2023-11-27 12:42:52,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3064020.0, ans=0.0
2023-11-27 12:42:52,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3064020.0, ans=0.125
2023-11-27 12:42:59,731 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.88 vs. limit=22.5
2023-11-27 12:43:04,063 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.18 vs. limit=15.0
2023-11-27 12:43:19,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3064153.3333333335, ans=0.125
2023-11-27 12:43:39,927 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459650
2023-11-27 12:43:45,272 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2750, loss[loss=0.06609, simple_loss=0.09569, pruned_loss=0.009009, audio_tagging_loss=0.009234, over 14937.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08888, pruned_loss=0.01228, audio_tagging_loss=0.008853, over 3040377.92 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:44:01,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3064420.0, ans=0.125
2023-11-27 12:44:03,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3064420.0, ans=0.125
2023-11-27 12:44:04,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3064420.0, ans=0.2
2023-11-27 12:44:17,403 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.367e+01 8.947e+01 9.823e+01 1.478e+02, threshold=1.789e+02, percent-clipped=0.0
2023-11-27 12:44:38,954 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:44:38,993 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459700
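The excluded AudioSet cuts all carry the same "Dummy text added as a place holder..." transcript: when audio-tagging data is muxed into an ASR pipeline, every cut needs some supervision text so the transducer code paths do not break, and a constant placeholder is attached. A hedged sketch with lhotse; SupervisionSegment and its fields are lhotse's real API, but the placeholder wiring itself is an assumption about the data preparation, not a quote of the recipe:

```python
from lhotse import SupervisionSegment

DUMMY_TEXT = "Dummy text added as a place holder. Please ignore this if possible."

def with_dummy_supervision(cut):
    """Attach a placeholder transcript to a tagging-only cut.

    Sketch: gives AudioSet cuts the constant text seen in the warnings
    above so they can flow through the ASR collation code; the actual
    recipe's preparation step may do this differently.
    """
    cut.supervisions = [
        SupervisionSegment(
            id=f"{cut.id}-sup",
            recording_id=cut.recording_id,
            start=0.0,
            duration=cut.duration,
            text=DUMMY_TEXT,
        )
    ]
    return cut

# Usage on a lhotse CutSet: cuts = audioset_cuts.map(with_dummy_supervision)
```

The BPE tokenization of that fixed sentence is what produces the 24 tokens reported in every one of these warnings.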
2023-11-27 12:44:44,995 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2800, loss[loss=0.06727, simple_loss=0.08686, pruned_loss=0.01483, audio_tagging_loss=0.009008, over 14450.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08955, pruned_loss=0.01255, audio_tagging_loss=0.008726, over 3048575.09 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:44:47,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3064686.6666666665, ans=0.1
2023-11-27 12:44:57,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3064753.3333333335, ans=0.125
2023-11-27 12:44:58,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3064753.3333333335, ans=0.125
2023-11-27 12:44:59,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3064753.3333333335, ans=0.125
2023-11-27 12:45:03,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3064753.3333333335, ans=0.1
2023-11-27 12:45:14,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3064820.0, ans=0.1
2023-11-27 12:45:22,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3064886.6666666665, ans=0.0
2023-11-27 12:45:23,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3064886.6666666665, ans=0.125
2023-11-27 12:45:35,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3064953.3333333335, ans=0.1
2023-11-27 12:45:36,811 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459750
2023-11-27 12:45:41,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3065020.0, ans=0.125
2023-11-27 12:45:42,194 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2850, loss[loss=0.07147, simple_loss=0.1063, pruned_loss=0.01118, audio_tagging_loss=0.007153, over 14702.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09004, pruned_loss=0.0127, audio_tagging_loss=0.008634, over 3040839.59 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:45:42,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3065020.0, ans=0.125
2023-11-27 12:45:54,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0
2023-11-27 12:46:00,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3065086.6666666665, ans=0.2
2023-11-27 12:46:14,415 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.447e+01 9.117e+01 9.906e+01 1.324e+02, threshold=1.823e+02, percent-clipped=0.0
2023-11-27 12:46:24,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3065220.0, ans=0.125
2023-11-27 12:46:30,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3065286.6666666665, ans=0.125
2023-11-27 12:46:32,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3065286.6666666665, ans=0.2
2023-11-27 12:46:34,155 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459800
2023-11-27 12:46:37,716 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0
2023-11-27 12:46:40,262 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2900, loss[loss=0.07191, simple_loss=0.09712, pruned_loss=0.01152, audio_tagging_loss=0.01182, over 16236.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09009, pruned_loss=0.0127, audio_tagging_loss=0.008693, over 3040149.06 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:46:47,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3065353.3333333335, ans=0.05
2023-11-27 12:46:57,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3065420.0, ans=0.09899494936611666
2023-11-27 12:47:08,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3065486.6666666665, ans=0.2
2023-11-27 12:47:10,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3065486.6666666665, ans=0.125
2023-11-27 12:47:33,577 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459850
2023-11-27 12:47:39,682 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 2950, loss[loss=0.0779, simple_loss=0.1014, pruned_loss=0.02004, audio_tagging_loss=0.007145, over 15467.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09045, pruned_loss=0.0128, audio_tagging_loss=0.008752, over 3046590.08 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:47:47,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3065686.6666666665, ans=0.09899494936611666
2023-11-27 12:47:53,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3065753.3333333335, ans=0.125
2023-11-27 12:47:53,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3065753.3333333335, ans=0.05
2023-11-27 12:47:54,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3065753.3333333335, ans=0.125
2023-11-27 12:47:58,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3065753.3333333335, ans=0.125
2023-11-27 12:48:00,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3065753.3333333335, ans=0.0
2023-11-27 12:48:10,986 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.648e+01 9.263e+01 1.004e+02 1.641e+02, threshold=1.853e+02, percent-clipped=0.0
2023-11-27 12:48:18,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3065886.6666666665, ans=0.125
2023-11-27 12:48:32,112 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459900
2023-11-27 12:48:37,509 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3000, loss[loss=0.06657, simple_loss=0.09339, pruned_loss=0.0113, audio_tagging_loss=0.008579, over 15322.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09154, pruned_loss=0.01296, audio_tagging_loss=0.008773, over 3049922.66 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:48:37,510 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-27 12:49:11,858 INFO [train_asr.py:1267] (2/4) Epoch 39, validation: loss=0.05767, simple_loss=0.05074, pruned_loss=0.005233, audio_tagging_loss=0.02707, over 4681554.00 frames.
2023-11-27 12:49:11,859 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB
2023-11-27 12:49:34,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3066153.3333333335, ans=0.025
2023-11-27 12:50:05,468 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 459950
2023-11-27 12:50:10,847 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3050, loss[loss=0.0642, simple_loss=0.08939, pruned_loss=0.01013, audio_tagging_loss=0.009368, over 15459.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.09223, pruned_loss=0.0129, audio_tagging_loss=0.008669, over 3046668.76 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:50:15,793 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.40 vs. limit=10.0
2023-11-27 12:50:22,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3066420.0, ans=0.125
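At batch 3000 the trainer pauses to compute a validation loss over the full dev set and then reports peak GPU memory (26096MB here). Note the validation audio_tagging_loss (0.02707) dominates its total while the ASR terms are small, consistent with the dev set being AudioSet tagging data. A generic sketch of the bookkeeping; the model/loader interfaces are placeholders, not the recipe's:

```python
import torch

def compute_validation_loss(model, valid_loader) -> float:
    """Average loss over the dev set, as in the 'Computing validation
    loss' records. Assumes the model returns (loss_sum, num_frames)
    per batch; that interface is a placeholder.
    """
    was_training = model.training
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            loss_sum, num_frames = model(batch)
            tot_loss += float(loss_sum)
            tot_frames += int(num_frames)
    if was_training:
        model.train()
    return tot_loss / max(tot_frames, 1)

# Peak-memory line, as logged right after validation:
if torch.cuda.is_available():
    mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")
```

Logging max_memory_allocated immediately after validation is a cheap way to catch the high-water mark, since validation batches often differ in shape from training batches.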
2023-11-27 12:50:25,530 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0
2023-11-27 12:50:38,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3066486.6666666665, ans=0.125
2023-11-27 12:50:42,418 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.816e+01 9.327e+01 1.004e+02 1.225e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-27 12:50:45,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3066553.3333333335, ans=0.0
2023-11-27 12:50:45,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3066553.3333333335, ans=0.125
2023-11-27 12:50:47,030 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 12:50:49,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3066553.3333333335, ans=0.125
2023-11-27 12:51:03,720 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460000
2023-11-27 12:51:11,793 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3100, loss[loss=0.07838, simple_loss=0.1098, pruned_loss=0.01543, audio_tagging_loss=0.008056, over 14725.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09225, pruned_loss=0.01277, audio_tagging_loss=0.008706, over 3047448.39 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:51:53,426 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.92 vs. limit=15.0
2023-11-27 12:52:04,048 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460050
2023-11-27 12:52:09,584 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3150, loss[loss=0.05414, simple_loss=0.07089, pruned_loss=0.008005, audio_tagging_loss=0.01069, over 15706.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09166, pruned_loss=0.01283, audio_tagging_loss=0.008837, over 3050166.34 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 16.0
2023-11-27 12:52:42,634 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 8.556e+01 9.152e+01 9.802e+01 1.189e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-27 12:53:02,614 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460100
2023-11-27 12:53:08,849 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3200, loss[loss=0.06161, simple_loss=0.08984, pruned_loss=0.008679, audio_tagging_loss=0.008012, over 14790.00 frames. ], tot_loss[loss=0.06778, simple_loss=0.0917, pruned_loss=0.01295, audio_tagging_loss=0.008983, over 3049937.08 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:53:11,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3067353.3333333335, ans=0.0
2023-11-27 12:53:21,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3067420.0, ans=0.125
2023-11-27 12:53:22,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3067420.0, ans=0.125
2023-11-27 12:53:33,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3067486.6666666665, ans=0.125
2023-11-27 12:53:37,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3067486.6666666665, ans=0.125
2023-11-27 12:53:44,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3067553.3333333335, ans=0.2
2023-11-27 12:53:48,660 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0
2023-11-27 12:53:55,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3067620.0, ans=0.0
2023-11-27 12:54:01,019 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460150
2023-11-27 12:54:06,394 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3250, loss[loss=0.05655, simple_loss=0.06907, pruned_loss=0.01025, audio_tagging_loss=0.01177, over 15359.00 frames. ], tot_loss[loss=0.06776, simple_loss=0.09154, pruned_loss=0.01302, audio_tagging_loss=0.008973, over 3053324.43 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:54:08,656 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.00 vs. limit=15.0
2023-11-27 12:54:24,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3067753.3333333335, ans=0.0
2023-11-27 12:54:39,775 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 8.843e+01 9.410e+01 1.014e+02 1.334e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-27 12:54:41,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3067886.6666666665, ans=0.125
2023-11-27 12:54:48,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3067886.6666666665, ans=0.2
2023-11-27 12:54:54,961 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 12:54:59,304 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460200
2023-11-27 12:54:59,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3067953.3333333335, ans=0.025
2023-11-27 12:55:02,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3067953.3333333335, ans=0.1
2023-11-27 12:55:03,316 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=22.5
2023-11-27 12:55:05,132 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3300, loss[loss=0.05082, simple_loss=0.07183, pruned_loss=0.008555, audio_tagging_loss=0.006352, over 14982.00 frames. ], tot_loss[loss=0.06836, simple_loss=0.09206, pruned_loss=0.01325, audio_tagging_loss=0.009077, over 3051667.59 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:55:07,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3068020.0, ans=0.05
2023-11-27 12:55:14,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3068020.0, ans=0.05
2023-11-27 12:55:22,216 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=8.0
2023-11-27 12:55:38,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3068153.3333333335, ans=0.125
2023-11-27 12:55:42,481 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0
2023-11-27 12:55:43,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3068220.0, ans=0.125
2023-11-27 12:55:58,017 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460250
2023-11-27 12:56:01,969 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
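The balancer parameters scattered through these records (min_positive, max_positive, min_abs, max_abs, prob) belong to activation balancers that keep each channel's statistics in a target range; prob appears to control how often the constraint is checked. A deliberately simplified sketch of the idea, only detecting channels whose fraction of positive activations falls outside [min_positive, max_positive]; the gradient-nudging mechanics of the real module are omitted:

```python
import torch

def balancer_violations(x: torch.Tensor,
                        min_positive: float = 0.05,
                        max_positive: float = 0.95) -> torch.Tensor:
    """Which channels have too few or too many positive activations.

    Simplified view of the Balancer constraints named in the records
    above; the actual module corrects offending channels by modifying
    gradients in the backward pass, which this sketch does not attempt.
    """
    # x: (num_frames, num_channels)
    pos_frac = (x > 0).float().mean(dim=0)
    return (pos_frac < min_positive) | (pos_frac > max_positive)

x = torch.randn(2048, 256)
x[:, 0] = -x[:, 0].abs()   # force channel 0 all-negative: a violation
print(balancer_violations(x).nonzero(as_tuple=True)[0].tolist())  # [0]
```

Constraints like these are soft guards against dead or saturated channels, which is why the logged values (min_positive=0.025, max_abs=10.0, and so on) are wide rather than tight.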
2023-11-27 12:56:04,631 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3350, loss[loss=0.04042, simple_loss=0.04642, pruned_loss=0.005869, audio_tagging_loss=0.01134, over 14780.00 frames. ], tot_loss[loss=0.06838, simple_loss=0.09221, pruned_loss=0.01326, audio_tagging_loss=0.009014, over 3048335.88 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:56:07,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3068353.3333333335, ans=0.125
2023-11-27 12:56:36,189 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.408e+01 8.562e+01 9.285e+01 9.934e+01 1.474e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-27 12:56:41,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3068553.3333333335, ans=0.125
2023-11-27 12:56:56,885 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460300
2023-11-27 12:57:02,296 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3400, loss[loss=0.06038, simple_loss=0.08113, pruned_loss=0.01183, audio_tagging_loss=0.007991, over 15442.00 frames. ], tot_loss[loss=0.06859, simple_loss=0.09279, pruned_loss=0.01336, audio_tagging_loss=0.008835, over 3057998.24 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:57:11,305 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=15.0
2023-11-27 12:57:13,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3068753.3333333335, ans=0.125
2023-11-27 12:57:17,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3068753.3333333335, ans=0.2
2023-11-27 12:57:20,293 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.38 vs. limit=12.0
2023-11-27 12:57:33,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3068820.0, ans=0.2
2023-11-27 12:57:45,864 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.04 vs. limit=10.0
2023-11-27 12:57:51,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.91 vs. limit=15.0
2023-11-27 12:57:52,490 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=12.0
2023-11-27 12:57:54,064 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460350
2023-11-27 12:58:00,443 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3450, loss[loss=0.0751, simple_loss=0.105, pruned_loss=0.01555, audio_tagging_loss=0.007045, over 15301.00 frames. ], tot_loss[loss=0.06785, simple_loss=0.09203, pruned_loss=0.0131, audio_tagging_loss=0.008736, over 3057472.39 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:58:08,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3069020.0, ans=0.125
2023-11-27 12:58:13,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3069086.6666666665, ans=0.02
2023-11-27 12:58:32,868 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 8.564e+01 9.034e+01 9.987e+01 1.555e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-27 12:58:37,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.25 vs. limit=15.0
2023-11-27 12:58:39,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3069220.0, ans=0.125
2023-11-27 12:58:46,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3069286.6666666665, ans=0.1
2023-11-27 12:58:46,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3069286.6666666665, ans=0.2
2023-11-27 12:58:50,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3069286.6666666665, ans=0.09899494936611666
2023-11-27 12:58:52,132 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460400
2023-11-27 12:58:58,878 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3500, loss[loss=0.06197, simple_loss=0.07075, pruned_loss=0.01553, audio_tagging_loss=0.01107, over 14723.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09157, pruned_loss=0.01291, audio_tagging_loss=0.008647, over 3050172.48 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0
2023-11-27 12:59:03,402 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.08 vs. limit=15.0
2023-11-27 12:59:18,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.13 vs. limit=22.5
2023-11-27 12:59:21,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.82 vs. limit=10.0
2023-11-27 12:59:27,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3069486.6666666665, ans=0.125
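The bypass.skip_rate entries (and the conv/ff/attention skip-rate schedules elsewhere in these records) implement stochastic sub-module skipping: with the scheduled probability a layer's or branch's contribution is dropped and the input passes through unchanged, a stochastic-depth-style regularizer for the deep encoder. A sketch of the idea, assuming simple per-batch Bernoulli skipping; the zipformer's real bypass also learns a per-channel mixing scale (the bypass.scale_min entries), which is not modelled here:

```python
import torch

class BypassModule(torch.nn.Module):
    """Wrap a sub-module so it is sometimes skipped during training.

    Sketch of the bypass/skip_rate idea: with probability skip_rate the
    wrapped module's residual contribution is dropped for the batch.
    """
    def __init__(self, module: torch.nn.Module, skip_rate: float = 0.1):
        super().__init__()
        self.module = module
        self.skip_rate = skip_rate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.skip_rate:
            return x                   # bypass: identity for this batch
        return x + self.module(x)      # normal residual contribution

layer = BypassModule(torch.nn.Linear(256, 256), skip_rate=0.099)
print(layer(torch.randn(4, 256)).shape)  # torch.Size([4, 256])
```

The 0.099 used above mirrors the logged ans=0.09899494936611666; in the records this rate is itself a ScheduledFloat, so the amount of skipping changes over training.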
Number of tokens: 24 2023-11-27 12:59:31,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3069553.3333333335, ans=0.025 2023-11-27 12:59:51,323 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460450 2023-11-27 12:59:56,774 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3550, loss[loss=0.06125, simple_loss=0.08572, pruned_loss=0.009878, audio_tagging_loss=0.008514, over 15774.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09101, pruned_loss=0.01271, audio_tagging_loss=0.008627, over 3040943.94 frames. ], batch size: 60, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:00:18,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3069820.0, ans=0.0 2023-11-27 13:00:26,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3069820.0, ans=0.1 2023-11-27 13:00:30,924 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.573e+01 9.051e+01 9.738e+01 1.451e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-27 13:00:42,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3069953.3333333335, ans=0.125 2023-11-27 13:00:47,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3069953.3333333335, ans=0.125 2023-11-27 13:00:48,470 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460500 2023-11-27 13:00:54,039 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3600, loss[loss=0.07369, simple_loss=0.1021, pruned_loss=0.01617, audio_tagging_loss=0.006457, over 14989.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09116, pruned_loss=0.01278, audio_tagging_loss=0.008605, over 3040580.95 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:01:03,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3070020.0, ans=0.0 2023-11-27 13:01:39,805 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.93 vs. limit=10.0 2023-11-27 13:01:46,308 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460550 2023-11-27 13:01:50,747 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0 2023-11-27 13:01:51,968 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.40 vs. limit=15.0 2023-11-27 13:01:52,369 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3650, loss[loss=0.04318, simple_loss=0.05078, pruned_loss=0.006695, audio_tagging_loss=0.0111, over 14634.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09077, pruned_loss=0.01273, audio_tagging_loss=0.008631, over 3041558.94 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:02:00,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3070353.3333333335, ans=0.125 2023-11-27 13:02:04,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3070420.0, ans=0.1 2023-11-27 13:02:07,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3070420.0, ans=0.0 2023-11-27 13:02:07,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3070420.0, ans=0.125 2023-11-27 13:02:09,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3070420.0, ans=0.125 2023-11-27 13:02:19,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3070486.6666666665, ans=0.2 2023-11-27 13:02:25,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.646e+01 9.266e+01 1.020e+02 1.594e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-27 13:02:27,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3070553.3333333335, ans=0.0 2023-11-27 13:02:43,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3070620.0, ans=0.125 2023-11-27 13:02:45,659 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460600 2023-11-27 13:02:51,399 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3700, loss[loss=0.07827, simple_loss=0.1133, pruned_loss=0.01589, audio_tagging_loss=0.00574, over 15449.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09077, pruned_loss=0.01266, audio_tagging_loss=0.008659, over 3044413.52 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:03:07,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3070753.3333333335, ans=0.0 2023-11-27 13:03:12,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3070820.0, ans=0.0 2023-11-27 13:03:43,347 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460650 2023-11-27 13:03:45,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3070953.3333333335, ans=0.0 2023-11-27 13:03:48,832 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3750, loss[loss=0.06839, simple_loss=0.08584, pruned_loss=0.01237, audio_tagging_loss=0.01311, over 14528.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09198, pruned_loss=0.01293, audio_tagging_loss=0.008655, over 3048758.10 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:03:53,772 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.91 vs. 
limit=15.0 2023-11-27 13:04:23,264 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.737e+01 9.313e+01 1.018e+02 1.236e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 13:04:25,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3071220.0, ans=0.0 2023-11-27 13:04:30,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3071220.0, ans=0.125 2023-11-27 13:04:31,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3071220.0, ans=0.07 2023-11-27 13:04:32,094 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:04:37,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3071286.6666666665, ans=0.125 2023-11-27 13:04:40,832 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460700 2023-11-27 13:04:42,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.49 vs. limit=15.0 2023-11-27 13:04:46,901 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3800, loss[loss=0.06843, simple_loss=0.1007, pruned_loss=0.009802, audio_tagging_loss=0.008303, over 15769.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09186, pruned_loss=0.0129, audio_tagging_loss=0.008732, over 3049951.23 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:04:47,999 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2023-11-27 13:04:51,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3071353.3333333335, ans=0.0 2023-11-27 13:05:15,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3071486.6666666665, ans=0.125 2023-11-27 13:05:35,377 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.79 vs. limit=22.5 2023-11-27 13:05:36,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3071620.0, ans=0.0 2023-11-27 13:05:40,408 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460750 2023-11-27 13:05:42,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3071620.0, ans=0.0 2023-11-27 13:05:45,986 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3850, loss[loss=0.07883, simple_loss=0.1057, pruned_loss=0.0154, audio_tagging_loss=0.01059, over 15732.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.0914, pruned_loss=0.01289, audio_tagging_loss=0.008891, over 3052311.56 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:06:05,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3071753.3333333335, ans=0.125 2023-11-27 13:06:15,904 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.94 vs. limit=15.0 2023-11-27 13:06:16,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3071820.0, ans=0.1 2023-11-27 13:06:17,742 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:06:18,576 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.582e+01 9.212e+01 9.899e+01 1.418e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 13:06:32,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3071953.3333333335, ans=0.125 2023-11-27 13:06:37,374 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460800 2023-11-27 13:06:42,681 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2023-11-27 13:06:43,155 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3900, loss[loss=0.07517, simple_loss=0.1112, pruned_loss=0.01464, audio_tagging_loss=0.004947, over 15693.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.0906, pruned_loss=0.01265, audio_tagging_loss=0.008933, over 3047300.19 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:07:12,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3072153.3333333335, ans=0.2 2023-11-27 13:07:19,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3072220.0, ans=0.0 2023-11-27 13:07:34,944 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460850 2023-11-27 13:07:40,278 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 3950, loss[loss=0.0694, simple_loss=0.08908, pruned_loss=0.01266, audio_tagging_loss=0.0122, over 15097.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.08983, pruned_loss=0.01247, audio_tagging_loss=0.009161, over 3039827.25 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:07:45,546 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:07:51,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3072353.3333333335, ans=0.0 2023-11-27 13:07:52,209 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:07:52,455 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.72 vs. limit=22.5 2023-11-27 13:07:55,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3072420.0, ans=0.0 2023-11-27 13:07:55,444 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.41 vs. 
limit=15.0 2023-11-27 13:08:01,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3072420.0, ans=0.125 2023-11-27 13:08:03,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3072486.6666666665, ans=0.1 2023-11-27 13:08:15,679 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.964e+01 9.425e+01 1.000e+02 1.456e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-27 13:08:18,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3072553.3333333335, ans=0.125 2023-11-27 13:08:21,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3072553.3333333335, ans=0.125 2023-11-27 13:08:25,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3072620.0, ans=10.0 2023-11-27 13:08:33,493 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460900 2023-11-27 13:08:38,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3072686.6666666665, ans=0.125 2023-11-27 13:08:39,772 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4000, loss[loss=0.06992, simple_loss=0.09655, pruned_loss=0.01288, audio_tagging_loss=0.008769, over 15297.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.08993, pruned_loss=0.01254, audio_tagging_loss=0.009171, over 3044762.99 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:08:55,592 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.26 vs. limit=22.5 2023-11-27 13:09:31,987 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 460950 2023-11-27 13:09:32,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3072953.3333333335, ans=0.125 2023-11-27 13:09:34,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3072953.3333333335, ans=0.125 2023-11-27 13:09:37,345 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4050, loss[loss=0.05601, simple_loss=0.07568, pruned_loss=0.007477, audio_tagging_loss=0.01069, over 15836.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09116, pruned_loss=0.01273, audio_tagging_loss=0.0091, over 3041237.09 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:09:43,851 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 13:09:58,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3073153.3333333335, ans=0.125 2023-11-27 13:10:08,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3073153.3333333335, ans=0.0 2023-11-27 13:10:08,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3073153.3333333335, ans=0.95 2023-11-27 13:10:11,419 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:10:13,378 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 8.613e+01 9.165e+01 1.034e+02 1.408e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-27 13:10:19,498 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2023-11-27 13:10:28,748 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461000 2023-11-27 13:10:34,539 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4100, loss[loss=0.06957, simple_loss=0.0907, pruned_loss=0.01635, audio_tagging_loss=0.007869, over 14253.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09095, pruned_loss=0.01273, audio_tagging_loss=0.009029, over 3036407.73 frames. ], batch size: 54, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:10:45,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3073420.0, ans=0.2 2023-11-27 13:10:47,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3073420.0, ans=0.125 2023-11-27 13:11:15,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3073553.3333333335, ans=0.125 2023-11-27 13:11:18,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3073553.3333333335, ans=0.0 2023-11-27 13:11:22,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3073620.0, ans=0.05 2023-11-27 13:11:22,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3073620.0, ans=0.1 2023-11-27 13:11:26,786 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461050 2023-11-27 13:11:33,399 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4150, loss[loss=0.07192, simple_loss=0.1011, pruned_loss=0.0139, audio_tagging_loss=0.007462, over 15018.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09082, pruned_loss=0.01267, audio_tagging_loss=0.008904, over 3038449.76 frames. 
], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:11:33,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3073686.6666666665, ans=0.1 2023-11-27 13:11:46,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3073753.3333333335, ans=0.1 2023-11-27 13:11:48,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3073753.3333333335, ans=0.125 2023-11-27 13:11:52,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3073753.3333333335, ans=0.125 2023-11-27 13:11:55,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3073820.0, ans=0.125 2023-11-27 13:11:56,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3073820.0, ans=0.125 2023-11-27 13:12:08,196 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.828e+01 8.439e+01 9.039e+01 1.003e+02 1.334e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-27 13:12:09,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3073886.6666666665, ans=0.0 2023-11-27 13:12:09,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3073886.6666666665, ans=0.125 2023-11-27 13:12:18,757 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:12:25,953 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461100 2023-11-27 13:12:31,243 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4200, loss[loss=0.06941, simple_loss=0.09218, pruned_loss=0.01447, audio_tagging_loss=0.00885, over 15500.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09047, pruned_loss=0.01275, audio_tagging_loss=0.008732, over 3040654.53 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:12:40,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3074020.0, ans=0.125 2023-11-27 13:12:44,016 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.72 vs. 
limit=10.0 2023-11-27 13:12:45,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3074086.6666666665, ans=0.125 2023-11-27 13:13:06,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3074220.0, ans=0.2 2023-11-27 13:13:06,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3074220.0, ans=0.0 2023-11-27 13:13:07,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3074220.0, ans=15.0 2023-11-27 13:13:23,357 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461150 2023-11-27 13:13:28,870 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4250, loss[loss=0.06164, simple_loss=0.08508, pruned_loss=0.01082, audio_tagging_loss=0.008288, over 14665.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08964, pruned_loss=0.0126, audio_tagging_loss=0.008731, over 3037950.39 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:13:58,203 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.02 vs. limit=22.5 2023-11-27 13:14:01,487 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.44 vs. limit=22.5 2023-11-27 13:14:03,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3074553.3333333335, ans=0.1 2023-11-27 13:14:06,425 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.652e+01 9.233e+01 9.893e+01 1.518e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-27 13:14:08,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3074553.3333333335, ans=0.125 2023-11-27 13:14:08,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3074553.3333333335, ans=0.125 2023-11-27 13:14:13,462 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2023-11-27 13:14:16,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3074620.0, ans=0.04949747468305833 2023-11-27 13:14:16,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3074620.0, ans=0.1 2023-11-27 13:14:17,024 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.31 vs. limit=22.5 2023-11-27 13:14:20,874 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461200 2023-11-27 13:14:22,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3074620.0, ans=0.125 2023-11-27 13:14:27,648 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4300, loss[loss=0.05363, simple_loss=0.07261, pruned_loss=0.009736, audio_tagging_loss=0.00759, over 14658.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08962, pruned_loss=0.01244, audio_tagging_loss=0.008694, over 3036584.45 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:14:38,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.53 vs. limit=10.0 2023-11-27 13:14:45,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3074753.3333333335, ans=0.0 2023-11-27 13:15:03,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3074886.6666666665, ans=0.2 2023-11-27 13:15:18,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3074953.3333333335, ans=0.05 2023-11-27 13:15:19,940 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461250 2023-11-27 13:15:25,855 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4350, loss[loss=0.05671, simple_loss=0.07285, pruned_loss=0.008268, audio_tagging_loss=0.01202, over 14767.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09056, pruned_loss=0.01259, audio_tagging_loss=0.008683, over 3036644.95 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:15:32,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3075020.0, ans=0.0 2023-11-27 13:15:48,189 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:16:02,969 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 8.774e+01 9.357e+01 9.883e+01 1.317e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 13:16:03,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3075220.0, ans=0.05 2023-11-27 13:16:05,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3075220.0, ans=0.1 2023-11-27 13:16:16,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3075286.6666666665, ans=0.2 2023-11-27 13:16:18,136 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461300 2023-11-27 13:16:23,505 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4400, loss[loss=0.08458, simple_loss=0.1132, pruned_loss=0.01895, audio_tagging_loss=0.009029, over 15046.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09149, pruned_loss=0.0129, audio_tagging_loss=0.00873, over 3031348.14 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:16:26,430 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=22.5 2023-11-27 13:16:28,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3075353.3333333335, ans=0.0 2023-11-27 13:16:47,056 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.94 vs. 
limit=15.0 2023-11-27 13:17:06,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3075553.3333333335, ans=0.0 2023-11-27 13:17:14,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3075620.0, ans=0.0 2023-11-27 13:17:14,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3075620.0, ans=0.2 2023-11-27 13:17:15,616 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461350 2023-11-27 13:17:21,465 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4450, loss[loss=0.04253, simple_loss=0.05577, pruned_loss=0.00471, audio_tagging_loss=0.009938, over 15512.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09171, pruned_loss=0.01278, audio_tagging_loss=0.008572, over 3036140.19 frames. ], batch size: 62, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:17:36,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3075753.3333333335, ans=0.1 2023-11-27 13:17:44,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3075820.0, ans=0.1 2023-11-27 13:17:58,562 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 8.640e+01 9.339e+01 1.014e+02 1.202e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 13:18:02,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3075886.6666666665, ans=0.125 2023-11-27 13:18:04,132 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.22 vs. limit=22.5 2023-11-27 13:18:10,798 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:18:12,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3075953.3333333335, ans=0.0 2023-11-27 13:18:14,601 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461400 2023-11-27 13:18:15,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3075953.3333333335, ans=0.0 2023-11-27 13:18:20,234 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4500, loss[loss=0.04918, simple_loss=0.06288, pruned_loss=0.004832, audio_tagging_loss=0.0129, over 14662.00 frames. ], tot_loss[loss=0.06722, simple_loss=0.09164, pruned_loss=0.01281, audio_tagging_loss=0.008596, over 3033776.79 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:18:47,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3076153.3333333335, ans=0.2 2023-11-27 13:18:51,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3076153.3333333335, ans=0.125 2023-11-27 13:18:54,154 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.59 vs. 
limit=15.0 2023-11-27 13:18:57,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3076220.0, ans=0.1 2023-11-27 13:19:08,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3076286.6666666665, ans=0.125 2023-11-27 13:19:11,559 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461450 2023-11-27 13:19:11,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3076286.6666666665, ans=0.1 2023-11-27 13:19:15,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3076286.6666666665, ans=0.1 2023-11-27 13:19:17,599 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4550, loss[loss=0.09175, simple_loss=0.1234, pruned_loss=0.0228, audio_tagging_loss=0.007255, over 15565.00 frames. ], tot_loss[loss=0.0675, simple_loss=0.09169, pruned_loss=0.01299, audio_tagging_loss=0.008668, over 3040458.72 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:19:20,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.26 vs. limit=15.0 2023-11-27 13:19:27,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3076420.0, ans=0.125 2023-11-27 13:19:38,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3076420.0, ans=0.125 2023-11-27 13:19:45,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3076486.6666666665, ans=0.0 2023-11-27 13:19:53,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3076553.3333333335, ans=0.0 2023-11-27 13:19:54,261 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.721e+01 8.672e+01 9.431e+01 1.029e+02 1.211e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 13:19:54,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3076553.3333333335, ans=0.2 2023-11-27 13:20:00,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3076553.3333333335, ans=0.0 2023-11-27 13:20:04,916 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:20:09,207 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461500 2023-11-27 13:20:14,515 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4600, loss[loss=0.0551, simple_loss=0.07411, pruned_loss=0.009229, audio_tagging_loss=0.008817, over 15045.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09174, pruned_loss=0.01297, audio_tagging_loss=0.008725, over 3040518.15 frames. 
], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:20:21,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3076686.6666666665, ans=0.125 2023-11-27 13:20:21,534 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.42 vs. limit=15.0 2023-11-27 13:20:33,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2023-11-27 13:21:08,104 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461550 2023-11-27 13:21:13,505 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4650, loss[loss=0.0588, simple_loss=0.08256, pruned_loss=0.009427, audio_tagging_loss=0.008094, over 14775.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09058, pruned_loss=0.01274, audio_tagging_loss=0.008831, over 3035768.48 frames. ], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:21:22,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3077020.0, ans=0.0 2023-11-27 13:21:43,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3077153.3333333335, ans=0.125 2023-11-27 13:21:49,995 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.746e+01 9.217e+01 9.897e+01 1.196e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-27 13:21:51,632 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.91 vs. limit=22.5 2023-11-27 13:22:04,851 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461600 2023-11-27 13:22:10,671 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4700, loss[loss=0.07333, simple_loss=0.09327, pruned_loss=0.01789, audio_tagging_loss=0.008803, over 15462.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.08976, pruned_loss=0.01273, audio_tagging_loss=0.008949, over 3042463.15 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:22:12,287 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.01 vs. limit=10.0 2023-11-27 13:22:13,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3077353.3333333335, ans=0.125 2023-11-27 13:22:16,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3077353.3333333335, ans=0.0 2023-11-27 13:23:02,732 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461650 2023-11-27 13:23:08,071 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4750, loss[loss=0.07697, simple_loss=0.09678, pruned_loss=0.01953, audio_tagging_loss=0.009052, over 14596.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08891, pruned_loss=0.01257, audio_tagging_loss=0.009029, over 3037345.31 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:23:09,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3077686.6666666665, ans=0.0 2023-11-27 13:23:19,137 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:23:45,248 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 8.817e+01 9.459e+01 1.019e+02 1.212e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-27 13:23:56,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3077953.3333333335, ans=0.2 2023-11-27 13:24:00,643 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461700 2023-11-27 13:24:06,652 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4800, loss[loss=0.05946, simple_loss=0.07195, pruned_loss=0.01196, audio_tagging_loss=0.01153, over 14385.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08942, pruned_loss=0.01264, audio_tagging_loss=0.009118, over 3041423.11 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:24:26,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3078086.6666666665, ans=0.125 2023-11-27 13:24:32,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3078153.3333333335, ans=0.125 2023-11-27 13:24:35,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3078153.3333333335, ans=0.1 2023-11-27 13:24:43,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3078220.0, ans=0.125 2023-11-27 13:24:57,837 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461750 2023-11-27 13:25:01,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3078286.6666666665, ans=0.0 2023-11-27 13:25:03,194 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4850, loss[loss=0.05101, simple_loss=0.06396, pruned_loss=0.008945, audio_tagging_loss=0.01009, over 16064.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.08985, pruned_loss=0.01278, audio_tagging_loss=0.009152, over 3044087.32 frames. 
], batch size: 62, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:25:06,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3078353.3333333335, ans=0.1 2023-11-27 13:25:08,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3078353.3333333335, ans=0.125 2023-11-27 13:25:13,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3078420.0, ans=0.2 2023-11-27 13:25:20,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3078420.0, ans=0.125 2023-11-27 13:25:29,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3078486.6666666665, ans=0.125 2023-11-27 13:25:36,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3078486.6666666665, ans=0.0 2023-11-27 13:25:37,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3078553.3333333335, ans=0.1 2023-11-27 13:25:40,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.650e+01 9.327e+01 1.023e+02 1.195e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 13:25:54,472 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461800 2023-11-27 13:25:58,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3078620.0, ans=0.0 2023-11-27 13:26:00,786 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4900, loss[loss=0.08199, simple_loss=0.108, pruned_loss=0.01607, audio_tagging_loss=0.0119, over 14600.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.0897, pruned_loss=0.01268, audio_tagging_loss=0.00906, over 3043430.54 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:26:06,019 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2023-11-27 13:26:25,707 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2023-11-27 13:26:29,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3078820.0, ans=0.125 2023-11-27 13:26:52,018 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0 2023-11-27 13:26:52,501 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461850 2023-11-27 13:26:58,534 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 4950, loss[loss=0.07248, simple_loss=0.1012, pruned_loss=0.01504, audio_tagging_loss=0.00683, over 15755.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08926, pruned_loss=0.01257, audio_tagging_loss=0.008922, over 3044920.79 frames. 
], batch size: 56, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:27:05,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3079020.0, ans=10.0 2023-11-27 13:27:07,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3079020.0, ans=0.0 2023-11-27 13:27:15,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3079086.6666666665, ans=0.1 2023-11-27 13:27:22,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3079153.3333333335, ans=0.0 2023-11-27 13:27:34,977 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 8.478e+01 9.080e+01 9.742e+01 1.240e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 13:27:48,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3079286.6666666665, ans=0.125 2023-11-27 13:27:48,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3079286.6666666665, ans=0.125 2023-11-27 13:27:48,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3079286.6666666665, ans=0.125 2023-11-27 13:27:50,466 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461900 2023-11-27 13:27:55,868 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5000, loss[loss=0.06428, simple_loss=0.08934, pruned_loss=0.01195, audio_tagging_loss=0.007662, over 16140.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08933, pruned_loss=0.01248, audio_tagging_loss=0.008765, over 3037389.40 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:28:02,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3079353.3333333335, ans=0.1 2023-11-27 13:28:10,657 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2023-11-27 13:28:14,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3079420.0, ans=0.2 2023-11-27 13:28:18,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3079486.6666666665, ans=0.2 2023-11-27 13:28:21,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3079486.6666666665, ans=0.0 2023-11-27 13:28:42,609 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0 2023-11-27 13:28:47,582 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 461950 2023-11-27 13:28:49,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.56 vs. 
limit=15.0 2023-11-27 13:28:51,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3079620.0, ans=0.1 2023-11-27 13:28:52,928 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5050, loss[loss=0.05958, simple_loss=0.08409, pruned_loss=0.01098, audio_tagging_loss=0.006553, over 15246.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.0902, pruned_loss=0.01276, audio_tagging_loss=0.008733, over 3045224.36 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:28:53,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3079686.6666666665, ans=0.2 2023-11-27 13:28:56,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3079686.6666666665, ans=0.1 2023-11-27 13:29:03,478 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:29:27,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3079886.6666666665, ans=0.0 2023-11-27 13:29:29,239 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.853e+01 9.456e+01 1.016e+02 1.610e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 13:29:31,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3079886.6666666665, ans=0.125 2023-11-27 13:29:41,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3079953.3333333335, ans=0.025 2023-11-27 13:29:43,971 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462000 2023-11-27 13:29:50,852 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5100, loss[loss=0.04903, simple_loss=0.05797, pruned_loss=0.0114, audio_tagging_loss=0.008645, over 14312.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09019, pruned_loss=0.01286, audio_tagging_loss=0.008701, over 3049394.28 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:29:51,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3080020.0, ans=0.0 2023-11-27 13:30:11,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3080086.6666666665, ans=0.0 2023-11-27 13:30:21,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3080153.3333333335, ans=0.1 2023-11-27 13:30:23,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3080220.0, ans=0.1 2023-11-27 13:30:42,759 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462050 2023-11-27 13:30:46,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3080286.6666666665, ans=0.2 2023-11-27 13:30:48,254 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5150, loss[loss=0.05436, simple_loss=0.07152, pruned_loss=0.008977, audio_tagging_loss=0.009626, over 16064.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08995, pruned_loss=0.0128, audio_tagging_loss=0.008737, over 3046410.38 frames. 
], batch size: 61, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:30:55,270 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.21 vs. limit=10.0 2023-11-27 13:31:00,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3080420.0, ans=0.95 2023-11-27 13:31:07,399 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0 2023-11-27 13:31:25,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3080553.3333333335, ans=0.0 2023-11-27 13:31:26,837 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.025e+01 8.502e+01 9.223e+01 9.833e+01 1.321e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-27 13:31:39,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3080620.0, ans=0.2 2023-11-27 13:31:39,939 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462100 2023-11-27 13:31:45,303 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5200, loss[loss=0.06901, simple_loss=0.0955, pruned_loss=0.01413, audio_tagging_loss=0.007121, over 15006.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08874, pruned_loss=0.01259, audio_tagging_loss=0.008722, over 3050364.15 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 32.0 2023-11-27 13:31:52,546 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.79 vs. limit=10.0 2023-11-27 13:32:08,386 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.68 vs. limit=22.5 2023-11-27 13:32:09,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3080820.0, ans=0.5 2023-11-27 13:32:11,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3080820.0, ans=0.2 2023-11-27 13:32:17,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3080820.0, ans=0.0 2023-11-27 13:32:36,117 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462150 2023-11-27 13:32:42,059 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5250, loss[loss=0.06125, simple_loss=0.08234, pruned_loss=0.009468, audio_tagging_loss=0.01061, over 14402.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08958, pruned_loss=0.01277, audio_tagging_loss=0.008678, over 3041717.11 frames. ], batch size: 55, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:32:45,395 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.06 vs. 
limit=22.5 2023-11-27 13:33:02,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3081086.6666666665, ans=0.0 2023-11-27 13:33:20,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.787e+01 9.211e+01 1.001e+02 1.224e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 13:33:28,424 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=12.0 2023-11-27 13:33:28,530 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.75 vs. limit=22.5 2023-11-27 13:33:34,946 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462200 2023-11-27 13:33:40,669 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5300, loss[loss=0.07994, simple_loss=0.1143, pruned_loss=0.01414, audio_tagging_loss=0.008632, over 14316.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.08982, pruned_loss=0.01281, audio_tagging_loss=0.008683, over 3032885.27 frames. ], batch size: 53, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:33:51,288 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.73 vs. limit=15.0 2023-11-27 13:34:12,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3081486.6666666665, ans=0.09899494936611666 2023-11-27 13:34:21,963 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2023-11-27 13:34:26,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3081620.0, ans=0.0 2023-11-27 13:34:28,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3081620.0, ans=0.2 2023-11-27 13:34:32,341 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462250 2023-11-27 13:34:36,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3081686.6666666665, ans=0.125 2023-11-27 13:34:36,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3081686.6666666665, ans=0.125 2023-11-27 13:34:37,735 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5350, loss[loss=0.04998, simple_loss=0.05657, pruned_loss=0.009004, audio_tagging_loss=0.01269, over 15550.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.08994, pruned_loss=0.01276, audio_tagging_loss=0.008704, over 3025002.24 frames. 
], batch size: 61, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:34:46,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3081686.6666666665, ans=0.0 2023-11-27 13:34:52,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3081753.3333333335, ans=0.0 2023-11-27 13:35:00,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3081820.0, ans=0.2 2023-11-27 13:35:01,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3081820.0, ans=0.0 2023-11-27 13:35:06,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3081820.0, ans=0.125 2023-11-27 13:35:07,729 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.61 vs. limit=15.0 2023-11-27 13:35:16,879 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.714e+01 9.357e+01 9.937e+01 1.176e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 13:35:19,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3081886.6666666665, ans=0.125 2023-11-27 13:35:29,091 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462300 2023-11-27 13:35:34,973 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5400, loss[loss=0.06264, simple_loss=0.0861, pruned_loss=0.0106, audio_tagging_loss=0.008992, over 15976.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09016, pruned_loss=0.01289, audio_tagging_loss=0.008804, over 3036907.01 frames. ], batch size: 59, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:35:41,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.43 vs. limit=22.5 2023-11-27 13:35:48,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3082086.6666666665, ans=0.125 2023-11-27 13:36:01,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3082153.3333333335, ans=0.0 2023-11-27 13:36:08,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3082220.0, ans=0.125 2023-11-27 13:36:27,732 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462350 2023-11-27 13:36:33,144 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5450, loss[loss=0.05668, simple_loss=0.07312, pruned_loss=0.01034, audio_tagging_loss=0.009781, over 15485.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09058, pruned_loss=0.01289, audio_tagging_loss=0.008811, over 3040082.33 frames. 
], batch size: 62, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:36:35,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3082353.3333333335, ans=0.125 2023-11-27 13:36:44,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3082420.0, ans=0.125 2023-11-27 13:36:44,290 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.88 vs. limit=15.0 2023-11-27 13:36:48,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3082420.0, ans=0.1 2023-11-27 13:37:12,772 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.623e+01 9.313e+01 1.025e+02 1.327e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 13:37:22,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3082620.0, ans=0.125 2023-11-27 13:37:25,508 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462400 2023-11-27 13:37:27,034 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0 2023-11-27 13:37:31,088 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5500, loss[loss=0.05974, simple_loss=0.07794, pruned_loss=0.01106, audio_tagging_loss=0.009717, over 13874.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09069, pruned_loss=0.01289, audio_tagging_loss=0.008839, over 3045124.92 frames. ], batch size: 52, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:37:31,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3082686.6666666665, ans=0.1 2023-11-27 13:37:37,076 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2023-11-27 13:37:40,230 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2023-11-27 13:37:59,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3082820.0, ans=0.125 2023-11-27 13:38:09,236 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.92 vs. limit=12.0 2023-11-27 13:38:22,991 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462450 2023-11-27 13:38:24,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3082953.3333333335, ans=0.0 2023-11-27 13:38:25,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3082953.3333333335, ans=0.125 2023-11-27 13:38:28,405 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5550, loss[loss=0.07159, simple_loss=0.09413, pruned_loss=0.01466, audio_tagging_loss=0.009863, over 16613.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09075, pruned_loss=0.01276, audio_tagging_loss=0.008967, over 3052223.13 frames. 
], batch size: 61, lr: 1.74e-03, grad_scale: 8.0 2023-11-27 13:38:36,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3083020.0, ans=0.125 2023-11-27 13:38:57,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3083153.3333333335, ans=0.0 2023-11-27 13:39:00,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3083153.3333333335, ans=0.025 2023-11-27 13:39:09,190 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.682e+01 9.087e+01 1.004e+02 1.719e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-27 13:39:21,217 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462500 2023-11-27 13:39:27,244 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5600, loss[loss=0.07024, simple_loss=0.08472, pruned_loss=0.01516, audio_tagging_loss=0.01272, over 15153.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09043, pruned_loss=0.01279, audio_tagging_loss=0.009126, over 3060265.51 frames. ], batch size: 58, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:39:40,349 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:39:41,645 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0 2023-11-27 13:39:48,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3083420.0, ans=0.125 2023-11-27 13:40:12,338 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:40:19,094 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462550 2023-11-27 13:40:25,088 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5650, loss[loss=0.05528, simple_loss=0.08167, pruned_loss=0.007828, audio_tagging_loss=0.006623, over 14835.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09007, pruned_loss=0.01279, audio_tagging_loss=0.009143, over 3057501.52 frames. 
], batch size: 57, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:40:36,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3083753.3333333335, ans=0.125 2023-11-27 13:40:37,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3083753.3333333335, ans=0.0 2023-11-27 13:41:05,319 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.523e+01 8.985e+01 9.871e+01 1.258e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-27 13:41:05,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3083886.6666666665, ans=0.125 2023-11-27 13:41:16,895 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462600 2023-11-27 13:41:22,701 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5700, loss[loss=0.05023, simple_loss=0.05862, pruned_loss=0.008396, audio_tagging_loss=0.01252, over 15894.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.0903, pruned_loss=0.0129, audio_tagging_loss=0.009118, over 3059063.05 frames. ], batch size: 63, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:41:33,090 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2023-11-27 13:41:36,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3084086.6666666665, ans=0.2 2023-11-27 13:41:58,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3084220.0, ans=0.125 2023-11-27 13:42:14,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3084286.6666666665, ans=0.125 2023-11-27 13:42:14,963 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462650 2023-11-27 13:42:21,536 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5750, loss[loss=0.06042, simple_loss=0.07681, pruned_loss=0.01124, audio_tagging_loss=0.01077, over 14739.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09062, pruned_loss=0.01309, audio_tagging_loss=0.008998, over 3056382.79 frames. ], batch size: 56, lr: 1.74e-03, grad_scale: 16.0 2023-11-27 13:43:01,845 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 8.392e+01 9.189e+01 1.014e+02 1.266e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-27 13:43:06,135 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=22.5 2023-11-27 13:43:13,291 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462700 2023-11-27 13:43:14,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3084620.0, ans=0.0 2023-11-27 13:43:18,804 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5800, loss[loss=0.058, simple_loss=0.08039, pruned_loss=0.01126, audio_tagging_loss=0.006548, over 16577.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09054, pruned_loss=0.0131, audio_tagging_loss=0.008917, over 3057045.92 frames. 
], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:43:30,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3084753.3333333335, ans=0.125 2023-11-27 13:43:33,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3084753.3333333335, ans=0.07 2023-11-27 13:43:40,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3084753.3333333335, ans=0.125 2023-11-27 13:43:40,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3084753.3333333335, ans=0.125 2023-11-27 13:44:10,076 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=22.5 2023-11-27 13:44:11,174 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462750 2023-11-27 13:44:11,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3084953.3333333335, ans=0.0 2023-11-27 13:44:14,583 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:44:15,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3085020.0, ans=0.1 2023-11-27 13:44:16,470 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5850, loss[loss=0.04536, simple_loss=0.05618, pruned_loss=0.005202, audio_tagging_loss=0.01207, over 13493.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09033, pruned_loss=0.01301, audio_tagging_loss=0.008883, over 3048943.45 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:44:27,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3085086.6666666665, ans=0.0 2023-11-27 13:44:28,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3085086.6666666665, ans=0.125 2023-11-27 13:44:41,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3085153.3333333335, ans=0.0 2023-11-27 13:44:45,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3085153.3333333335, ans=0.125 2023-11-27 13:44:57,005 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.148e+01 8.603e+01 9.114e+01 9.888e+01 1.396e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-27 13:44:57,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3085220.0, ans=0.0 2023-11-27 13:45:08,513 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462800 2023-11-27 13:45:10,196 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.72 vs. 
limit=15.0 2023-11-27 13:45:11,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3085286.6666666665, ans=0.0 2023-11-27 13:45:14,804 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5900, loss[loss=0.08176, simple_loss=0.1134, pruned_loss=0.01673, audio_tagging_loss=0.00834, over 15715.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09054, pruned_loss=0.01293, audio_tagging_loss=0.008823, over 3046939.79 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:45:27,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3085420.0, ans=0.0 2023-11-27 13:46:07,467 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462850 2023-11-27 13:46:08,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3085620.0, ans=0.125 2023-11-27 13:46:12,866 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 5950, loss[loss=0.07459, simple_loss=0.1024, pruned_loss=0.01223, audio_tagging_loss=0.01114, over 14854.00 frames. ], tot_loss[loss=0.06757, simple_loss=0.09151, pruned_loss=0.01305, audio_tagging_loss=0.008767, over 3057679.53 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:46:14,405 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0 2023-11-27 13:46:21,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3085686.6666666665, ans=0.125 2023-11-27 13:46:28,501 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.98 vs. limit=15.0 2023-11-27 13:46:53,851 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.517e+01 9.163e+01 1.018e+02 1.224e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-27 13:46:54,427 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.36 vs. limit=22.5 2023-11-27 13:47:00,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3085953.3333333335, ans=0.125 2023-11-27 13:47:04,825 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462900 2023-11-27 13:47:10,176 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6000, loss[loss=0.07229, simple_loss=0.09456, pruned_loss=0.01432, audio_tagging_loss=0.01068, over 15093.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.0913, pruned_loss=0.01314, audio_tagging_loss=0.008896, over 3057180.90 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:47:10,177 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 13:47:24,802 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4641, 3.0136, 4.2293, 3.1278], device='cuda:2') 2023-11-27 13:47:42,860 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4668, 3.8463, 4.4196, 3.4134], device='cuda:2') 2023-11-27 13:47:44,804 INFO [train_asr.py:1267] (2/4) Epoch 39, validation: loss=0.05766, simple_loss=0.05076, pruned_loss=0.005225, audio_tagging_loss=0.02706, over 4681554.00 frames. 
2023-11-27 13:47:44,804 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 13:48:04,029 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=12.0 2023-11-27 13:48:29,912 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 13:48:30,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3086286.6666666665, ans=0.2 2023-11-27 13:48:36,703 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 462950 2023-11-27 13:48:42,316 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6050, loss[loss=0.0489, simple_loss=0.06233, pruned_loss=0.008348, audio_tagging_loss=0.00939, over 15248.00 frames. ], tot_loss[loss=0.06728, simple_loss=0.09099, pruned_loss=0.01291, audio_tagging_loss=0.008871, over 3060936.99 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:48:48,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3086353.3333333335, ans=0.1 2023-11-27 13:49:20,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3086553.3333333335, ans=0.0 2023-11-27 13:49:24,233 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.673e+01 9.372e+01 1.019e+02 1.327e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 13:49:27,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3086620.0, ans=0.0 2023-11-27 13:49:34,246 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463000 2023-11-27 13:49:35,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3086620.0, ans=0.125 2023-11-27 13:49:36,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3086620.0, ans=0.125 2023-11-27 13:49:39,510 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.38 vs. limit=6.0 2023-11-27 13:49:40,139 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6100, loss[loss=0.07198, simple_loss=0.1126, pruned_loss=0.01019, audio_tagging_loss=0.005503, over 15469.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.0898, pruned_loss=0.01262, audio_tagging_loss=0.008845, over 3059001.82 frames. 
], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:49:40,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3086686.6666666665, ans=0.125 2023-11-27 13:49:41,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3086686.6666666665, ans=0.125 2023-11-27 13:49:47,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3086686.6666666665, ans=0.125 2023-11-27 13:50:13,381 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.82 vs. limit=10.0 2023-11-27 13:50:23,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3086886.6666666665, ans=0.0 2023-11-27 13:50:32,436 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463050 2023-11-27 13:50:36,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3086953.3333333335, ans=0.0 2023-11-27 13:50:38,826 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6150, loss[loss=0.05156, simple_loss=0.06991, pruned_loss=0.008045, audio_tagging_loss=0.00856, over 15141.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08955, pruned_loss=0.01271, audio_tagging_loss=0.008905, over 3043083.16 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:50:41,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3087020.0, ans=0.2 2023-11-27 13:50:57,383 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 13:50:58,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3087086.6666666665, ans=0.125 2023-11-27 13:50:59,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3087086.6666666665, ans=0.1 2023-11-27 13:51:03,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3087153.3333333335, ans=0.125 2023-11-27 13:51:08,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3087153.3333333335, ans=0.2 2023-11-27 13:51:20,157 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.580e+01 9.298e+01 1.013e+02 1.298e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 13:51:23,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3087220.0, ans=0.2 2023-11-27 13:51:29,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3087286.6666666665, ans=0.0 2023-11-27 13:51:31,266 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463100 2023-11-27 13:51:36,705 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6200, loss[loss=0.07672, simple_loss=0.1018, pruned_loss=0.01897, audio_tagging_loss=0.006838, over 14758.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09048, pruned_loss=0.01272, audio_tagging_loss=0.008796, over 3049196.62 frames. 
], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:51:42,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3087353.3333333335, ans=0.0 2023-11-27 13:52:03,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3087486.6666666665, ans=0.125 2023-11-27 13:52:13,587 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.47 vs. limit=22.5 2023-11-27 13:52:28,822 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463150 2023-11-27 13:52:34,177 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6250, loss[loss=0.06001, simple_loss=0.07599, pruned_loss=0.01263, audio_tagging_loss=0.009388, over 16007.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09008, pruned_loss=0.01259, audio_tagging_loss=0.00885, over 3050087.84 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:52:56,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3087753.3333333335, ans=0.0 2023-11-27 13:52:56,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3087753.3333333335, ans=0.0 2023-11-27 13:52:57,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3087820.0, ans=0.2 2023-11-27 13:53:03,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=15.0 2023-11-27 13:53:16,190 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.844e+01 8.648e+01 9.152e+01 1.003e+02 1.287e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-27 13:53:20,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3087953.3333333335, ans=0.125 2023-11-27 13:53:26,224 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463200 2023-11-27 13:53:32,901 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6300, loss[loss=0.07876, simple_loss=0.1093, pruned_loss=0.01516, audio_tagging_loss=0.008958, over 15229.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09125, pruned_loss=0.01268, audio_tagging_loss=0.00891, over 3054828.63 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:53:34,621 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.36 vs. limit=22.5 2023-11-27 13:53:47,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3088086.6666666665, ans=0.0 2023-11-27 13:54:00,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.01 vs. 
limit=22.5 2023-11-27 13:54:04,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3088153.3333333335, ans=0.125 2023-11-27 13:54:05,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3088153.3333333335, ans=0.125 2023-11-27 13:54:06,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3088220.0, ans=0.125 2023-11-27 13:54:14,991 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.47 vs. limit=12.0 2023-11-27 13:54:25,944 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463250 2023-11-27 13:54:31,597 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6350, loss[loss=0.05534, simple_loss=0.06792, pruned_loss=0.01086, audio_tagging_loss=0.01052, over 15751.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09076, pruned_loss=0.01265, audio_tagging_loss=0.008884, over 3050780.49 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 13:54:39,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3088353.3333333335, ans=0.2 2023-11-27 13:54:40,143 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.36 vs. limit=8.0 2023-11-27 13:54:40,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3088353.3333333335, ans=0.125 2023-11-27 13:54:59,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3088486.6666666665, ans=0.125 2023-11-27 13:55:05,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3088553.3333333335, ans=0.125 2023-11-27 13:55:13,234 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.449e+01 8.528e+01 9.081e+01 9.909e+01 1.480e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 13:55:23,346 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463300 2023-11-27 13:55:28,791 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6400, loss[loss=0.04031, simple_loss=0.04475, pruned_loss=0.006193, audio_tagging_loss=0.01174, over 16104.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09037, pruned_loss=0.01256, audio_tagging_loss=0.009027, over 3048272.03 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:55:30,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3088686.6666666665, ans=0.2 2023-11-27 13:55:34,991 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2023-11-27 13:55:46,386 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. 
limit=15.0 2023-11-27 13:56:00,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3088820.0, ans=0.125 2023-11-27 13:56:13,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3088953.3333333335, ans=0.2 2023-11-27 13:56:17,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3088953.3333333335, ans=0.2 2023-11-27 13:56:20,292 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463350 2023-11-27 13:56:20,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3088953.3333333335, ans=0.1 2023-11-27 13:56:25,844 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6450, loss[loss=0.06808, simple_loss=0.08679, pruned_loss=0.01584, audio_tagging_loss=0.00884, over 15333.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09033, pruned_loss=0.01271, audio_tagging_loss=0.009098, over 3042022.64 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:56:45,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3089086.6666666665, ans=0.125 2023-11-27 13:57:07,844 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 8.614e+01 9.257e+01 9.847e+01 1.533e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-27 13:57:18,894 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0 2023-11-27 13:57:19,459 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463400 2023-11-27 13:57:24,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3089353.3333333335, ans=0.2 2023-11-27 13:57:25,800 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6500, loss[loss=0.07053, simple_loss=0.09865, pruned_loss=0.01298, audio_tagging_loss=0.008221, over 14982.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.08939, pruned_loss=0.01265, audio_tagging_loss=0.009116, over 3041623.05 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:57:31,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3089353.3333333335, ans=0.07 2023-11-27 13:57:35,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3089353.3333333335, ans=0.0 2023-11-27 13:57:48,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3089486.6666666665, ans=0.025 2023-11-27 13:57:53,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3089486.6666666665, ans=0.0 2023-11-27 13:58:06,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3089553.3333333335, ans=0.1 2023-11-27 13:58:17,159 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463450 2023-11-27 13:58:22,720 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6550, loss[loss=0.06108, simple_loss=0.08409, pruned_loss=0.008929, audio_tagging_loss=0.0101, over 16196.00 frames. 
], tot_loss[loss=0.0663, simple_loss=0.08957, pruned_loss=0.01257, audio_tagging_loss=0.008941, over 3046395.65 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:59:04,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.565e+01 9.161e+01 9.963e+01 1.311e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 13:59:14,303 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463500 2023-11-27 13:59:19,697 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6600, loss[loss=0.06013, simple_loss=0.08159, pruned_loss=0.01099, audio_tagging_loss=0.008338, over 15080.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08882, pruned_loss=0.01246, audio_tagging_loss=0.00883, over 3049726.94 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 13:59:44,155 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.66 vs. limit=22.5 2023-11-27 14:00:00,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3090220.0, ans=0.1 2023-11-27 14:00:02,997 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2023-11-27 14:00:03,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3090220.0, ans=0.125 2023-11-27 14:00:11,624 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463550 2023-11-27 14:00:17,669 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6650, loss[loss=0.06765, simple_loss=0.0941, pruned_loss=0.01197, audio_tagging_loss=0.008631, over 15191.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08888, pruned_loss=0.01247, audio_tagging_loss=0.008762, over 3050740.72 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:00:58,784 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.791e+01 9.442e+01 1.009e+02 1.378e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-27 14:01:09,399 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463600 2023-11-27 14:01:15,148 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6700, loss[loss=0.06501, simple_loss=0.0882, pruned_loss=0.01052, audio_tagging_loss=0.01039, over 15055.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08917, pruned_loss=0.0125, audio_tagging_loss=0.008761, over 3044527.88 frames. 
], batch size: 55, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:01:17,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3090686.6666666665, ans=10.0 2023-11-27 14:01:29,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3090753.3333333335, ans=0.125 2023-11-27 14:02:06,790 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463650 2023-11-27 14:02:09,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3090953.3333333335, ans=0.0 2023-11-27 14:02:10,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3090953.3333333335, ans=0.125 2023-11-27 14:02:12,088 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6750, loss[loss=0.07551, simple_loss=0.1027, pruned_loss=0.01765, audio_tagging_loss=0.006506, over 15598.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08983, pruned_loss=0.0126, audio_tagging_loss=0.008703, over 3046470.08 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:02:18,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3091020.0, ans=10.0 2023-11-27 14:02:25,820 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.47 vs. limit=15.0 2023-11-27 14:02:35,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3091153.3333333335, ans=0.0 2023-11-27 14:02:36,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3091153.3333333335, ans=0.125 2023-11-27 14:02:45,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3091153.3333333335, ans=0.125 2023-11-27 14:02:53,464 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.955e+01 8.433e+01 9.032e+01 9.783e+01 1.125e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-27 14:03:03,902 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463700 2023-11-27 14:03:10,135 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6800, loss[loss=0.05325, simple_loss=0.06528, pruned_loss=0.01018, audio_tagging_loss=0.01043, over 15074.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08948, pruned_loss=0.0126, audio_tagging_loss=0.008736, over 3037979.99 frames. 
], batch size: 61, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:03:14,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3091353.3333333335, ans=0.0 2023-11-27 14:03:17,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3091353.3333333335, ans=0.125 2023-11-27 14:03:30,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3091420.0, ans=0.0 2023-11-27 14:03:45,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3091553.3333333335, ans=0.125 2023-11-27 14:03:50,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.39 vs. limit=12.0 2023-11-27 14:03:51,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3091553.3333333335, ans=0.125 2023-11-27 14:03:59,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3091620.0, ans=0.125 2023-11-27 14:04:01,341 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463750 2023-11-27 14:04:01,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3091620.0, ans=0.125 2023-11-27 14:04:06,920 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6850, loss[loss=0.06759, simple_loss=0.09234, pruned_loss=0.01345, audio_tagging_loss=0.007963, over 14862.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08942, pruned_loss=0.01272, audio_tagging_loss=0.008667, over 3031909.50 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:04:11,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3091686.6666666665, ans=0.0 2023-11-27 14:04:12,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3091686.6666666665, ans=0.125 2023-11-27 14:04:19,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3091753.3333333335, ans=0.09899494936611666 2023-11-27 14:04:27,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3091753.3333333335, ans=0.1 2023-11-27 14:04:28,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3091820.0, ans=0.125 2023-11-27 14:04:32,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3091820.0, ans=0.125 2023-11-27 14:04:49,936 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.738e+01 9.106e+01 9.965e+01 1.501e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-27 14:04:59,334 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463800 2023-11-27 14:05:05,183 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6900, loss[loss=0.07667, simple_loss=0.1163, pruned_loss=0.01184, audio_tagging_loss=0.0067, over 15602.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08972, pruned_loss=0.0129, audio_tagging_loss=0.008629, over 3036704.38 frames. 
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:05:08,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.65 vs. limit=22.5 2023-11-27 14:05:22,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3092086.6666666665, ans=0.07 2023-11-27 14:05:23,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3092086.6666666665, ans=0.0 2023-11-27 14:05:48,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3092220.0, ans=0.0 2023-11-27 14:05:53,795 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:05:57,189 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463850 2023-11-27 14:06:03,935 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 6950, loss[loss=0.06236, simple_loss=0.09367, pruned_loss=0.009361, audio_tagging_loss=0.006164, over 15147.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08958, pruned_loss=0.01285, audio_tagging_loss=0.008683, over 3030888.36 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:06:18,022 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.50 vs. limit=10.0 2023-11-27 14:06:20,952 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:06:24,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3092420.0, ans=0.2 2023-11-27 14:06:35,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3092486.6666666665, ans=0.0 2023-11-27 14:06:37,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3092553.3333333335, ans=0.05 2023-11-27 14:06:39,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3092553.3333333335, ans=0.0 2023-11-27 14:06:46,185 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.400e+01 9.204e+01 9.755e+01 1.289e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-27 14:06:55,639 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463900 2023-11-27 14:06:55,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3092620.0, ans=0.0 2023-11-27 14:06:58,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3092620.0, ans=0.1 2023-11-27 14:07:01,083 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7000, loss[loss=0.06736, simple_loss=0.09016, pruned_loss=0.0139, audio_tagging_loss=0.008384, over 15156.00 frames. 
], tot_loss[loss=0.06613, simple_loss=0.08947, pruned_loss=0.01267, audio_tagging_loss=0.008725, over 3032908.83 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:07:02,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3092686.6666666665, ans=0.125 2023-11-27 14:07:18,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3092753.3333333335, ans=0.125 2023-11-27 14:07:38,985 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=22.5 2023-11-27 14:07:42,234 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.97 vs. limit=22.5 2023-11-27 14:07:52,706 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 463950 2023-11-27 14:07:58,830 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7050, loss[loss=0.05979, simple_loss=0.0846, pruned_loss=0.008545, audio_tagging_loss=0.00895, over 16048.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08887, pruned_loss=0.01259, audio_tagging_loss=0.008896, over 3027736.72 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:08:21,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3093153.3333333335, ans=0.125 2023-11-27 14:08:26,983 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.41 vs. limit=15.0 2023-11-27 14:08:34,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.18 vs. limit=10.0 2023-11-27 14:08:41,266 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.528e+01 9.043e+01 9.552e+01 1.279e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-27 14:08:50,137 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464000 2023-11-27 14:08:58,751 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7100, loss[loss=0.05848, simple_loss=0.06942, pruned_loss=0.01144, audio_tagging_loss=0.01233, over 15266.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08915, pruned_loss=0.01257, audio_tagging_loss=0.009074, over 3031051.43 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:09:15,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3093420.0, ans=0.5 2023-11-27 14:09:29,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3093486.6666666665, ans=0.0 2023-11-27 14:09:33,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3093553.3333333335, ans=0.125 2023-11-27 14:09:41,732 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=12.0 2023-11-27 14:09:50,873 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464050 2023-11-27 14:09:56,361 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7150, loss[loss=0.07611, simple_loss=0.1067, pruned_loss=0.016, audio_tagging_loss=0.006769, over 16105.00 frames. 
], tot_loss[loss=0.06672, simple_loss=0.09009, pruned_loss=0.01273, audio_tagging_loss=0.00894, over 3037376.65 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:10:14,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3093753.3333333335, ans=0.1 2023-11-27 14:10:20,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3093820.0, ans=0.2 2023-11-27 14:10:21,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3093820.0, ans=0.125 2023-11-27 14:10:38,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3093886.6666666665, ans=0.125 2023-11-27 14:10:39,817 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.709e+01 9.080e+01 1.002e+02 1.169e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 14:10:44,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3093953.3333333335, ans=0.125 2023-11-27 14:10:47,556 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464100 2023-11-27 14:10:47,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3093953.3333333335, ans=0.2 2023-11-27 14:10:48,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3093953.3333333335, ans=0.2 2023-11-27 14:10:53,057 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7200, loss[loss=0.04973, simple_loss=0.06234, pruned_loss=0.007179, audio_tagging_loss=0.01138, over 13959.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.08992, pruned_loss=0.01281, audio_tagging_loss=0.009065, over 3038391.94 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:11:25,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3094153.3333333335, ans=0.2 2023-11-27 14:11:36,001 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.06 vs. limit=15.0 2023-11-27 14:11:45,190 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464150 2023-11-27 14:11:48,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3094286.6666666665, ans=0.0 2023-11-27 14:11:50,682 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7250, loss[loss=0.05323, simple_loss=0.0768, pruned_loss=0.006595, audio_tagging_loss=0.008237, over 15945.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08886, pruned_loss=0.01249, audio_tagging_loss=0.009152, over 3038807.18 frames. 
], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:12:03,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3094420.0, ans=0.1 2023-11-27 14:12:06,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3094420.0, ans=0.125 2023-11-27 14:12:25,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3094553.3333333335, ans=0.125 2023-11-27 14:12:29,508 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.83 vs. limit=10.0 2023-11-27 14:12:34,348 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.560e+01 9.107e+01 9.786e+01 1.290e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-27 14:12:43,286 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464200 2023-11-27 14:12:49,031 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7300, loss[loss=0.06184, simple_loss=0.09287, pruned_loss=0.009366, audio_tagging_loss=0.006037, over 15053.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08896, pruned_loss=0.01243, audio_tagging_loss=0.009058, over 3037644.54 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:13:01,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3094753.3333333335, ans=0.2 2023-11-27 14:13:16,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3094820.0, ans=0.0 2023-11-27 14:13:40,364 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464250 2023-11-27 14:13:43,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3094953.3333333335, ans=0.0 2023-11-27 14:13:45,828 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7350, loss[loss=0.06732, simple_loss=0.0977, pruned_loss=0.0112, audio_tagging_loss=0.007271, over 13390.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09066, pruned_loss=0.01284, audio_tagging_loss=0.008901, over 3043200.02 frames. ], batch size: 52, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:13:46,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3095020.0, ans=0.09899494936611666 2023-11-27 14:13:53,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3095020.0, ans=0.2 2023-11-27 14:14:01,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3095086.6666666665, ans=0.125 2023-11-27 14:14:23,664 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.10 vs. 
limit=22.5 2023-11-27 14:14:29,929 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.691e+01 9.417e+01 9.998e+01 1.354e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-27 14:14:30,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3095220.0, ans=0.125 2023-11-27 14:14:37,797 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464300 2023-11-27 14:14:43,847 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7400, loss[loss=0.07004, simple_loss=0.1048, pruned_loss=0.01039, audio_tagging_loss=0.007247, over 15503.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09041, pruned_loss=0.01276, audio_tagging_loss=0.008777, over 3051289.68 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:14:49,564 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=12.0 2023-11-27 14:15:04,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3095420.0, ans=0.125 2023-11-27 14:15:13,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3095486.6666666665, ans=0.125 2023-11-27 14:15:25,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3095553.3333333335, ans=0.125 2023-11-27 14:15:30,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3095620.0, ans=0.125 2023-11-27 14:15:36,763 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464350 2023-11-27 14:15:36,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3095620.0, ans=0.1 2023-11-27 14:15:42,170 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7450, loss[loss=0.05109, simple_loss=0.06865, pruned_loss=0.00881, audio_tagging_loss=0.007961, over 14801.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09022, pruned_loss=0.01275, audio_tagging_loss=0.00875, over 3049243.38 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:15:57,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3095753.3333333335, ans=0.95 2023-11-27 14:16:21,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=3095886.6666666665, ans=15.0 2023-11-27 14:16:25,277 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.09 vs. limit=15.0 2023-11-27 14:16:25,874 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 8.639e+01 9.279e+01 9.819e+01 1.205e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-27 14:16:29,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0 2023-11-27 14:16:32,951 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.11 vs. 
limit=22.5 2023-11-27 14:16:33,611 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464400 2023-11-27 14:16:33,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3095953.3333333335, ans=0.5 2023-11-27 14:16:39,345 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7500, loss[loss=0.07182, simple_loss=0.09119, pruned_loss=0.01347, audio_tagging_loss=0.01276, over 14785.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09062, pruned_loss=0.01286, audio_tagging_loss=0.008748, over 3052993.77 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:16:46,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3096020.0, ans=0.125 2023-11-27 14:16:47,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3096020.0, ans=0.1 2023-11-27 14:16:54,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3096086.6666666665, ans=0.125 2023-11-27 14:16:54,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3096086.6666666665, ans=0.2 2023-11-27 14:16:58,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3096086.6666666665, ans=0.125 2023-11-27 14:17:16,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3096220.0, ans=0.125 2023-11-27 14:17:20,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3096220.0, ans=0.1 2023-11-27 14:17:31,934 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464450 2023-11-27 14:17:37,443 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7550, loss[loss=0.06812, simple_loss=0.09935, pruned_loss=0.0115, audio_tagging_loss=0.00694, over 14382.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09052, pruned_loss=0.01286, audio_tagging_loss=0.008747, over 3049785.52 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:17:39,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2023-11-27 14:17:47,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3096353.3333333335, ans=0.0 2023-11-27 14:18:12,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3096553.3333333335, ans=0.125 2023-11-27 14:18:16,798 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.81 vs. 
limit=15.0 2023-11-27 14:18:22,770 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.787e+01 9.439e+01 1.010e+02 1.313e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-27 14:18:25,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3096620.0, ans=0.0 2023-11-27 14:18:31,285 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464500 2023-11-27 14:18:31,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3096620.0, ans=0.1 2023-11-27 14:18:37,278 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7600, loss[loss=0.06382, simple_loss=0.09568, pruned_loss=0.01038, audio_tagging_loss=0.0056, over 15653.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09089, pruned_loss=0.01287, audio_tagging_loss=0.008661, over 3048614.54 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:19:06,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3096820.0, ans=0.015 2023-11-27 14:19:06,709 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-11-27 14:19:12,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3096886.6666666665, ans=12.0 2023-11-27 14:19:27,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3096953.3333333335, ans=0.125 2023-11-27 14:19:28,829 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464550 2023-11-27 14:19:34,234 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7650, loss[loss=0.07474, simple_loss=0.1065, pruned_loss=0.01465, audio_tagging_loss=0.00682, over 15495.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09061, pruned_loss=0.0126, audio_tagging_loss=0.008615, over 3048342.72 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:19:37,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3097020.0, ans=0.2 2023-11-27 14:19:41,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3097020.0, ans=0.1 2023-11-27 14:19:45,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3097086.6666666665, ans=0.125 2023-11-27 14:19:47,280 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.77 vs. limit=22.5 2023-11-27 14:20:18,859 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.470e+01 8.990e+01 9.726e+01 1.372e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-27 14:20:25,461 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464600 2023-11-27 14:20:31,206 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7700, loss[loss=0.05951, simple_loss=0.0796, pruned_loss=0.009379, audio_tagging_loss=0.01033, over 14550.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09033, pruned_loss=0.01264, audio_tagging_loss=0.008676, over 3044380.60 frames. 
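The recurring optim.py lines summarize recent gradient norms as a five-number summary (min, 25%, median, 75%, max). In every record here the reported threshold equals Clipping_scale times the logged median (e.g. 2.0 x 9.439e+01 = 1.888e+02 just above), and percent-clipped stays at 0.0 because even the maximum norm sits below that threshold. A minimal sketch of this bookkeeping, assuming that relationship rather than quoting the optimizer source:

    import torch

    def clipping_summary(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # Five-number summary of the recent per-batch gradient norms.
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # Clipping_scale times the median
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped

    norms = torch.tensor([74.79, 87.87, 94.39, 101.0, 131.3])
    _, thr, pct = clipping_summary(norms)
    print(thr.item(), pct.item())  # ~188.8 and 0.0, as in the record above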
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:20:52,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3097420.0, ans=0.0 2023-11-27 14:20:54,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3097486.6666666665, ans=0.1 2023-11-27 14:20:56,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3097486.6666666665, ans=0.125 2023-11-27 14:21:02,639 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=12.0 2023-11-27 14:21:11,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3097553.3333333335, ans=0.125 2023-11-27 14:21:18,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3097620.0, ans=0.2 2023-11-27 14:21:23,388 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464650 2023-11-27 14:21:30,602 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7750, loss[loss=0.04906, simple_loss=0.06092, pruned_loss=0.007298, audio_tagging_loss=0.0113, over 15474.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09059, pruned_loss=0.01261, audio_tagging_loss=0.008763, over 3050320.14 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:21:33,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3097686.6666666665, ans=0.0 2023-11-27 14:21:56,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3097820.0, ans=0.0 2023-11-27 14:22:03,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3097886.6666666665, ans=0.5 2023-11-27 14:22:06,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3097886.6666666665, ans=0.07 2023-11-27 14:22:08,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.80 vs. limit=15.0 2023-11-27 14:22:15,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 8.645e+01 9.369e+01 1.003e+02 1.399e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 14:22:17,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2023-11-27 14:22:22,165 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464700 2023-11-27 14:22:25,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3097953.3333333335, ans=0.125 2023-11-27 14:22:27,486 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7800, loss[loss=0.07525, simple_loss=0.1089, pruned_loss=0.01252, audio_tagging_loss=0.008294, over 14348.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09144, pruned_loss=0.01258, audio_tagging_loss=0.008804, over 3055828.08 frames. 
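Each scaling.py ScheduledFloat line reports the current value (ans) of a float hyperparameter that is scheduled against the global batch_count; the dropout_p, skip_rate, and scale_min entries are all of this kind, which is why they hold constant this late in training. A minimal sketch of such a batch-count-indexed schedule, an assumed simplification that keeps only the piecewise-linear interpolation:

    import bisect

    class ScheduledFloatSketch:
        """Piecewise-linear schedule over the global batch count."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs in increasing order.
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a dropout_p that decays early in training and then stays at 0.1:
    p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    print(p.value(3094420.0))  # 0.1, as in the dropout_p records above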
], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:22:40,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3098086.6666666665, ans=0.2 2023-11-27 14:22:48,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3098153.3333333335, ans=0.125 2023-11-27 14:23:07,539 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=22.5 2023-11-27 14:23:10,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3098220.0, ans=0.07 2023-11-27 14:23:14,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3098286.6666666665, ans=0.125 2023-11-27 14:23:17,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3098286.6666666665, ans=0.1 2023-11-27 14:23:19,324 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464750 2023-11-27 14:23:24,834 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7850, loss[loss=0.0625, simple_loss=0.09093, pruned_loss=0.01035, audio_tagging_loss=0.006687, over 15894.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09137, pruned_loss=0.01254, audio_tagging_loss=0.008843, over 3057860.02 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:23:42,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3098420.0, ans=0.05 2023-11-27 14:23:52,961 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=15.0 2023-11-27 14:24:10,076 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.600e+01 9.119e+01 9.772e+01 1.362e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-27 14:24:12,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3098620.0, ans=0.1 2023-11-27 14:24:17,267 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464800 2023-11-27 14:24:24,263 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7900, loss[loss=0.06277, simple_loss=0.08636, pruned_loss=0.009552, audio_tagging_loss=0.01004, over 15512.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.09149, pruned_loss=0.01258, audio_tagging_loss=0.008884, over 3063314.50 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:24:24,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3098686.6666666665, ans=0.1 2023-11-27 14:24:27,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3098686.6666666665, ans=0.125 2023-11-27 14:24:32,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3098686.6666666665, ans=0.035 2023-11-27 14:24:44,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. 
limit=6.0 2023-11-27 14:24:50,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3098820.0, ans=0.125 2023-11-27 14:25:16,360 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464850 2023-11-27 14:25:19,563 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=22.5 2023-11-27 14:25:22,380 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 7950, loss[loss=0.05201, simple_loss=0.06499, pruned_loss=0.01073, audio_tagging_loss=0.008783, over 15329.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09086, pruned_loss=0.01256, audio_tagging_loss=0.008916, over 3053755.43 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:25:30,651 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.57 vs. limit=10.0 2023-11-27 14:25:34,908 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.41 vs. limit=22.5 2023-11-27 14:25:38,869 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:25:51,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3099153.3333333335, ans=0.2 2023-11-27 14:25:56,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3099220.0, ans=0.125 2023-11-27 14:26:00,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3099220.0, ans=0.2 2023-11-27 14:26:05,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3099220.0, ans=0.125 2023-11-27 14:26:07,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3099220.0, ans=15.0 2023-11-27 14:26:07,811 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.621e+01 8.980e+01 9.722e+01 1.502e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-27 14:26:14,598 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464900 2023-11-27 14:26:20,175 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8000, loss[loss=0.06407, simple_loss=0.08815, pruned_loss=0.01246, audio_tagging_loss=0.007535, over 15401.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09006, pruned_loss=0.01245, audio_tagging_loss=0.009051, over 3055147.49 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:26:26,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3099353.3333333335, ans=0.1 2023-11-27 14:26:31,259 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.74 vs. 
limit=12.0 2023-11-27 14:26:37,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3099420.0, ans=0.0 2023-11-27 14:26:48,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3099486.6666666665, ans=0.125 2023-11-27 14:27:12,123 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 464950 2023-11-27 14:27:13,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3099620.0, ans=0.125 2023-11-27 14:27:18,047 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8050, loss[loss=0.05293, simple_loss=0.07636, pruned_loss=0.006571, audio_tagging_loss=0.008172, over 15040.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09, pruned_loss=0.01245, audio_tagging_loss=0.009088, over 3054230.64 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:27:28,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3099686.6666666665, ans=22.5 2023-11-27 14:27:37,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3099753.3333333335, ans=0.0 2023-11-27 14:28:00,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3099886.6666666665, ans=0.0 2023-11-27 14:28:04,067 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.865e+01 8.542e+01 9.133e+01 9.654e+01 1.190e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-27 14:28:07,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3099953.3333333335, ans=0.125 2023-11-27 14:28:11,413 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465000 2023-11-27 14:28:14,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3099953.3333333335, ans=0.1 2023-11-27 14:28:17,189 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8100, loss[loss=0.06719, simple_loss=0.09363, pruned_loss=0.01044, audio_tagging_loss=0.009928, over 15895.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09066, pruned_loss=0.01245, audio_tagging_loss=0.009038, over 3048564.70 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:28:22,668 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:29:02,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3100286.6666666665, ans=0.2 2023-11-27 14:29:09,954 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465050 2023-11-27 14:29:15,404 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8150, loss[loss=0.06247, simple_loss=0.08356, pruned_loss=0.0126, audio_tagging_loss=0.008092, over 15874.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.0901, pruned_loss=0.01245, audio_tagging_loss=0.008907, over 3046302.14 frames. 
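The grad_scale field in the batch records tracks mixed-precision loss scaling (this run trains with fp16): the scale is halved when a batch produces inf/nan gradients and raised again after a stretch of overflow-free steps, which is why it oscillates between 16.0 and 32.0 across these batches. The underlying PyTorch mechanism looks like the following; the parameter values are illustrative, and icefall wraps the scaler with its own update policy rather than using these defaults verbatim:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,     # matches the grad_scale values logged here
        growth_factor=2.0,   # doubled after growth_interval clean steps
        backoff_factor=0.5,  # halved when a step overflows
        growth_interval=2000,
    )
    # Typical loop: scaler.scale(loss).backward()
    #               scaler.step(optimizer)
    #               scaler.update()
    print(scaler.get_scale())  # 16.0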
], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:29:17,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3100353.3333333335, ans=0.2 2023-11-27 14:29:17,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3100353.3333333335, ans=0.1 2023-11-27 14:29:20,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3100353.3333333335, ans=0.125 2023-11-27 14:29:23,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=15.0 2023-11-27 14:29:24,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3100353.3333333335, ans=0.0 2023-11-27 14:29:45,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3100486.6666666665, ans=0.035 2023-11-27 14:29:59,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3100553.3333333335, ans=15.0 2023-11-27 14:30:01,438 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 8.694e+01 9.359e+01 9.958e+01 1.190e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 14:30:02,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3100620.0, ans=0.0 2023-11-27 14:30:07,003 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465100 2023-11-27 14:30:10,631 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=22.5 2023-11-27 14:30:12,880 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8200, loss[loss=0.07801, simple_loss=0.1123, pruned_loss=0.01428, audio_tagging_loss=0.007594, over 16305.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.0897, pruned_loss=0.01241, audio_tagging_loss=0.008874, over 3038150.96 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:30:17,811 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:30:22,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3100686.6666666665, ans=0.0 2023-11-27 14:30:22,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3100686.6666666665, ans=0.1 2023-11-27 14:30:33,206 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.63 vs. 
limit=12.0 2023-11-27 14:30:33,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3100753.3333333335, ans=0.125 2023-11-27 14:30:41,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3100820.0, ans=0.125 2023-11-27 14:30:46,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3100886.6666666665, ans=0.2 2023-11-27 14:30:52,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3100886.6666666665, ans=0.1 2023-11-27 14:30:53,337 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:31:05,954 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465150 2023-11-27 14:31:11,295 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8250, loss[loss=0.07532, simple_loss=0.1121, pruned_loss=0.01163, audio_tagging_loss=0.007617, over 16246.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08908, pruned_loss=0.01225, audio_tagging_loss=0.00889, over 3046149.51 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:31:20,879 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.60 vs. limit=8.0 2023-11-27 14:31:57,583 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 8.477e+01 9.033e+01 1.006e+02 1.389e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-27 14:31:58,337 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.65 vs. limit=22.5 2023-11-27 14:32:03,119 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465200 2023-11-27 14:32:07,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3101353.3333333335, ans=0.125 2023-11-27 14:32:09,414 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8300, loss[loss=0.05991, simple_loss=0.07934, pruned_loss=0.01103, audio_tagging_loss=0.00921, over 15287.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08896, pruned_loss=0.0123, audio_tagging_loss=0.008896, over 3044264.62 frames. 
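The Whitening lines compare a measured statistic of a module's activations against a scheduled whitening_limit; the penalty only engages when the metric exceeds the limit, so a line like metric=7.63 vs. limit=12.0 records a healthy margin. The metric is 1.0 for a perfectly white (isotropic) channel covariance and grows as channels become correlated or unequally scaled. A sketch of one statistic with exactly that behaviour, an assumed reconstruction rather than the exact scaling.py formula:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations for one whitening group.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]      # channel covariance
        eigs = torch.linalg.eigvalsh(cov)   # its eigenvalue spectrum
        # 1.0 when all eigenvalues are equal (white input), larger otherwise.
        return (eigs ** 2).mean() / eigs.mean() ** 2

    white = torch.randn(10000, 256)
    print(whitening_metric(white).item())                          # close to 1
    print(whitening_metric(white @ torch.randn(256, 256)).item())  # much larger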
], batch size: 58, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:32:11,823 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:32:11,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3101353.3333333335, ans=0.0 2023-11-27 14:32:16,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3101353.3333333335, ans=0.0 2023-11-27 14:32:44,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3101553.3333333335, ans=0.2 2023-11-27 14:33:01,345 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465250 2023-11-27 14:33:02,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3101620.0, ans=0.0 2023-11-27 14:33:06,778 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8350, loss[loss=0.07842, simple_loss=0.1021, pruned_loss=0.01834, audio_tagging_loss=0.009016, over 14898.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08927, pruned_loss=0.0124, audio_tagging_loss=0.008828, over 3052565.90 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:33:19,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3101753.3333333335, ans=0.1 2023-11-27 14:33:21,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3101753.3333333335, ans=0.1 2023-11-27 14:33:49,081 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:33:53,580 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 8.419e+01 8.984e+01 9.870e+01 1.325e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-27 14:33:59,673 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465300 2023-11-27 14:33:59,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3101953.3333333335, ans=0.0 2023-11-27 14:34:05,797 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8400, loss[loss=0.06254, simple_loss=0.09179, pruned_loss=0.008558, audio_tagging_loss=0.00808, over 14182.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08851, pruned_loss=0.01235, audio_tagging_loss=0.008728, over 3051282.94 frames. ], batch size: 52, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:34:23,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3102086.6666666665, ans=0.125 2023-11-27 14:34:27,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3102153.3333333335, ans=0.95 2023-11-27 14:34:27,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3102153.3333333335, ans=0.0 2023-11-27 14:34:37,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.94 vs. 
limit=15.0 2023-11-27 14:34:39,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3102220.0, ans=0.125 2023-11-27 14:34:43,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0 2023-11-27 14:34:52,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3102286.6666666665, ans=0.125 2023-11-27 14:34:57,649 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465350 2023-11-27 14:35:03,066 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8450, loss[loss=0.04631, simple_loss=0.06418, pruned_loss=0.007861, audio_tagging_loss=0.006354, over 14062.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08976, pruned_loss=0.01281, audio_tagging_loss=0.008695, over 3052064.95 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:35:27,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3102486.6666666665, ans=0.0 2023-11-27 14:35:28,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3102486.6666666665, ans=0.125 2023-11-27 14:35:49,751 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.831e+01 8.838e+01 9.408e+01 1.009e+02 1.207e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 14:35:49,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3102620.0, ans=0.1 2023-11-27 14:35:51,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3102620.0, ans=0.125 2023-11-27 14:35:55,427 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465400 2023-11-27 14:36:01,366 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.33 vs. limit=22.5 2023-11-27 14:36:01,992 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8500, loss[loss=0.06552, simple_loss=0.09171, pruned_loss=0.01188, audio_tagging_loss=0.007783, over 14872.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08976, pruned_loss=0.01278, audio_tagging_loss=0.008692, over 3046825.68 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:36:02,652 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.40 vs. limit=10.0 2023-11-27 14:36:03,278 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:36:27,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3102820.0, ans=0.125 2023-11-27 14:36:36,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3102886.6666666665, ans=0.1 2023-11-27 14:36:53,803 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465450 2023-11-27 14:37:00,392 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8550, loss[loss=0.06319, simple_loss=0.08193, pruned_loss=0.01234, audio_tagging_loss=0.009878, over 14559.00 frames. 
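The balancer entries (prob, min_positive, max_positive, min_abs, max_abs) describe per-channel constraints on activation statistics: with probability prob, the module adjusts gradients so that each channel's fraction of positive values and mean absolute value stay inside the configured bounds (e.g. the max_positive=0.95 and min_abs=0.5 values in nearby records). A toy check of those statistics under assumed semantics; the real Balancer acts through a backward-pass gradient correction, not a forward penalty:

    import torch

    def balancer_violations(x: torch.Tensor, min_positive=0.05,
                            max_positive=0.95, min_abs=0.2, max_abs=100.0):
        # x: (num_frames, num_channels); statistics are per channel.
        frac_positive = (x > 0).float().mean(dim=0)
        mean_abs = x.abs().mean(dim=0)
        return {
            "too_rarely_positive": (frac_positive < min_positive).sum().item(),
            "too_often_positive": (frac_positive > max_positive).sum().item(),
            "too_small": (mean_abs < min_abs).sum().item(),
            "too_large": (mean_abs > max_abs).sum().item(),
        }

    print(balancer_violations(torch.randn(1000, 256)))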
], tot_loss[loss=0.06633, simple_loss=0.08978, pruned_loss=0.01268, audio_tagging_loss=0.008755, over 3045508.51 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:37:26,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3103153.3333333335, ans=0.125 2023-11-27 14:37:34,002 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=15.0 2023-11-27 14:37:43,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3103220.0, ans=0.125 2023-11-27 14:37:47,512 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.683e+01 9.146e+01 9.913e+01 1.274e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-27 14:37:52,102 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465500 2023-11-27 14:37:57,533 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8600, loss[loss=0.06199, simple_loss=0.08567, pruned_loss=0.01153, audio_tagging_loss=0.007617, over 15382.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.08972, pruned_loss=0.01261, audio_tagging_loss=0.008828, over 3042386.19 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:38:12,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3103420.0, ans=0.0 2023-11-27 14:38:16,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=12.0 2023-11-27 14:38:18,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3103420.0, ans=0.125 2023-11-27 14:38:22,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3103486.6666666665, ans=0.1 2023-11-27 14:38:23,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3103486.6666666665, ans=0.0 2023-11-27 14:38:26,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3103486.6666666665, ans=0.125 2023-11-27 14:38:49,633 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465550 2023-11-27 14:38:51,218 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.43 vs. limit=10.0 2023-11-27 14:38:55,045 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8650, loss[loss=0.09584, simple_loss=0.1305, pruned_loss=0.02274, audio_tagging_loss=0.00786, over 15442.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09023, pruned_loss=0.01275, audio_tagging_loss=0.008929, over 3040475.35 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:39:14,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.40 vs. 
limit=15.0 2023-11-27 14:39:42,544 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.459e+01 9.176e+01 1.006e+02 1.194e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-27 14:39:48,287 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465600 2023-11-27 14:39:55,214 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8700, loss[loss=0.04804, simple_loss=0.05431, pruned_loss=0.007406, audio_tagging_loss=0.01347, over 15307.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09109, pruned_loss=0.01273, audio_tagging_loss=0.008997, over 3042486.42 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:39:55,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3104020.0, ans=0.1 2023-11-27 14:40:04,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3104020.0, ans=0.1 2023-11-27 14:40:11,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3104086.6666666665, ans=0.125 2023-11-27 14:40:19,519 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 14:40:24,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3104153.3333333335, ans=0.125 2023-11-27 14:40:46,527 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465650 2023-11-27 14:40:51,973 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8750, loss[loss=0.05099, simple_loss=0.06429, pruned_loss=0.009526, audio_tagging_loss=0.009322, over 15687.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09131, pruned_loss=0.0128, audio_tagging_loss=0.009021, over 3040515.68 frames. 
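The bypass entries govern a learned residual interpolation around each layer or stack: the output is the input plus a per-channel scale times the layer's change to it, with the scale clamped to at least scale_min (scheduled down to the 0.2 seen here), so layers fade in early in training and are never fully switched off afterwards. A sketch of that combination, an assumed simplification of the Zipformer bypass module:

    import torch

    def bypass(x_orig: torch.Tensor, x_out: torch.Tensor, scale: torch.Tensor,
               scale_min: float = 0.2, scale_max: float = 1.0) -> torch.Tensor:
        # scale: learnable per-channel parameter; clamping keeps every
        # layer at least partially active once scale_min reaches 0.2.
        s = scale.clamp(min=scale_min, max=scale_max)
        return x_orig + s * (x_out - x_orig)

    x = torch.randn(10, 256)
    y = bypass(x, torch.tanh(x), scale=torch.full((256,), 0.1))  # 0.1 clamps to 0.2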
], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:40:55,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3104353.3333333335, ans=0.125 2023-11-27 14:40:56,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3104353.3333333335, ans=0.125 2023-11-27 14:40:56,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3104353.3333333335, ans=0.0 2023-11-27 14:40:56,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3104353.3333333335, ans=0.5 2023-11-27 14:40:58,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3104353.3333333335, ans=0.5 2023-11-27 14:41:04,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3104420.0, ans=0.1 2023-11-27 14:41:05,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3104420.0, ans=0.1 2023-11-27 14:41:07,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3104420.0, ans=0.0 2023-11-27 14:41:13,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3104486.6666666665, ans=0.2 2023-11-27 14:41:29,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3104553.3333333335, ans=0.125 2023-11-27 14:41:31,262 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.69 vs. limit=22.5 2023-11-27 14:41:32,438 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2023-11-27 14:41:35,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.87 vs. limit=15.0 2023-11-27 14:41:39,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.863e+01 8.832e+01 9.228e+01 1.004e+02 1.241e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 14:41:43,941 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465700 2023-11-27 14:41:49,416 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8800, loss[loss=0.04263, simple_loss=0.05368, pruned_loss=0.007098, audio_tagging_loss=0.008696, over 15197.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09099, pruned_loss=0.0127, audio_tagging_loss=0.009071, over 3039683.42 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:41:52,153 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.68 vs. 
limit=15.0 2023-11-27 14:42:41,314 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465750 2023-11-27 14:42:46,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3105020.0, ans=0.2 2023-11-27 14:42:47,784 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8850, loss[loss=0.0675, simple_loss=0.09321, pruned_loss=0.01419, audio_tagging_loss=0.006715, over 15589.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09036, pruned_loss=0.01259, audio_tagging_loss=0.009047, over 3047199.23 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:42:52,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3105020.0, ans=0.1 2023-11-27 14:42:52,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3105020.0, ans=0.125 2023-11-27 14:42:59,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3105086.6666666665, ans=0.07 2023-11-27 14:43:03,122 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 14:43:17,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3105153.3333333335, ans=0.2 2023-11-27 14:43:17,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-27 14:43:31,810 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.39 vs. limit=15.0 2023-11-27 14:43:35,669 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.774e+01 9.216e+01 1.007e+02 1.244e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-27 14:43:40,101 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465800 2023-11-27 14:43:40,452 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=12.0 2023-11-27 14:43:43,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3105286.6666666665, ans=0.5 2023-11-27 14:43:45,772 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8900, loss[loss=0.07547, simple_loss=0.1094, pruned_loss=0.015, audio_tagging_loss=0.005789, over 15265.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.08995, pruned_loss=0.01252, audio_tagging_loss=0.008929, over 3052577.91 frames. 
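The WARNING lines mark AudioSet clips that carry only the dummy placeholder transcript: a 1-second cut has 100 feature frames, the roughly 4x convolutional front end leaves 23 encoder frames, and 23 frames cannot align 24 BPE tokens, so the transducer loss would be ill-defined and the cut is dropped. A sketch of such a filter; the helper name and the exact subsampling arithmetic are assumptions, though they reproduce the 100 -> 23 figure in the warnings:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Assumed front-end arithmetic: conv subsampling to roughly T/4
        # with edge losses; gives 23 frames for a 100-frame (1 s) cut.
        t = ((num_frames - 7) // 2) // 2
        # A transducer alignment needs at least one frame per token.
        return t >= num_tokens

    print(keep_cut(100, 24))  # False -> excluded, as in the warnings above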
], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:43:49,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3105353.3333333335, ans=0.05 2023-11-27 14:44:06,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3105486.6666666665, ans=0.125 2023-11-27 14:44:07,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3105486.6666666665, ans=0.125 2023-11-27 14:44:17,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.32 vs. limit=6.0 2023-11-27 14:44:18,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3105486.6666666665, ans=0.2 2023-11-27 14:44:29,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3105553.3333333335, ans=0.0 2023-11-27 14:44:36,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3105620.0, ans=15.0 2023-11-27 14:44:36,780 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465850 2023-11-27 14:44:40,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3105620.0, ans=0.125 2023-11-27 14:44:42,239 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 8950, loss[loss=0.05754, simple_loss=0.07545, pruned_loss=0.01132, audio_tagging_loss=0.008492, over 15175.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09088, pruned_loss=0.01254, audio_tagging_loss=0.008704, over 3051527.62 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:44:51,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3105686.6666666665, ans=0.125 2023-11-27 14:44:55,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3105753.3333333335, ans=0.0 2023-11-27 14:45:29,484 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.308e+01 8.633e+01 9.411e+01 1.036e+02 1.341e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 14:45:34,024 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465900 2023-11-27 14:45:39,977 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9000, loss[loss=0.08568, simple_loss=0.1245, pruned_loss=0.01884, audio_tagging_loss=0.004583, over 15439.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09063, pruned_loss=0.01253, audio_tagging_loss=0.008657, over 3049923.53 frames. 
], batch size: 54, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:45:39,978 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 14:45:53,168 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([4.0333, 3.3198, 3.0257, 3.7273, 3.3152, 3.4321, 3.6508, 3.1620], device='cuda:2') 2023-11-27 14:46:02,769 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4304, 3.8013, 3.0063, 3.7502], device='cuda:2') 2023-11-27 14:46:08,622 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1640, 3.9754, 3.8185, 3.2184], device='cuda:2') 2023-11-27 14:46:15,059 INFO [train_asr.py:1267] (2/4) Epoch 39, validation: loss=0.05878, simple_loss=0.0507, pruned_loss=0.005237, audio_tagging_loss=0.02819, over 4681554.00 frames. 2023-11-27 14:46:15,060 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 14:46:21,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3106020.0, ans=0.0 2023-11-27 14:46:24,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3106020.0, ans=0.125 2023-11-27 14:46:47,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3106153.3333333335, ans=0.0 2023-11-27 14:47:06,651 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 465950 2023-11-27 14:47:11,960 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9050, loss[loss=0.07771, simple_loss=0.1106, pruned_loss=0.01434, audio_tagging_loss=0.008086, over 16107.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09025, pruned_loss=0.01252, audio_tagging_loss=0.008616, over 3047912.99 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:47:36,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3106486.6666666665, ans=0.125 2023-11-27 14:47:48,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3106553.3333333335, ans=0.125 2023-11-27 14:47:55,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3106553.3333333335, ans=0.5 2023-11-27 14:47:59,627 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.800e+01 9.356e+01 9.893e+01 1.212e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-27 14:48:00,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3106620.0, ans=0.0 2023-11-27 14:48:04,076 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466000 2023-11-27 14:48:09,913 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.24 vs. limit=10.0 2023-11-27 14:48:10,414 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9100, loss[loss=0.08513, simple_loss=0.129, pruned_loss=0.01384, audio_tagging_loss=0.006782, over 15905.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09055, pruned_loss=0.01245, audio_tagging_loss=0.008568, over 3046193.86 frames. 
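During the validation pass at batch 9000 the model also dumps diagnostic entropies of selected self-attention weight distributions, one value per head, as in the zipformer.py tensors above; low entropy means sharply peaked attention, high entropy means attention spread nearly uniformly over the sequence. The usual computation looks as follows (illustrative code, not the zipformer.py diagnostic itself):

    import torch

    def attention_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        # attn_weights: (num_heads, query_len, key_len); rows sum to 1.
        p = attn_weights.clamp(min=1e-20)
        ent = -(p * p.log()).sum(dim=-1)  # entropy at each query position
        return ent.mean(dim=-1)           # averaged per head

    w = torch.softmax(torch.randn(4, 50, 50), dim=-1)
    print(attention_entropy(w))  # one entropy per head, as logged above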
], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:48:37,987 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2023-11-27 14:48:49,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3106886.6666666665, ans=0.125 2023-11-27 14:49:03,700 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466050 2023-11-27 14:49:09,110 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9150, loss[loss=0.07803, simple_loss=0.1076, pruned_loss=0.01394, audio_tagging_loss=0.01027, over 16387.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08996, pruned_loss=0.01232, audio_tagging_loss=0.008603, over 3043256.85 frames. ], batch size: 61, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:49:09,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3107020.0, ans=0.125 2023-11-27 14:49:29,693 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.45 vs. limit=15.0 2023-11-27 14:49:33,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3107153.3333333335, ans=0.125 2023-11-27 14:49:36,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3107153.3333333335, ans=0.1 2023-11-27 14:49:48,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3107220.0, ans=0.0 2023-11-27 14:49:57,781 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.571e+01 9.032e+01 9.849e+01 1.366e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-27 14:50:01,176 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466100 2023-11-27 14:50:02,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3107286.6666666665, ans=0.125 2023-11-27 14:50:06,683 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9200, loss[loss=0.07797, simple_loss=0.1052, pruned_loss=0.01564, audio_tagging_loss=0.009732, over 14950.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08983, pruned_loss=0.01252, audio_tagging_loss=0.008655, over 3032659.33 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:50:21,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3107420.0, ans=0.035 2023-11-27 14:50:30,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3107486.6666666665, ans=0.125 2023-11-27 14:50:52,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3107620.0, ans=0.125 2023-11-27 14:50:53,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3107620.0, ans=0.125 2023-11-27 14:50:58,615 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466150 2023-11-27 14:51:04,608 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9250, loss[loss=0.07989, simple_loss=0.1096, pruned_loss=0.0177, audio_tagging_loss=0.007391, over 16035.00 frames. 
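The learning rate printed with every batch record, lr: 1.73e-03, is consistent with an Eden-style schedule that decays base_lr by quartic-root factors in both the batch and epoch indices; by epoch 39 with batch_idx_train around 4.6e5 both factors are nearly flat, which is why the value does not move between records. A worked check under the assumed Eden form with this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, omitting the ref_duration correction, which plausibly accounts for the ~1% residual:

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Quartic-root decay in the batch count and the epoch index.
        f_batch = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        f_epoch = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * f_batch * f_epoch

    print(eden_lr(0.045, batch=464100, epoch=39))  # ~1.71e-03 vs. 1.73e-03 logged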
], tot_loss[loss=0.06648, simple_loss=0.09038, pruned_loss=0.01266, audio_tagging_loss=0.008624, over 3035068.41 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:51:13,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3107686.6666666665, ans=0.2 2023-11-27 14:51:39,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3107886.6666666665, ans=0.125 2023-11-27 14:51:49,105 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2023-11-27 14:51:55,289 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.711e+01 9.233e+01 9.983e+01 1.314e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-27 14:51:57,568 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466200 2023-11-27 14:52:03,304 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9300, loss[loss=0.08176, simple_loss=0.1214, pruned_loss=0.01214, audio_tagging_loss=0.008904, over 14700.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09019, pruned_loss=0.01271, audio_tagging_loss=0.008629, over 3037347.27 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:52:17,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3108086.6666666665, ans=0.2 2023-11-27 14:52:28,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3108153.3333333335, ans=0.2 2023-11-27 14:52:30,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3108153.3333333335, ans=0.0 2023-11-27 14:52:32,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3108153.3333333335, ans=0.0 2023-11-27 14:52:40,227 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2023-11-27 14:52:52,253 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.60 vs. limit=15.0 2023-11-27 14:52:54,963 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466250 2023-11-27 14:53:00,962 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9350, loss[loss=0.05607, simple_loss=0.07954, pruned_loss=0.007435, audio_tagging_loss=0.008866, over 14735.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09008, pruned_loss=0.01269, audio_tagging_loss=0.008686, over 3028421.80 frames. 
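A note on the loss columns in these train_asr.py entries: every tot_loss line above is consistent with loss = simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss_scale * audio_tagging_loss, using this run's configured simple_loss_scale = 0.5 and audio_tagging_loss_scale = 1.0. The sketch below checks one logged batch; the combination is inferred from the logged numbers, not quoted from the train_asr.py source.

# Check the loss composition against the tot_loss entry for epoch 39, batch 9350.
# Assumed combination (inferred from the logged values, not from train_asr.py):
#   loss = simple_loss_scale * simple_loss + pruned_loss
#          + audio_tagging_loss_scale * audio_tagging_loss
simple_loss_scale = 0.5          # from the run configuration
audio_tagging_loss_scale = 1.0   # from the run configuration

simple_loss, pruned_loss, audio_tagging_loss = 0.09008, 0.01269, 0.008686
loss = (simple_loss_scale * simple_loss
        + pruned_loss
        + audio_tagging_loss_scale * audio_tagging_loss)
print(f"loss={loss:.5f}")  # loss=0.06642, matching the logged tot_loss
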
], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:53:11,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3108420.0, ans=10.0 2023-11-27 14:53:27,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3108486.6666666665, ans=0.5 2023-11-27 14:53:43,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3108553.3333333335, ans=0.025 2023-11-27 14:53:49,986 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.258e+01 8.694e+01 9.307e+01 9.983e+01 1.185e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 14:53:52,257 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466300 2023-11-27 14:53:58,158 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9400, loss[loss=0.05897, simple_loss=0.07931, pruned_loss=0.01176, audio_tagging_loss=0.007556, over 15566.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08958, pruned_loss=0.01255, audio_tagging_loss=0.008856, over 3034277.13 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:54:09,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3108753.3333333335, ans=0.0 2023-11-27 14:54:35,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3108886.6666666665, ans=0.125 2023-11-27 14:54:37,848 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.87 vs. limit=15.0 2023-11-27 14:54:51,325 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466350 2023-11-27 14:54:56,779 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9450, loss[loss=0.06331, simple_loss=0.07414, pruned_loss=0.01513, audio_tagging_loss=0.01112, over 15212.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08982, pruned_loss=0.01256, audio_tagging_loss=0.008786, over 3035464.54 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:55:00,126 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 14:55:01,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3109020.0, ans=0.1 2023-11-27 14:55:10,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3109086.6666666665, ans=0.1 2023-11-27 14:55:17,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3109086.6666666665, ans=0.0 2023-11-27 14:55:29,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3109153.3333333335, ans=0.2 2023-11-27 14:55:41,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3109220.0, ans=0.125 2023-11-27 14:55:46,756 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.803e+01 9.221e+01 9.903e+01 1.293e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-27 14:55:49,041 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466400 2023-11-27 14:55:54,765 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9500, loss[loss=0.05512, simple_loss=0.06457, pruned_loss=0.01325, audio_tagging_loss=0.00958, over 15874.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09012, pruned_loss=0.01255, audio_tagging_loss=0.008838, over 3042047.44 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:56:03,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3109353.3333333335, ans=0.0 2023-11-27 14:56:11,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3109420.0, ans=0.04949747468305833 2023-11-27 14:56:15,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3109420.0, ans=0.125 2023-11-27 14:56:47,015 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466450 2023-11-27 14:56:47,576 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5 2023-11-27 14:56:52,505 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9550, loss[loss=0.07228, simple_loss=0.1025, pruned_loss=0.0152, audio_tagging_loss=0.005806, over 15555.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09029, pruned_loss=0.01249, audio_tagging_loss=0.008917, over 3049269.30 frames. 
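The many ScheduledFloat entries interleaved with the loss reports are not errors: dropout probabilities, skip rates, and balancer probabilities in this model are scheduled values that depend on the global batch count, and scaling.py logs the current value (ans=...) together with the batch_count at which it was evaluated. Below is a minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints shown are illustrative, not those of any module in this run.

import bisect

class PiecewiseLinearSchedule:
    """Float that varies with the global batch count (illustrative sketch)."""
    def __init__(self, *points: tuple[float, float]):
        self.xs = [x for x, _ in points]   # batch counts, ascending
        self.ys = [y for _, y in points]   # values at those batch counts

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1, y0, y1 = self.xs[i-1], self.xs[i], self.ys[i-1], self.ys[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(3109020.0))  # far past the last breakpoint -> 0.1
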
], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 14:56:52,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3109686.6666666665, ans=0.125 2023-11-27 14:57:00,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3109686.6666666665, ans=0.125 2023-11-27 14:57:08,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3109753.3333333335, ans=0.125 2023-11-27 14:57:09,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3109753.3333333335, ans=0.125 2023-11-27 14:57:14,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3109753.3333333335, ans=0.0 2023-11-27 14:57:20,987 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.90 vs. limit=10.0 2023-11-27 14:57:42,551 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.709e+01 9.251e+01 1.020e+02 1.249e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 14:57:45,227 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466500 2023-11-27 14:57:51,305 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9600, loss[loss=0.04762, simple_loss=0.06047, pruned_loss=0.00664, audio_tagging_loss=0.01074, over 15974.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09108, pruned_loss=0.01255, audio_tagging_loss=0.008947, over 3052921.56 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:57:53,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3110020.0, ans=0.0 2023-11-27 14:58:15,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3110153.3333333335, ans=0.125 2023-11-27 14:58:19,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3110153.3333333335, ans=0.0 2023-11-27 14:58:22,278 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2023-11-27 14:58:23,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3110153.3333333335, ans=0.125 2023-11-27 14:58:27,919 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0 2023-11-27 14:58:42,782 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466550 2023-11-27 14:58:48,147 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9650, loss[loss=0.06558, simple_loss=0.08187, pruned_loss=0.01639, audio_tagging_loss=0.008256, over 15861.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09054, pruned_loss=0.01266, audio_tagging_loss=0.00893, over 3044435.23 frames. 
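The optim.py "Clipping_scale" entries summarize recent gradient norms as quartiles (min, 25%, median, 75%, max) and report the active clipping threshold. Throughout this section the threshold equals clipping_scale times the reported median, e.g. 2.0 * 9.251e+01 = 1.850e+02 in the entry above, and percent-clipped counts how often a batch's gradient norm exceeded it. A hedged sketch of that bookkeeping follows; the fixed window size is an assumption, and optim.py's actual implementation may differ in detail.

from collections import deque
from statistics import median

clipping_scale = 2.0
recent_norms = deque(maxlen=128)   # window size is an assumption

def observe_grad_norm(grad_norm: float) -> tuple[float, bool]:
    """Record one gradient norm; return (threshold, whether it would be clipped)."""
    recent_norms.append(grad_norm)
    threshold = clipping_scale * median(recent_norms)
    return threshold, grad_norm > threshold

threshold, clipped = observe_grad_norm(92.51)
print(f"threshold={threshold:.1f}, clipped={clipped}")  # threshold=185.0, clipped=False
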
], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 14:58:48,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3110353.3333333335, ans=0.125 2023-11-27 14:58:50,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3110353.3333333335, ans=0.1 2023-11-27 14:58:56,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3110353.3333333335, ans=0.125 2023-11-27 14:59:37,455 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.159e+01 8.987e+01 9.663e+01 1.056e+02 1.477e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-27 14:59:37,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3110620.0, ans=0.0 2023-11-27 14:59:39,681 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466600 2023-11-27 14:59:46,070 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9700, loss[loss=0.04491, simple_loss=0.05629, pruned_loss=0.0084, audio_tagging_loss=0.008361, over 14795.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09082, pruned_loss=0.01287, audio_tagging_loss=0.008781, over 3046640.86 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:00:11,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3110820.0, ans=0.0 2023-11-27 15:00:27,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3110886.6666666665, ans=0.0 2023-11-27 15:00:27,545 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:00:38,197 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466650 2023-11-27 15:00:42,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3110953.3333333335, ans=0.125 2023-11-27 15:00:44,711 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9750, loss[loss=0.05463, simple_loss=0.07589, pruned_loss=0.008458, audio_tagging_loss=0.008228, over 15812.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09082, pruned_loss=0.0127, audio_tagging_loss=0.008697, over 3049443.68 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:00:47,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3111020.0, ans=10.0 2023-11-27 15:00:48,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3111020.0, ans=0.125 2023-11-27 15:00:52,829 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.51 vs. 
limit=12.0 2023-11-27 15:01:03,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3111086.6666666665, ans=0.0 2023-11-27 15:01:06,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3111153.3333333335, ans=0.0 2023-11-27 15:01:10,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3111153.3333333335, ans=0.1 2023-11-27 15:01:35,382 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.839e+01 8.575e+01 9.224e+01 9.953e+01 1.254e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-27 15:01:36,582 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466700 2023-11-27 15:01:41,859 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9800, loss[loss=0.07081, simple_loss=0.08657, pruned_loss=0.01661, audio_tagging_loss=0.01091, over 14816.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09028, pruned_loss=0.01263, audio_tagging_loss=0.008641, over 3045717.86 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:01:50,112 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.80 vs. limit=10.0 2023-11-27 15:02:05,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3111486.6666666665, ans=0.125 2023-11-27 15:02:09,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3111486.6666666665, ans=0.1 2023-11-27 15:02:21,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3111553.3333333335, ans=0.1 2023-11-27 15:02:32,993 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466750 2023-11-27 15:02:36,200 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:02:38,370 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9850, loss[loss=0.05239, simple_loss=0.06687, pruned_loss=0.008497, audio_tagging_loss=0.01046, over 16662.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09059, pruned_loss=0.01265, audio_tagging_loss=0.008596, over 3041565.15 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:02:41,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3111686.6666666665, ans=6.0 2023-11-27 15:03:01,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3111820.0, ans=0.125 2023-11-27 15:03:05,762 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.87 vs. 
limit=12.0 2023-11-27 15:03:10,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3111820.0, ans=0.125 2023-11-27 15:03:29,428 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.745e+01 9.208e+01 1.002e+02 1.325e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 15:03:30,672 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466800 2023-11-27 15:03:36,883 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9900, loss[loss=0.07854, simple_loss=0.09695, pruned_loss=0.02145, audio_tagging_loss=0.008609, over 14642.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09071, pruned_loss=0.0128, audio_tagging_loss=0.008592, over 3044862.39 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:04:04,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3112153.3333333335, ans=0.1 2023-11-27 15:04:29,157 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466850 2023-11-27 15:04:29,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3112286.6666666665, ans=0.0 2023-11-27 15:04:34,670 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 9950, loss[loss=0.06338, simple_loss=0.08349, pruned_loss=0.01182, audio_tagging_loss=0.009818, over 15979.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09038, pruned_loss=0.01271, audio_tagging_loss=0.00864, over 3045058.62 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:04:35,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3112353.3333333335, ans=0.1 2023-11-27 15:04:49,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3112420.0, ans=0.0 2023-11-27 15:05:09,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3112553.3333333335, ans=0.1 2023-11-27 15:05:15,508 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=15.0 2023-11-27 15:05:21,106 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0 2023-11-27 15:05:24,777 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.583e+01 9.133e+01 9.835e+01 1.182e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-27 15:05:25,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.87 vs. limit=15.0 2023-11-27 15:05:25,943 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466900 2023-11-27 15:05:31,312 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10000, loss[loss=0.09243, simple_loss=0.1294, pruned_loss=0.02102, audio_tagging_loss=0.00672, over 15520.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09059, pruned_loss=0.01275, audio_tagging_loss=0.008614, over 3046221.99 frames. 
], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:05:32,638 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:06:23,623 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 466950 2023-11-27 15:06:25,297 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.16 vs. limit=10.0 2023-11-27 15:06:29,080 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10050, loss[loss=0.0502, simple_loss=0.06627, pruned_loss=0.008625, audio_tagging_loss=0.008444, over 15475.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08939, pruned_loss=0.0124, audio_tagging_loss=0.008694, over 3047429.15 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:06:43,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3113086.6666666665, ans=0.125 2023-11-27 15:06:56,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3113153.3333333335, ans=0.1 2023-11-27 15:07:06,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3113220.0, ans=0.125 2023-11-27 15:07:20,305 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.460e+01 9.017e+01 9.705e+01 1.338e+02, threshold=1.803e+02, percent-clipped=0.0 2023-11-27 15:07:21,476 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467000 2023-11-27 15:07:27,156 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10100, loss[loss=0.06502, simple_loss=0.08479, pruned_loss=0.01155, audio_tagging_loss=0.01107, over 14528.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09022, pruned_loss=0.01246, audio_tagging_loss=0.008654, over 3053361.38 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:07:44,522 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-27 15:07:53,139 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.01 vs. limit=22.5 2023-11-27 15:08:07,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3113553.3333333335, ans=0.1 2023-11-27 15:08:17,376 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:08:18,575 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467050 2023-11-27 15:08:20,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3113620.0, ans=0.0 2023-11-27 15:08:23,948 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10150, loss[loss=0.08147, simple_loss=0.1112, pruned_loss=0.01861, audio_tagging_loss=0.00728, over 16109.00 frames. 
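The recurring "Exclude cut" warnings all follow the same pattern: 1-second AudioSet clips carry a dummy placeholder transcript of 24 BPE tokens, but their 100 feature frames shrink to 23 after the encoder's 4x subsampling, and a transducer loss cannot align more tokens than output frames, so the cut is dropped. The logged 100 -> 23 mapping is consistent with T_out = ((T - 7) // 2) // 2, though that exact frontend formula is an assumption here; only the logged pair is certain.

# Reproduce the frame arithmetic from the warning above.
# Assumed subsampling formula (consistent with the logged 100 -> 23):
def subsampled_len(t: int) -> int:
    return ((t - 7) // 2) // 2

t_in, num_tokens = 100, 24
t_out = subsampled_len(t_in)
print(t_out, t_out >= num_tokens)  # 23 False -> cut excluded from training
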
], tot_loss[loss=0.06669, simple_loss=0.09068, pruned_loss=0.01264, audio_tagging_loss=0.008712, over 3056396.43 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:08:48,873 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.20 vs. limit=15.0 2023-11-27 15:08:55,786 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:08:56,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3113820.0, ans=0.07 2023-11-27 15:09:00,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3113886.6666666665, ans=0.0 2023-11-27 15:09:14,315 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.492e+01 9.148e+01 9.869e+01 1.257e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-27 15:09:15,485 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467100 2023-11-27 15:09:20,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3114020.0, ans=0.125 2023-11-27 15:09:21,489 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10200, loss[loss=0.05871, simple_loss=0.07296, pruned_loss=0.01162, audio_tagging_loss=0.01062, over 14831.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09099, pruned_loss=0.01268, audio_tagging_loss=0.008741, over 3054591.63 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:09:22,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3114020.0, ans=0.015 2023-11-27 15:09:38,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3114086.6666666665, ans=0.125 2023-11-27 15:09:47,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3114153.3333333335, ans=0.1 2023-11-27 15:09:48,641 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 15:10:01,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3114220.0, ans=0.125 2023-11-27 15:10:09,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3114286.6666666665, ans=0.125 2023-11-27 15:10:14,267 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467150 2023-11-27 15:10:20,422 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10250, loss[loss=0.087, simple_loss=0.1258, pruned_loss=0.01586, audio_tagging_loss=0.008229, over 16994.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.0913, pruned_loss=0.01293, audio_tagging_loss=0.008904, over 3055521.87 frames. ], batch size: 63, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:10:31,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3114420.0, ans=0.0 2023-11-27 15:10:36,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3114420.0, ans=0.125 2023-11-27 15:10:40,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3114420.0, ans=0.125 2023-11-27 15:10:49,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3114486.6666666665, ans=10.0 2023-11-27 15:11:03,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3114553.3333333335, ans=0.125 2023-11-27 15:11:11,574 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.883e+01 9.540e+01 1.004e+02 1.419e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 15:11:11,681 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467200 2023-11-27 15:11:17,255 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10300, loss[loss=0.06275, simple_loss=0.08329, pruned_loss=0.011, audio_tagging_loss=0.0101, over 14957.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.09198, pruned_loss=0.0131, audio_tagging_loss=0.008748, over 3059005.88 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:11:24,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3114686.6666666665, ans=0.0 2023-11-27 15:11:33,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3114753.3333333335, ans=0.0 2023-11-27 15:11:38,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3114753.3333333335, ans=0.2 2023-11-27 15:11:48,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3114820.0, ans=0.125 2023-11-27 15:11:54,123 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.44 vs. 
limit=15.0 2023-11-27 15:11:55,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3114886.6666666665, ans=0.125 2023-11-27 15:12:07,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3114953.3333333335, ans=0.1 2023-11-27 15:12:09,011 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467250 2023-11-27 15:12:10,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3114953.3333333335, ans=0.1 2023-11-27 15:12:14,906 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10350, loss[loss=0.05753, simple_loss=0.07189, pruned_loss=0.0104, audio_tagging_loss=0.01119, over 15082.00 frames. ], tot_loss[loss=0.06796, simple_loss=0.09202, pruned_loss=0.01302, audio_tagging_loss=0.008927, over 3056453.92 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:12:22,650 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.24 vs. limit=10.0 2023-11-27 15:12:36,852 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.24 vs. limit=12.0 2023-11-27 15:12:44,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0 2023-11-27 15:12:56,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3115220.0, ans=0.125 2023-11-27 15:13:03,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=15.0 2023-11-27 15:13:05,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3115286.6666666665, ans=0.0 2023-11-27 15:13:07,649 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 8.698e+01 9.408e+01 1.024e+02 1.336e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 15:13:07,739 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467300 2023-11-27 15:13:13,065 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10400, loss[loss=0.05917, simple_loss=0.08714, pruned_loss=0.006331, audio_tagging_loss=0.00927, over 14658.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09065, pruned_loss=0.01281, audio_tagging_loss=0.009059, over 3054954.51 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:13:15,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3115353.3333333335, ans=0.125 2023-11-27 15:13:15,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3115353.3333333335, ans=0.125 2023-11-27 15:13:40,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3115486.6666666665, ans=0.025 2023-11-27 15:13:41,876 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.26 vs. 
limit=15.0 2023-11-27 15:13:51,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3115553.3333333335, ans=0.125 2023-11-27 15:14:04,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3115620.0, ans=0.0 2023-11-27 15:14:05,101 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467350 2023-11-27 15:14:10,468 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10450, loss[loss=0.06629, simple_loss=0.0924, pruned_loss=0.01178, audio_tagging_loss=0.008312, over 15202.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09046, pruned_loss=0.01302, audio_tagging_loss=0.009026, over 3055451.72 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:14:11,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3115686.6666666665, ans=0.2 2023-11-27 15:14:25,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3115753.3333333335, ans=0.035 2023-11-27 15:14:52,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3115886.6666666665, ans=0.125 2023-11-27 15:14:57,196 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2023-11-27 15:15:02,070 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.347e+01 8.776e+01 9.506e+01 1.066e+02 3.679e+02, threshold=1.901e+02, percent-clipped=1.0 2023-11-27 15:15:02,173 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467400 2023-11-27 15:15:08,013 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10500, loss[loss=0.05862, simple_loss=0.06989, pruned_loss=0.01349, audio_tagging_loss=0.01018, over 14143.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09016, pruned_loss=0.01291, audio_tagging_loss=0.008904, over 3053236.81 frames. ], batch size: 56, lr: 1.73e-03, grad_scale: 32.0 2023-11-27 15:15:27,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3116086.6666666665, ans=0.125 2023-11-27 15:15:30,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3116153.3333333335, ans=0.1 2023-11-27 15:15:38,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=15.0 2023-11-27 15:15:47,544 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:15:56,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3116286.6666666665, ans=0.125 2023-11-27 15:16:00,291 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467450 2023-11-27 15:16:06,290 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10550, loss[loss=0.07238, simple_loss=0.1025, pruned_loss=0.01396, audio_tagging_loss=0.007158, over 15839.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09064, pruned_loss=0.013, audio_tagging_loss=0.008732, over 3048184.68 frames. 
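The scaling.py "Whitening" entries track how far each instrumented activation's channel covariance is from being white (a scalar multiple of the identity) and compare that metric against a per-module limit; that modules exceeding the limit are nudged back by a gradient penalty is an assumption based on the module names, not shown in the log. The sketch below assumes the metric is mean(lambda^2) / mean(lambda)^2 over the covariance eigenvalues, which equals 1.0 exactly for white features; this is a plausible reading of the logged numbers, not a verified transcription of scaling.py.

import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Whiteness of features x with shape (N, C); 1.0 means perfectly white."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]          # (C, C) channel covariance
    eigs = torch.linalg.eigvalsh(cov)     # real eigenvalues, ascending
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1024, 128)                # near-white input
print(f"metric={whitening_metric(x):.2f} vs. limit=6.0")  # metric close to 1.0
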
], batch size: 57, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:16:15,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3116353.3333333335, ans=0.2 2023-11-27 15:16:17,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3116420.0, ans=0.0 2023-11-27 15:16:23,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3116420.0, ans=0.07 2023-11-27 15:16:57,563 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467500 2023-11-27 15:16:58,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 8.619e+01 9.191e+01 9.903e+01 2.574e+02, threshold=1.838e+02, percent-clipped=2.0 2023-11-27 15:16:58,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3116620.0, ans=0.125 2023-11-27 15:17:00,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3116620.0, ans=0.2 2023-11-27 15:17:00,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3116620.0, ans=10.0 2023-11-27 15:17:03,605 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10600, loss[loss=0.07647, simple_loss=0.1054, pruned_loss=0.01611, audio_tagging_loss=0.00767, over 15258.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09035, pruned_loss=0.01286, audio_tagging_loss=0.008617, over 3045751.62 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:17:03,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3116686.6666666665, ans=0.035 2023-11-27 15:17:14,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3116753.3333333335, ans=0.1 2023-11-27 15:17:41,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3116886.6666666665, ans=0.0 2023-11-27 15:17:50,396 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.66 vs. limit=15.0 2023-11-27 15:17:55,351 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467550 2023-11-27 15:18:00,691 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10650, loss[loss=0.06988, simple_loss=0.09854, pruned_loss=0.01164, audio_tagging_loss=0.008971, over 15449.00 frames. ], tot_loss[loss=0.06725, simple_loss=0.09121, pruned_loss=0.01303, audio_tagging_loss=0.008617, over 3050799.56 frames. ], batch size: 59, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:18:01,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3117020.0, ans=0.0 2023-11-27 15:18:14,290 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.22 vs. 
limit=15.0 2023-11-27 15:18:17,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3117086.6666666665, ans=0.125 2023-11-27 15:18:23,539 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=15.0 2023-11-27 15:18:29,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3117153.3333333335, ans=0.0 2023-11-27 15:18:30,203 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=22.5 2023-11-27 15:18:36,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3117220.0, ans=0.125 2023-11-27 15:18:42,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3117220.0, ans=0.125 2023-11-27 15:18:52,867 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467600 2023-11-27 15:18:55,394 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.520e+01 9.175e+01 9.888e+01 1.340e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-27 15:18:55,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3117286.6666666665, ans=0.2 2023-11-27 15:18:59,299 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10700, loss[loss=0.08224, simple_loss=0.1154, pruned_loss=0.0178, audio_tagging_loss=0.006732, over 15478.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09106, pruned_loss=0.01293, audio_tagging_loss=0.008618, over 3046625.30 frames. ], batch size: 58, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:19:13,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3117420.0, ans=0.0 2023-11-27 15:19:18,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3117420.0, ans=0.0 2023-11-27 15:19:23,772 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=12.0 2023-11-27 15:19:25,816 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-11-27 15:19:48,909 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:19:48,927 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:19:48,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3117620.0, ans=0.1 2023-11-27 15:19:50,906 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467650 2023-11-27 15:19:55,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3117686.6666666665, ans=0.02 2023-11-27 15:19:56,284 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10750, loss[loss=0.05663, simple_loss=0.07501, pruned_loss=0.0107, audio_tagging_loss=0.008423, over 14443.00 frames. 
], tot_loss[loss=0.06596, simple_loss=0.08936, pruned_loss=0.01258, audio_tagging_loss=0.0087, over 3035575.99 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:20:37,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3117886.6666666665, ans=0.035 2023-11-27 15:20:47,222 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467700 2023-11-27 15:20:49,673 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2023-11-27 15:20:49,905 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.438e+01 8.439e+01 9.244e+01 9.878e+01 1.512e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 15:20:53,261 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10800, loss[loss=0.05533, simple_loss=0.07394, pruned_loss=0.008564, audio_tagging_loss=0.009798, over 14537.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.0898, pruned_loss=0.01258, audio_tagging_loss=0.00866, over 3035056.99 frames. ], batch size: 54, lr: 1.73e-03, grad_scale: 16.0 2023-11-27 15:21:04,138 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.97 vs. limit=15.0 2023-11-27 15:21:11,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.29 vs. limit=22.5 2023-11-27 15:21:31,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3118220.0, ans=0.125 2023-11-27 15:21:35,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3118220.0, ans=0.125 2023-11-27 15:21:44,946 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467750 2023-11-27 15:21:47,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3118286.6666666665, ans=0.5 2023-11-27 15:21:48,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3118286.6666666665, ans=0.125 2023-11-27 15:21:50,794 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10850, loss[loss=0.06707, simple_loss=0.08457, pruned_loss=0.01461, audio_tagging_loss=0.01018, over 14810.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08953, pruned_loss=0.01259, audio_tagging_loss=0.008747, over 3043269.16 frames. ], batch size: 55, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:21:56,343 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.15 vs. 
limit=15.0 2023-11-27 15:22:03,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3118420.0, ans=0.125 2023-11-27 15:22:29,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3118553.3333333335, ans=0.5 2023-11-27 15:22:43,095 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467800 2023-11-27 15:22:46,563 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.986e+01 9.693e+01 1.013e+02 1.433e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-27 15:22:47,095 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2023-11-27 15:22:48,721 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10900, loss[loss=0.06787, simple_loss=0.09236, pruned_loss=0.01473, audio_tagging_loss=0.006953, over 14589.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09003, pruned_loss=0.01274, audio_tagging_loss=0.008748, over 3042297.76 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:22:49,817 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:23:01,357 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0 2023-11-27 15:23:16,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.13 vs. limit=15.0 2023-11-27 15:23:34,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3118953.3333333335, ans=0.1 2023-11-27 15:23:40,184 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467850 2023-11-27 15:23:42,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3118953.3333333335, ans=0.1 2023-11-27 15:23:45,537 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 10950, loss[loss=0.07775, simple_loss=0.1144, pruned_loss=0.01293, audio_tagging_loss=0.007626, over 16358.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09, pruned_loss=0.01278, audio_tagging_loss=0.008773, over 3047325.65 frames. ], batch size: 60, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:24:08,040 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.13 vs. 
limit=15.0 2023-11-27 15:24:10,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3119153.3333333335, ans=0.0 2023-11-27 15:24:37,558 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467900 2023-11-27 15:24:40,680 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.413e+01 8.952e+01 9.286e+01 1.000e+02 1.370e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 15:24:41,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3119353.3333333335, ans=0.2 2023-11-27 15:24:42,830 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11000, loss[loss=0.07094, simple_loss=0.1126, pruned_loss=0.006881, audio_tagging_loss=0.007758, over 15522.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08956, pruned_loss=0.0127, audio_tagging_loss=0.008866, over 3052615.76 frames. ], batch size: 57, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:24:54,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3119420.0, ans=0.125 2023-11-27 15:24:57,060 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:25:05,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3119486.6666666665, ans=0.125 2023-11-27 15:25:14,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3119486.6666666665, ans=0.125 2023-11-27 15:25:23,065 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.89 vs. limit=22.5 2023-11-27 15:25:34,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3119620.0, ans=0.0 2023-11-27 15:25:35,446 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 467950 2023-11-27 15:25:37,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3119620.0, ans=0.1 2023-11-27 15:25:40,950 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11050, loss[loss=0.07228, simple_loss=0.1032, pruned_loss=0.01347, audio_tagging_loss=0.007212, over 16560.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09013, pruned_loss=0.01282, audio_tagging_loss=0.008886, over 3046453.02 frames. 
], batch size: 61, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:25:56,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3119753.3333333335, ans=0.125 2023-11-27 15:26:02,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3119820.0, ans=0.125 2023-11-27 15:26:17,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3119886.6666666665, ans=0.1 2023-11-27 15:26:22,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3119886.6666666665, ans=0.125 2023-11-27 15:26:25,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3119953.3333333335, ans=0.125 2023-11-27 15:26:31,568 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468000 2023-11-27 15:26:37,140 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.786e+01 9.414e+01 9.890e+01 1.526e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-27 15:26:39,330 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11100, loss[loss=0.0748, simple_loss=0.1021, pruned_loss=0.01413, audio_tagging_loss=0.009632, over 16159.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09044, pruned_loss=0.01278, audio_tagging_loss=0.008979, over 3049814.70 frames. ], batch size: 62, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:27:10,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3120153.3333333335, ans=0.0 2023-11-27 15:27:16,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3120220.0, ans=0.125 2023-11-27 15:27:23,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3120220.0, ans=0.125 2023-11-27 15:27:30,432 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468050 2023-11-27 15:27:33,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3120286.6666666665, ans=0.125 2023-11-27 15:27:36,996 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11150, loss[loss=0.05086, simple_loss=0.05845, pruned_loss=0.009907, audio_tagging_loss=0.01173, over 13587.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09063, pruned_loss=0.01273, audio_tagging_loss=0.008991, over 3046985.74 frames. ], batch size: 53, lr: 1.73e-03, grad_scale: 8.0 2023-11-27 15:27:52,482 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. 
limit=6.0 2023-11-27 15:28:02,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3120486.6666666665, ans=0.125 2023-11-27 15:28:29,073 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468100 2023-11-27 15:28:32,273 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.636e+01 8.619e+01 9.079e+01 9.995e+01 1.250e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-27 15:28:34,467 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11200, loss[loss=0.06859, simple_loss=0.0908, pruned_loss=0.01521, audio_tagging_loss=0.007975, over 16044.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09108, pruned_loss=0.01268, audio_tagging_loss=0.009136, over 3045252.67 frames. ], batch size: 61, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:28:37,267 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-27 15:28:44,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3120753.3333333335, ans=0.125 2023-11-27 15:28:49,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3120753.3333333335, ans=0.125 2023-11-27 15:28:58,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3120820.0, ans=0.125 2023-11-27 15:29:01,609 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.80 vs. limit=12.0 2023-11-27 15:29:02,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3120820.0, ans=0.0 2023-11-27 15:29:11,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3120886.6666666665, ans=0.0 2023-11-27 15:29:22,003 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.66 vs. limit=15.0 2023-11-27 15:29:25,876 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468150 2023-11-27 15:29:31,260 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11250, loss[loss=0.08301, simple_loss=0.1164, pruned_loss=0.01624, audio_tagging_loss=0.008588, over 14827.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09043, pruned_loss=0.01271, audio_tagging_loss=0.009142, over 3040287.23 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:29:40,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3121020.0, ans=0.0 2023-11-27 15:30:22,701 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468200 2023-11-27 15:30:23,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3121286.6666666665, ans=0.07 2023-11-27 15:30:26,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.71 vs. 
limit=22.5 2023-11-27 15:30:27,259 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.434e+01 8.697e+01 9.472e+01 1.045e+02 1.319e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-27 15:30:28,835 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11300, loss[loss=0.0647, simple_loss=0.08413, pruned_loss=0.01344, audio_tagging_loss=0.009187, over 14962.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09114, pruned_loss=0.013, audio_tagging_loss=0.008938, over 3040469.52 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:30:45,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3121420.0, ans=0.125 2023-11-27 15:31:00,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3121486.6666666665, ans=0.125 2023-11-27 15:31:08,357 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:31:11,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2023-11-27 15:31:20,436 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468250 2023-11-27 15:31:26,443 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11350, loss[loss=0.05469, simple_loss=0.07639, pruned_loss=0.008461, audio_tagging_loss=0.008032, over 14567.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09023, pruned_loss=0.01268, audio_tagging_loss=0.008854, over 3046716.06 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:31:43,087 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.43 vs. limit=12.0 2023-11-27 15:32:17,326 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468300 2023-11-27 15:32:21,510 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 8.462e+01 9.162e+01 9.878e+01 1.221e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 15:32:22,616 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11400, loss[loss=0.06782, simple_loss=0.1007, pruned_loss=0.01068, audio_tagging_loss=0.00682, over 15271.00 frames. ], tot_loss[loss=0.06724, simple_loss=0.09126, pruned_loss=0.01285, audio_tagging_loss=0.008756, over 3045020.04 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:32:57,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3122220.0, ans=0.0 2023-11-27 15:33:08,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3122286.6666666665, ans=0.0 2023-11-27 15:33:11,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3122286.6666666665, ans=0.125 2023-11-27 15:33:13,427 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468350 2023-11-27 15:33:18,889 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11450, loss[loss=0.0641, simple_loss=0.08441, pruned_loss=0.01298, audio_tagging_loss=0.008916, over 15462.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09059, pruned_loss=0.01281, audio_tagging_loss=0.008697, over 3048224.62 frames. 
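Annotation: every loss[...] and tot_loss[...] record in this section satisfies loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (e.g. 0.5 * 0.05845 + 0.009907 + 0.01173 = 0.05086 for batch 11150 above). A minimal sketch of that bookkeeping, with the two scale factors read off from the printed totals rather than from the training script itself:

```python
import torch

def combine_losses(
    simple_loss: torch.Tensor,
    pruned_loss: torch.Tensor,
    audio_tagging_loss: torch.Tensor,
    simple_loss_scale: float = 0.5,         # inferred from the printed totals
    audio_tagging_loss_scale: float = 1.0,  # inferred from the printed totals
) -> torch.Tensor:
    """Combine the per-batch loss terms into the scalar logged as loss[...];
    the individual terms are logged alongside it in each record."""
    return (
        simple_loss_scale * simple_loss
        + pruned_loss
        + audio_tagging_loss_scale * audio_tagging_loss
    )
```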
], batch size: 58, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:33:37,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.63 vs. limit=22.5 2023-11-27 15:33:47,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3122486.6666666665, ans=15.0 2023-11-27 15:34:00,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3122553.3333333335, ans=0.0 2023-11-27 15:34:09,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3122620.0, ans=0.125 2023-11-27 15:34:11,196 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468400 2023-11-27 15:34:16,381 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 8.713e+01 9.522e+01 1.023e+02 1.434e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-27 15:34:17,527 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11500, loss[loss=0.07065, simple_loss=0.08965, pruned_loss=0.01614, audio_tagging_loss=0.009689, over 14337.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09044, pruned_loss=0.01285, audio_tagging_loss=0.008687, over 3039044.67 frames. ], batch size: 53, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:34:34,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3122753.3333333335, ans=0.05 2023-11-27 15:34:44,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3122820.0, ans=0.0 2023-11-27 15:34:52,049 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2023-11-27 15:34:56,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3122886.6666666665, ans=0.0 2023-11-27 15:35:09,554 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468450 2023-11-27 15:35:15,022 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11550, loss[loss=0.08166, simple_loss=0.1103, pruned_loss=0.01605, audio_tagging_loss=0.01049, over 15517.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09084, pruned_loss=0.01283, audio_tagging_loss=0.00874, over 3044521.25 frames. 
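Annotation: in each optim.py:476 record the reported threshold equals Clipping_scale times the median of the grad-norm quartiles (2.0 * 9.522e+01 = 1.904e+02 in the record just above), and percent-clipped reports how often recent batches exceeded it. A sketch of that scheme, assuming a sliding window of recent global gradient norms; the window length and the exact clipping mechanics are guesses:

```python
from collections import deque

import torch

class QuartileClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent global gradient norms

    def clip_(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        # Global gradient norm of this batch.
        norm = torch.linalg.vector_norm(
            torch.stack([p.grad.detach().norm() for p in params])
        ).item()
        self.norms.append(norm)
        hist = torch.tensor(list(self.norms))
        # Five-number summary, as printed in the log: min, Q1, median, Q3, max.
        q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        if norm > threshold:
            for p in params:
                p.grad.mul_(threshold / norm)  # shrink gradients onto the threshold
        return threshold
```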
], batch size: 55, lr: 1.72e-03, grad_scale: 8.0 2023-11-27 15:35:15,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3123020.0, ans=0.1 2023-11-27 15:35:30,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3123086.6666666665, ans=0.125 2023-11-27 15:35:38,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3123153.3333333335, ans=0.05 2023-11-27 15:35:46,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3123153.3333333335, ans=0.1 2023-11-27 15:35:47,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3123153.3333333335, ans=0.025 2023-11-27 15:35:51,987 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=8.0 2023-11-27 15:35:54,352 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 15:36:01,414 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.65 vs. limit=15.0 2023-11-27 15:36:06,434 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468500 2023-11-27 15:36:06,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3123286.6666666665, ans=0.0 2023-11-27 15:36:10,703 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.754e+01 9.552e+01 1.002e+02 2.038e+02, threshold=1.910e+02, percent-clipped=1.0 2023-11-27 15:36:11,822 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11600, loss[loss=0.08962, simple_loss=0.1237, pruned_loss=0.01909, audio_tagging_loss=0.00869, over 15422.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09127, pruned_loss=0.01292, audio_tagging_loss=0.008804, over 3046864.24 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:36:31,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3123420.0, ans=0.1 2023-11-27 15:36:34,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3123486.6666666665, ans=0.0 2023-11-27 15:36:57,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3123620.0, ans=0.125 2023-11-27 15:37:03,091 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.38 vs. 
limit=15.0 2023-11-27 15:37:03,570 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468550 2023-11-27 15:37:03,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3123620.0, ans=0.0 2023-11-27 15:37:09,513 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11650, loss[loss=0.04846, simple_loss=0.06119, pruned_loss=0.008331, audio_tagging_loss=0.009538, over 14225.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09056, pruned_loss=0.01287, audio_tagging_loss=0.008859, over 3041339.09 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:37:20,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3123753.3333333335, ans=0.125 2023-11-27 15:37:23,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3123753.3333333335, ans=0.1 2023-11-27 15:37:40,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3123820.0, ans=0.125 2023-11-27 15:37:52,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3123886.6666666665, ans=0.125 2023-11-27 15:37:53,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3123886.6666666665, ans=0.2 2023-11-27 15:38:01,341 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468600 2023-11-27 15:38:04,504 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2023-11-27 15:38:06,624 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.764e+01 9.242e+01 1.012e+02 1.452e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 15:38:07,818 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11700, loss[loss=0.04998, simple_loss=0.06475, pruned_loss=0.008455, audio_tagging_loss=0.009151, over 15005.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.08991, pruned_loss=0.01269, audio_tagging_loss=0.008911, over 3044279.85 frames. ], batch size: 61, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:38:14,750 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=12.0 2023-11-27 15:38:39,479 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2023-11-27 15:38:59,038 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468650 2023-11-27 15:39:03,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3124353.3333333335, ans=0.0 2023-11-27 15:39:04,418 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11750, loss[loss=0.07549, simple_loss=0.09397, pruned_loss=0.01457, audio_tagging_loss=0.01393, over 16018.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08973, pruned_loss=0.01253, audio_tagging_loss=0.008947, over 3046970.20 frames. 
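Annotation: the WARNING above drops a 1-second AudioSet placeholder cut because only 23 frames survive subsampling while its dummy transcript has 24 BPE tokens, leaving no valid transducer alignment. A predicate consistent with those numbers; the frame arithmetic is an assumption chosen to reproduce 100 -> 23:

```python
def keep_cut(num_frames_before_subsampling: int, num_tokens: int) -> bool:
    # Convolutional subsampling by ~4x; (100 - 7) // 4 = 23 matches the log.
    num_frames_after = (num_frames_before_subsampling - 7) // 4
    # A transducer needs at least one frame per output token.
    return num_frames_after >= num_tokens

assert not keep_cut(100, 24)  # the unbalanced/NeYOsnhOi4k_0.000_1.000.wav cut above
```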
], batch size: 61, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:39:28,972 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 15:39:40,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3124553.3333333335, ans=0.125 2023-11-27 15:39:41,068 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.25 vs. limit=15.0 2023-11-27 15:39:56,132 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468700 2023-11-27 15:40:00,302 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.652e+01 8.569e+01 9.104e+01 9.733e+01 1.192e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-27 15:40:01,917 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11800, loss[loss=0.0787, simple_loss=0.114, pruned_loss=0.01659, audio_tagging_loss=0.005097, over 15673.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08903, pruned_loss=0.01243, audio_tagging_loss=0.009008, over 3039597.06 frames. ], batch size: 58, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:40:29,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3124820.0, ans=0.09899494936611666 2023-11-27 15:40:29,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3124820.0, ans=0.2 2023-11-27 15:40:44,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3124886.6666666665, ans=0.2 2023-11-27 15:40:53,868 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468750 2023-11-27 15:40:59,250 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11850, loss[loss=0.07729, simple_loss=0.1044, pruned_loss=0.01732, audio_tagging_loss=0.007762, over 15616.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08935, pruned_loss=0.01237, audio_tagging_loss=0.009001, over 3047893.91 frames. ], batch size: 56, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:41:50,320 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468800 2023-11-27 15:41:55,548 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.796e+01 8.519e+01 9.146e+01 9.837e+01 1.247e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-27 15:41:56,670 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11900, loss[loss=0.06557, simple_loss=0.08985, pruned_loss=0.01342, audio_tagging_loss=0.007224, over 15700.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08903, pruned_loss=0.01228, audio_tagging_loss=0.009011, over 3052483.42 frames. ], batch size: 59, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:42:10,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3125420.0, ans=0.1 2023-11-27 15:42:48,154 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468850 2023-11-27 15:42:53,403 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 11950, loss[loss=0.04821, simple_loss=0.06237, pruned_loss=0.008225, audio_tagging_loss=0.008804, over 14375.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.08938, pruned_loss=0.01246, audio_tagging_loss=0.009069, over 3047784.37 frames. 
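Annotation: the scaling.py:213 lines track ScheduledFloat parameters (balancer probabilities, dropout rates, skip rates) whose current value `ans` is a function of batch_count. A minimal stand-in that makes the idea concrete; the breakpoints below are invented, and the real class in icefall's scaling.py is richer than this piecewise-linear sketch:

```python
class ScheduledFloatSketch:
    """A float that is piecewise-linear in batch_count, like the values
    printed as `ans=` in the scaling.py:213 records above."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, e.g. ((0.0, 0.3), (20000.0, 0.1))
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        (x0, y0) = self.points[0]
        if batch_count <= x0:
            return y0
        for (x1, y1) in self.points[1:]:
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            x0, y0 = x1, y1
        return y0  # flat after the last breakpoint

# Invented example: a dropout that anneals from 0.3 to 0.1 over 20k batches.
dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
assert abs(dropout_p.value(10000.0) - 0.2) < 1e-6
```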
], batch size: 56, lr: 1.72e-03, grad_scale: 16.0 2023-11-27 15:43:07,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3125753.3333333335, ans=0.2 2023-11-27 15:43:07,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3125753.3333333335, ans=0.125 2023-11-27 15:43:17,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3125820.0, ans=0.1 2023-11-27 15:43:41,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3125953.3333333335, ans=0.125 2023-11-27 15:43:44,365 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468900 2023-11-27 15:43:48,024 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.38 vs. limit=15.0 2023-11-27 15:43:48,490 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.695e+01 9.240e+01 9.952e+01 1.274e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 15:43:49,571 INFO [train_asr.py:1235] (2/4) Epoch 39, batch 12000, loss[loss=0.07169, simple_loss=0.09303, pruned_loss=0.01259, audio_tagging_loss=0.01258, over 14464.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08881, pruned_loss=0.01244, audio_tagging_loss=0.009165, over 3038153.33 frames. ], batch size: 55, lr: 1.72e-03, grad_scale: 32.0 2023-11-27 15:43:49,571 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 15:44:09,380 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8062, 4.9499, 5.0243, 4.8789], device='cuda:2') 2023-11-27 15:44:15,423 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9674, 3.8209, 4.8373, 4.4516], device='cuda:2') 2023-11-27 15:44:18,035 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9685, 3.1670, 2.9339, 3.1624, 3.3871, 2.8041, 3.4227, 2.6043], device='cuda:2') 2023-11-27 15:44:24,031 INFO [train_asr.py:1267] (2/4) Epoch 39, validation: loss=0.05766, simple_loss=0.05064, pruned_loss=0.005162, audio_tagging_loss=0.02718, over 4681554.00 frames. 2023-11-27 15:44:24,032 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 15:44:25,510 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.39 vs. limit=22.5 2023-11-27 15:44:33,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3126086.6666666665, ans=0.0 2023-11-27 15:44:36,809 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.73 vs. 
limit=22.5 2023-11-27 15:44:37,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3126086.6666666665, ans=0.125 2023-11-27 15:44:39,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3126086.6666666665, ans=0.2 2023-11-27 15:44:42,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3126086.6666666665, ans=0.05 2023-11-27 15:44:47,467 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2023-11-27 15:45:06,137 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 0, loss[loss=0.08566, simple_loss=0.1064, pruned_loss=0.01248, audio_tagging_loss=0.01999, over 14542.00 frames. ], tot_loss[loss=0.08566, simple_loss=0.1064, pruned_loss=0.01248, audio_tagging_loss=0.01999, over 14542.00 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 15:45:06,138 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 15:45:41,201 INFO [train_asr.py:1267] (2/4) Epoch 40, validation: loss=0.05772, simple_loss=0.0507, pruned_loss=0.005215, audio_tagging_loss=0.02715, over 4681554.00 frames. 2023-11-27 15:45:41,202 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 15:45:45,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.64 vs. limit=22.5 2023-11-27 15:45:47,668 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.61 vs. limit=15.0 2023-11-27 15:45:53,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3126253.3333333335, ans=0.125 2023-11-27 15:45:57,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=22.5 2023-11-27 15:46:03,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3126320.0, ans=0.125 2023-11-27 15:46:04,301 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 468950 2023-11-27 15:46:11,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3126320.0, ans=0.1 2023-11-27 15:46:29,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3126453.3333333335, ans=0.04949747468305833 2023-11-27 15:46:39,249 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 50, loss[loss=0.06468, simple_loss=0.08112, pruned_loss=0.008882, audio_tagging_loss=0.01524, over 14922.00 frames. ], tot_loss[loss=0.07392, simple_loss=0.08859, pruned_loss=0.0126, audio_tagging_loss=0.01702, over 681307.40 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 15:46:45,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=15.0 2023-11-27 15:46:55,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.11 vs. 
limit=15.0 2023-11-27 15:46:55,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3126586.6666666665, ans=0.025 2023-11-27 15:47:01,932 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469000 2023-11-27 15:47:05,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.80 vs. limit=15.0 2023-11-27 15:47:06,532 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 9.209e+01 9.877e+01 1.086e+02 2.497e+02, threshold=1.975e+02, percent-clipped=1.0 2023-11-27 15:47:25,126 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=12.0 2023-11-27 15:47:30,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3126786.6666666665, ans=0.2 2023-11-27 15:47:35,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3126853.3333333335, ans=0.5 2023-11-27 15:47:36,520 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 100, loss[loss=0.07671, simple_loss=0.09792, pruned_loss=0.01162, audio_tagging_loss=0.01613, over 15838.00 frames. ], tot_loss[loss=0.07431, simple_loss=0.0897, pruned_loss=0.0129, audio_tagging_loss=0.01656, over 1204595.71 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 15:47:46,516 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.58 vs. limit=10.0 2023-11-27 15:47:49,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3126920.0, ans=0.2 2023-11-27 15:47:52,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.45 vs. limit=22.5 2023-11-27 15:48:00,214 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469050 2023-11-27 15:48:09,871 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=15.0 2023-11-27 15:48:10,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3127053.3333333335, ans=0.125 2023-11-27 15:48:13,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3127053.3333333335, ans=0.0 2023-11-27 15:48:15,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3127053.3333333335, ans=0.2 2023-11-27 15:48:20,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=15.0 2023-11-27 15:48:23,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3127120.0, ans=10.0 2023-11-27 15:48:34,254 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 150, loss[loss=0.07464, simple_loss=0.1014, pruned_loss=0.01338, audio_tagging_loss=0.01055, over 15031.00 frames. ], tot_loss[loss=0.07197, simple_loss=0.0891, pruned_loss=0.01273, audio_tagging_loss=0.01469, over 1614222.96 frames. 
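Annotation: at the epoch 39 -> 40 boundary a few records back, training pauses to compute a validation loss over the full 4681554-frame dev set and then reports the peak CUDA allocation (26096MB). A schematic of that snapshot; compute_loss is a hypothetical helper standing in for the real loss computation in train_asr.py:

```python
import torch

def validation_snapshot(model, valid_loader, device, logger):
    model.eval()
    loss_sum, frame_sum = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch, device)  # hypothetical
            loss_sum += loss.item() * num_frames
            frame_sum += num_frames
    logger.info("validation: loss=%.4g, over %.2f frames.",
                loss_sum / frame_sum, frame_sum)
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    logger.info("Maximum memory allocated so far is %dMB", mb)
    model.train()
```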
], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 15:48:57,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.55 vs. limit=22.5 2023-11-27 15:48:58,068 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469100 2023-11-27 15:49:03,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.882e+01 9.129e+01 9.870e+01 1.058e+02 1.571e+02, threshold=1.974e+02, percent-clipped=0.0 2023-11-27 15:49:19,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3127386.6666666665, ans=0.0 2023-11-27 15:49:24,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3127453.3333333335, ans=0.125 2023-11-27 15:49:26,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3127453.3333333335, ans=0.1 2023-11-27 15:49:32,935 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 200, loss[loss=0.0806, simple_loss=0.1105, pruned_loss=0.0197, audio_tagging_loss=0.005643, over 15472.00 frames. ], tot_loss[loss=0.07103, simple_loss=0.09027, pruned_loss=0.01293, audio_tagging_loss=0.01297, over 1934898.47 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:49:33,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3127520.0, ans=0.05 2023-11-27 15:49:41,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3127520.0, ans=0.0 2023-11-27 15:49:45,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3127586.6666666665, ans=0.1 2023-11-27 15:49:55,009 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469150 2023-11-27 15:50:06,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3127720.0, ans=0.0 2023-11-27 15:50:10,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3127720.0, ans=0.125 2023-11-27 15:50:15,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3127720.0, ans=0.1 2023-11-27 15:50:16,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3127720.0, ans=0.125 2023-11-27 15:50:24,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3127786.6666666665, ans=0.125 2023-11-27 15:50:29,965 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 250, loss[loss=0.07218, simple_loss=0.1023, pruned_loss=0.01338, audio_tagging_loss=0.007637, over 14111.00 frames. ], tot_loss[loss=0.06983, simple_loss=0.09049, pruned_loss=0.01294, audio_tagging_loss=0.01164, over 2187241.75 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:50:45,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3127920.0, ans=10.0 2023-11-27 15:50:51,686 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.36 vs. 
limit=15.0 2023-11-27 15:50:53,394 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469200 2023-11-27 15:50:53,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3127986.6666666665, ans=0.1 2023-11-27 15:50:59,103 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 8.939e+01 9.454e+01 1.026e+02 1.364e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-27 15:51:26,659 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 300, loss[loss=0.06103, simple_loss=0.08154, pruned_loss=0.01113, audio_tagging_loss=0.00913, over 15370.00 frames. ], tot_loss[loss=0.06934, simple_loss=0.0913, pruned_loss=0.01291, audio_tagging_loss=0.01078, over 2380070.86 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:51:50,342 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469250 2023-11-27 15:51:56,202 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.91 vs. limit=15.0 2023-11-27 15:52:01,889 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0 2023-11-27 15:52:18,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3128453.3333333335, ans=0.125 2023-11-27 15:52:24,500 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 350, loss[loss=0.03839, simple_loss=0.04721, pruned_loss=0.005891, audio_tagging_loss=0.008892, over 15146.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.08963, pruned_loss=0.01252, audio_tagging_loss=0.01031, over 2530889.46 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:52:25,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0 2023-11-27 15:52:29,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3128520.0, ans=0.125 2023-11-27 15:52:36,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3128586.6666666665, ans=0.125 2023-11-27 15:52:46,867 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469300 2023-11-27 15:52:52,151 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.667e+01 9.273e+01 1.018e+02 1.811e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 15:52:58,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3128720.0, ans=0.0 2023-11-27 15:53:00,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3128720.0, ans=0.0 2023-11-27 15:53:02,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3128720.0, ans=0.125 2023-11-27 15:53:14,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3128786.6666666665, ans=0.2 2023-11-27 15:53:21,707 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 400, loss[loss=0.05815, simple_loss=0.0806, pruned_loss=0.007162, audio_tagging_loss=0.01069, over 15335.00 frames. 
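Annotation: during that validation pass the zipformer.py:1877 lines dump attn_weights_entropy, one value per attention head (the 4- and 8-element tensors above); high entropy means a head attends broadly, near-zero entropy means it has collapsed onto single positions. A sketch of the statistic, assuming weights of shape (num_heads, tgt_len, src_len) with rows summing to 1 and an average over query positions:

```python
import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    eps = 1.0e-20
    # Entropy of each attention distribution: (num_heads, tgt_len).
    ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return ent.mean(dim=-1)  # one value per head

weights = torch.softmax(torch.randn(4, 10, 10), dim=-1)
print(attn_weights_entropy(weights))  # a 4-element tensor, like the dumps above
```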
], tot_loss[loss=0.06718, simple_loss=0.0895, pruned_loss=0.01247, audio_tagging_loss=0.009961, over 2642936.20 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 15:53:25,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3128853.3333333335, ans=0.04949747468305833 2023-11-27 15:53:34,249 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.97 vs. limit=22.5 2023-11-27 15:53:44,298 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469350 2023-11-27 15:53:52,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.10 vs. limit=15.0 2023-11-27 15:54:08,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=15.0 2023-11-27 15:54:17,407 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 450, loss[loss=0.08033, simple_loss=0.1093, pruned_loss=0.01839, audio_tagging_loss=0.007317, over 15533.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09011, pruned_loss=0.01264, audio_tagging_loss=0.009638, over 2734544.04 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:54:21,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3129186.6666666665, ans=0.025 2023-11-27 15:54:27,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3129186.6666666665, ans=0.125 2023-11-27 15:54:33,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3129253.3333333335, ans=0.1 2023-11-27 15:54:41,481 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469400 2023-11-27 15:54:48,255 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 8.585e+01 9.092e+01 1.003e+02 1.210e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-27 15:54:49,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3129320.0, ans=0.125 2023-11-27 15:54:57,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3129386.6666666665, ans=0.07 2023-11-27 15:55:00,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3129386.6666666665, ans=0.1 2023-11-27 15:55:13,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3129453.3333333335, ans=0.025 2023-11-27 15:55:14,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3129453.3333333335, ans=0.125 2023-11-27 15:55:16,567 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 500, loss[loss=0.0924, simple_loss=0.1311, pruned_loss=0.02059, audio_tagging_loss=0.00625, over 15800.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.08939, pruned_loss=0.01253, audio_tagging_loss=0.009466, over 2795611.47 frames. 
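Annotation: the lr column decays slowly within an epoch and steps down from 1.72e-03 to 1.70e-03 exactly at the epoch 39 -> 40 boundary, the signature of an Eden-style schedule that discounts in both batch count and epoch count. A sketch with invented time constants:

```python
def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 5000.0, lr_epochs: float = 4.0) -> float:
    # Both factors are ~1 early and decay like 1/sqrt(t) late; the epoch
    # factor is what produces the step change at an epoch boundary.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```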
], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:55:25,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3129520.0, ans=0.2 2023-11-27 15:55:35,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3129586.6666666665, ans=0.025 2023-11-27 15:55:39,472 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469450 2023-11-27 15:56:04,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3129786.6666666665, ans=0.0 2023-11-27 15:56:14,249 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 550, loss[loss=0.07021, simple_loss=0.09697, pruned_loss=0.01438, audio_tagging_loss=0.007351, over 15218.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09012, pruned_loss=0.01253, audio_tagging_loss=0.009329, over 2852268.60 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:56:16,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3129853.3333333335, ans=0.1 2023-11-27 15:56:24,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3129920.0, ans=0.125 2023-11-27 15:56:26,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3129920.0, ans=0.125 2023-11-27 15:56:29,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3129920.0, ans=15.0 2023-11-27 15:56:31,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3129920.0, ans=0.2 2023-11-27 15:56:36,940 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469500 2023-11-27 15:56:37,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3129986.6666666665, ans=0.2 2023-11-27 15:56:40,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3129986.6666666665, ans=0.125 2023-11-27 15:56:44,644 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.483e+01 9.154e+01 9.792e+01 1.177e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-27 15:56:55,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3130053.3333333335, ans=0.125 2023-11-27 15:56:55,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3130053.3333333335, ans=0.125 2023-11-27 15:56:58,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3130053.3333333335, ans=0.125 2023-11-27 15:56:58,863 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=22.5 2023-11-27 15:57:11,341 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 600, loss[loss=0.07684, simple_loss=0.1107, pruned_loss=0.01218, audio_tagging_loss=0.009306, over 15837.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.0904, pruned_loss=0.01275, audio_tagging_loss=0.009244, over 2896356.48 frames. 
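Annotation: grad_scale in the batch records moves between 8.0, 16.0 and 32.0, the usual dynamic loss-scaling behaviour of fp16 training: the scale doubles after a run of overflow-free steps and halves when an inf/nan gradient forces a skipped step. A minimal AMP step showing where the scaler sits; the model/optimizer plumbing is schematic and compute_loss is a hypothetical helper:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_interval=2000)

def amp_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)  # hypothetical helper
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped (and scale halved) if grads contain inf/nan
    scaler.update()         # doubled after growth_interval clean steps
    return scaler.get_scale()
```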
], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:57:21,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3130186.6666666665, ans=0.125 2023-11-27 15:57:35,651 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469550 2023-11-27 15:57:42,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3130320.0, ans=0.125 2023-11-27 15:57:43,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3130320.0, ans=0.125 2023-11-27 15:57:55,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3130386.6666666665, ans=0.0 2023-11-27 15:57:57,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3130453.3333333335, ans=0.125 2023-11-27 15:58:09,179 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 650, loss[loss=0.08173, simple_loss=0.11, pruned_loss=0.01726, audio_tagging_loss=0.009484, over 16102.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09001, pruned_loss=0.0127, audio_tagging_loss=0.009121, over 2928045.58 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 15:58:16,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3130520.0, ans=0.125 2023-11-27 15:58:29,838 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0 2023-11-27 15:58:32,562 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469600 2023-11-27 15:58:40,336 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.671e+01 9.149e+01 9.953e+01 1.299e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-27 15:59:00,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3130786.6666666665, ans=0.1 2023-11-27 15:59:07,567 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 700, loss[loss=0.07594, simple_loss=0.1037, pruned_loss=0.01722, audio_tagging_loss=0.00685, over 15521.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.08987, pruned_loss=0.0127, audio_tagging_loss=0.009061, over 2955230.09 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 8.0 2023-11-27 15:59:17,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3130853.3333333335, ans=22.5 2023-11-27 15:59:30,532 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469650 2023-11-27 15:59:47,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3131053.3333333335, ans=0.1 2023-11-27 15:59:53,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.03 vs. 
limit=22.5 2023-11-27 15:59:58,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3131120.0, ans=0.125 2023-11-27 16:00:05,237 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 750, loss[loss=0.08172, simple_loss=0.1142, pruned_loss=0.01744, audio_tagging_loss=0.007189, over 15527.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09132, pruned_loss=0.01284, audio_tagging_loss=0.008915, over 2977998.63 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 8.0 2023-11-27 16:00:17,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.93 vs. limit=15.0 2023-11-27 16:00:20,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3131253.3333333335, ans=0.04949747468305833 2023-11-27 16:00:28,326 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469700 2023-11-27 16:00:28,802 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=22.5 2023-11-27 16:00:36,958 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.681e+01 9.396e+01 9.945e+01 1.193e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 16:00:44,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3131386.6666666665, ans=0.125 2023-11-27 16:00:49,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3131386.6666666665, ans=0.125 2023-11-27 16:00:56,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3131453.3333333335, ans=0.1 2023-11-27 16:01:03,219 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 800, loss[loss=0.06424, simple_loss=0.09361, pruned_loss=0.01109, audio_tagging_loss=0.006341, over 16804.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.09219, pruned_loss=0.01301, audio_tagging_loss=0.008823, over 2995662.61 frames. ], batch size: 63, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:01:26,296 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469750 2023-11-27 16:01:38,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3131720.0, ans=0.125 2023-11-27 16:01:40,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3131720.0, ans=0.2 2023-11-27 16:01:53,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3131786.6666666665, ans=0.125 2023-11-27 16:01:56,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3131786.6666666665, ans=0.2 2023-11-27 16:02:00,642 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 850, loss[loss=0.1075, simple_loss=0.1583, pruned_loss=0.02168, audio_tagging_loss=0.006617, over 15606.00 frames. ], tot_loss[loss=0.06803, simple_loss=0.09223, pruned_loss=0.01301, audio_tagging_loss=0.008904, over 3002394.43 frames. 
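Annotation: the scaling.py:1022 Whitening lines compare a per-module metric against a limit (e.g. metric=14.47 vs. limit=22.5 above); the metric measures how uneven the eigenvalue spectrum of the activation covariance is, and a corrective penalty only engages once the limit is exceeded. A schematic version of such a metric using the common flatness ratio mean(eig^2) / mean(eig)^2, which is 1.0 for perfectly white features; the grouping and the exact statistic in scaling.py may differ:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels); split channels into groups as in the log.
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n    # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()
```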
], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:02:12,067 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.08 vs. limit=15.0 2023-11-27 16:02:18,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3131920.0, ans=0.125 2023-11-27 16:02:22,657 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469800 2023-11-27 16:02:29,292 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=15.0 2023-11-27 16:02:31,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.724e+01 9.421e+01 1.007e+02 1.369e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-27 16:02:37,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3132053.3333333335, ans=0.0 2023-11-27 16:02:37,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3132053.3333333335, ans=0.07 2023-11-27 16:02:37,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3132053.3333333335, ans=0.125 2023-11-27 16:02:38,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3132053.3333333335, ans=0.125 2023-11-27 16:02:44,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=3132053.3333333335, ans=0.1 2023-11-27 16:02:47,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3132120.0, ans=0.07 2023-11-27 16:02:52,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3132120.0, ans=0.125 2023-11-27 16:02:57,914 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 900, loss[loss=0.06983, simple_loss=0.09707, pruned_loss=0.01576, audio_tagging_loss=0.005532, over 14330.00 frames. ], tot_loss[loss=0.06804, simple_loss=0.09205, pruned_loss=0.01303, audio_tagging_loss=0.008985, over 3013516.46 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:03:19,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3132320.0, ans=0.125 2023-11-27 16:03:20,885 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469850 2023-11-27 16:03:25,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3132320.0, ans=0.0 2023-11-27 16:03:25,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3132320.0, ans=0.125 2023-11-27 16:03:36,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3132386.6666666665, ans=0.1 2023-11-27 16:03:55,288 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 950, loss[loss=0.07372, simple_loss=0.1075, pruned_loss=0.01167, audio_tagging_loss=0.008309, over 14710.00 frames. ], tot_loss[loss=0.06829, simple_loss=0.09219, pruned_loss=0.01321, audio_tagging_loss=0.008987, over 3022661.83 frames. 
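Annotation: within tot_loss[...] the "over N frames" figure climbs through the early batches of epoch 40 (681307 at batch 50, 1204595 at batch 100, 1614222 at batch 150) and then saturates near 3.0e6, consistent with a frame-weighted running average whose history is geometrically decayed. A sketch; the decay constant is an assumption:

```python
class RunningLoss:
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0  # decayed sum of loss * frames
        self.frames = 0.0    # decayed frame count (the "over N frames" figure)

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)
```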
], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:04:12,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3132586.6666666665, ans=0.5 2023-11-27 16:04:19,208 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469900 2023-11-27 16:04:26,961 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.445e+01 8.703e+01 9.559e+01 1.057e+02 1.419e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 16:04:35,078 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0 2023-11-27 16:04:39,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3132720.0, ans=0.0 2023-11-27 16:04:53,152 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1000, loss[loss=0.05622, simple_loss=0.081, pruned_loss=0.009606, audio_tagging_loss=0.006111, over 14529.00 frames. ], tot_loss[loss=0.06799, simple_loss=0.09197, pruned_loss=0.01317, audio_tagging_loss=0.008834, over 3030155.38 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:05:08,962 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:05:14,596 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0 2023-11-27 16:05:16,389 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 469950 2023-11-27 16:05:19,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3132986.6666666665, ans=0.2 2023-11-27 16:05:20,801 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:05:38,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3133120.0, ans=0.2 2023-11-27 16:05:42,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3133120.0, ans=0.0 2023-11-27 16:05:51,391 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1050, loss[loss=0.07418, simple_loss=0.1018, pruned_loss=0.01652, audio_tagging_loss=0.006774, over 14478.00 frames. ], tot_loss[loss=0.06784, simple_loss=0.0921, pruned_loss=0.01306, audio_tagging_loss=0.008724, over 3042288.23 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:06:14,236 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470000 2023-11-27 16:06:22,104 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.934e+01 9.738e+01 1.038e+02 1.396e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-27 16:06:48,707 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1100, loss[loss=0.09821, simple_loss=0.1323, pruned_loss=0.02472, audio_tagging_loss=0.007315, over 15826.00 frames. 
], tot_loss[loss=0.06735, simple_loss=0.09129, pruned_loss=0.01294, audio_tagging_loss=0.008768, over 3040119.51 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:06:54,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3133520.0, ans=0.125 2023-11-27 16:06:54,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3133520.0, ans=0.125 2023-11-27 16:06:55,070 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:06:59,645 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:07:12,229 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470050 2023-11-27 16:07:39,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3133786.6666666665, ans=0.125 2023-11-27 16:07:43,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0 2023-11-27 16:07:46,796 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1150, loss[loss=0.07134, simple_loss=0.0904, pruned_loss=0.01382, audio_tagging_loss=0.01232, over 15230.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09044, pruned_loss=0.01266, audio_tagging_loss=0.008726, over 3041460.87 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:07:50,577 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=15.0 2023-11-27 16:07:59,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3133920.0, ans=0.0 2023-11-27 16:08:10,129 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470100 2023-11-27 16:08:18,076 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.166e+01 8.606e+01 9.243e+01 9.874e+01 1.339e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 16:08:18,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3133986.6666666665, ans=0.125 2023-11-27 16:08:20,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3134053.3333333335, ans=0.125 2023-11-27 16:08:25,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3134053.3333333335, ans=0.125 2023-11-27 16:08:25,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.92 vs. 
limit=15.0 2023-11-27 16:08:27,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3134053.3333333335, ans=0.0 2023-11-27 16:08:31,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3134053.3333333335, ans=0.125 2023-11-27 16:08:40,804 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=22.5 2023-11-27 16:08:44,560 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1200, loss[loss=0.06346, simple_loss=0.08019, pruned_loss=0.01378, audio_tagging_loss=0.009586, over 13872.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09068, pruned_loss=0.01282, audio_tagging_loss=0.00868, over 3035955.38 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:08:53,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3134186.6666666665, ans=0.125 2023-11-27 16:08:53,705 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=15.0 2023-11-27 16:09:00,240 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.89 vs. limit=22.5 2023-11-27 16:09:03,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3134253.3333333335, ans=0.0 2023-11-27 16:09:08,356 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470150 2023-11-27 16:09:29,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3134453.3333333335, ans=0.1 2023-11-27 16:09:29,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3134453.3333333335, ans=0.125 2023-11-27 16:09:42,466 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1250, loss[loss=0.05507, simple_loss=0.07498, pruned_loss=0.007703, audio_tagging_loss=0.009884, over 16249.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.0901, pruned_loss=0.01262, audio_tagging_loss=0.008659, over 3036375.02 frames. 
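Annotation: the scaling.py:1118 WithLoss lines (loss-sum=0.000e+00 in several records above) come from wrappers that pass activations through unchanged while accumulating an auxiliary penalty for periodic reporting; zero means the penalty never fired since the last report. A bare-bones stand-in, with the penalty itself left as a placeholder:

```python
import torch

class WithLossSketch(torch.nn.Module):
    def __init__(self, name: str):
        super().__init__()
        self.name = name
        self.loss_sum = 0.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        penalty = torch.zeros((), device=x.device)  # placeholder auxiliary loss
        self.loss_sum += float(penalty)
        return x  # activations pass through unchanged

    def report(self, logger) -> None:
        logger.info("WithLoss: name=%s, loss-sum=%.3e", self.name, self.loss_sum)
        self.loss_sum = 0.0
```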
], batch size: 61, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:09:51,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3134520.0, ans=0.0 2023-11-27 16:09:52,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3134520.0, ans=0.5 2023-11-27 16:10:02,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3134586.6666666665, ans=0.125 2023-11-27 16:10:06,003 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470200 2023-11-27 16:10:06,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3134653.3333333335, ans=0.125 2023-11-27 16:10:07,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3134653.3333333335, ans=0.1 2023-11-27 16:10:07,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3134653.3333333335, ans=0.05 2023-11-27 16:10:11,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3134653.3333333335, ans=10.0 2023-11-27 16:10:14,025 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.573e+01 9.266e+01 9.865e+01 1.338e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-27 16:10:33,106 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.83 vs. limit=10.0 2023-11-27 16:10:40,913 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1300, loss[loss=0.04976, simple_loss=0.06263, pruned_loss=0.00744, audio_tagging_loss=0.01101, over 14044.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08974, pruned_loss=0.01239, audio_tagging_loss=0.00868, over 3034144.53 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:10:49,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3134853.3333333335, ans=0.1 2023-11-27 16:11:01,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3134920.0, ans=0.07 2023-11-27 16:11:03,364 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470250 2023-11-27 16:11:11,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3134986.6666666665, ans=0.125 2023-11-27 16:11:14,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3135053.3333333335, ans=0.1 2023-11-27 16:11:30,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3135120.0, ans=0.125 2023-11-27 16:11:38,496 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1350, loss[loss=0.06485, simple_loss=0.07928, pruned_loss=0.01559, audio_tagging_loss=0.00962, over 14951.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09036, pruned_loss=0.01252, audio_tagging_loss=0.00871, over 3033546.33 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:12:01,833 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470300 2023-11-27 16:12:09,985 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.641e+01 9.240e+01 9.975e+01 1.416e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 16:12:23,878 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:12:25,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3135453.3333333335, ans=0.125 2023-11-27 16:12:29,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3135453.3333333335, ans=0.0 2023-11-27 16:12:36,569 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1400, loss[loss=0.08137, simple_loss=0.1035, pruned_loss=0.01981, audio_tagging_loss=0.009795, over 16232.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09029, pruned_loss=0.01258, audio_tagging_loss=0.008876, over 3042060.06 frames. ], batch size: 61, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:12:53,627 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.50 vs. limit=15.0 2023-11-27 16:12:59,810 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470350 2023-11-27 16:12:59,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3135653.3333333335, ans=0.125 2023-11-27 16:13:03,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3135653.3333333335, ans=0.0 2023-11-27 16:13:18,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3135720.0, ans=0.125 2023-11-27 16:13:19,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3135720.0, ans=0.2 2023-11-27 16:13:22,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3135786.6666666665, ans=15.0 2023-11-27 16:13:28,532 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2023-11-27 16:13:34,927 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1450, loss[loss=0.06131, simple_loss=0.08777, pruned_loss=0.009407, audio_tagging_loss=0.00802, over 15433.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09021, pruned_loss=0.01255, audio_tagging_loss=0.008938, over 3040407.35 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:13:39,106 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.35 vs. 
limit=10.0 2023-11-27 16:13:44,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3135853.3333333335, ans=0.125 2023-11-27 16:13:52,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3135920.0, ans=0.0 2023-11-27 16:13:53,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3135920.0, ans=0.1 2023-11-27 16:13:57,705 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470400 2023-11-27 16:14:05,668 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.278e+01 8.750e+01 9.513e+01 1.013e+02 1.289e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-27 16:14:17,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3136053.3333333335, ans=0.125 2023-11-27 16:14:30,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3136120.0, ans=0.125 2023-11-27 16:14:32,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.07 vs. limit=15.0 2023-11-27 16:14:32,815 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1500, loss[loss=0.06663, simple_loss=0.08969, pruned_loss=0.01318, audio_tagging_loss=0.008601, over 14507.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09025, pruned_loss=0.01266, audio_tagging_loss=0.008944, over 3044947.61 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:14:33,318 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.81 vs. limit=15.0 2023-11-27 16:14:42,142 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0 2023-11-27 16:14:43,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3136253.3333333335, ans=0.1 2023-11-27 16:14:56,067 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470450 2023-11-27 16:15:07,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3136386.6666666665, ans=0.125 2023-11-27 16:15:20,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3136453.3333333335, ans=0.125 2023-11-27 16:15:26,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3136453.3333333335, ans=0.2 2023-11-27 16:15:28,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3136453.3333333335, ans=0.1 2023-11-27 16:15:30,249 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1550, loss[loss=0.06633, simple_loss=0.07994, pruned_loss=0.01461, audio_tagging_loss=0.01175, over 15502.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09076, pruned_loss=0.01276, audio_tagging_loss=0.008932, over 3053148.93 frames. 
], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:15:53,754 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470500 2023-11-27 16:16:03,042 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 8.701e+01 9.341e+01 1.017e+02 1.377e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:16:03,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3136653.3333333335, ans=0.0 2023-11-27 16:16:08,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3136720.0, ans=0.1 2023-11-27 16:16:13,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3136720.0, ans=0.1 2023-11-27 16:16:25,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3136786.6666666665, ans=0.0 2023-11-27 16:16:28,060 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1600, loss[loss=0.0469, simple_loss=0.05702, pruned_loss=0.008036, audio_tagging_loss=0.01035, over 15967.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08965, pruned_loss=0.01264, audio_tagging_loss=0.00906, over 3049131.24 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:16:31,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3136853.3333333335, ans=0.09899494936611666 2023-11-27 16:16:50,910 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470550 2023-11-27 16:16:53,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3136986.6666666665, ans=0.125 2023-11-27 16:17:07,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3137053.3333333335, ans=0.125 2023-11-27 16:17:21,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3137120.0, ans=0.1 2023-11-27 16:17:26,218 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1650, loss[loss=0.05692, simple_loss=0.06245, pruned_loss=0.01406, audio_tagging_loss=0.01164, over 14832.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08976, pruned_loss=0.01247, audio_tagging_loss=0.009126, over 3050523.09 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:17:28,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3137186.6666666665, ans=0.1 2023-11-27 16:17:32,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3137186.6666666665, ans=0.2 2023-11-27 16:17:35,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3137186.6666666665, ans=0.125 2023-11-27 16:17:39,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3137253.3333333335, ans=0.125 2023-11-27 16:17:48,369 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470600 2023-11-27 16:17:58,999 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 8.945e+01 9.413e+01 1.026e+02 1.249e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-27 16:18:03,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3137386.6666666665, ans=0.125 2023-11-27 16:18:23,907 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1700, loss[loss=0.06143, simple_loss=0.08414, pruned_loss=0.01115, audio_tagging_loss=0.008211, over 15021.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08935, pruned_loss=0.01244, audio_tagging_loss=0.009152, over 3051887.00 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:18:28,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3137520.0, ans=0.0 2023-11-27 16:18:31,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3137520.0, ans=0.0 2023-11-27 16:18:43,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3137586.6666666665, ans=0.125 2023-11-27 16:18:44,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3137586.6666666665, ans=0.2 2023-11-27 16:18:46,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3137653.3333333335, ans=0.125 2023-11-27 16:18:47,167 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470650 2023-11-27 16:18:58,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3137720.0, ans=0.09899494936611666 2023-11-27 16:19:21,609 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1750, loss[loss=0.08807, simple_loss=0.1098, pruned_loss=0.02115, audio_tagging_loss=0.01201, over 15629.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.0899, pruned_loss=0.01253, audio_tagging_loss=0.008912, over 3055225.24 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:19:30,542 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.84 vs. 
limit=15.0 2023-11-27 16:19:32,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3137920.0, ans=0.125 2023-11-27 16:19:41,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3137920.0, ans=0.0 2023-11-27 16:19:42,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3137920.0, ans=0.0 2023-11-27 16:19:44,782 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470700 2023-11-27 16:19:52,543 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:19:54,548 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.663e+01 9.086e+01 9.649e+01 1.198e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-27 16:19:59,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138053.3333333335, ans=0.1 2023-11-27 16:20:07,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3138120.0, ans=0.0 2023-11-27 16:20:19,418 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1800, loss[loss=0.07408, simple_loss=0.09862, pruned_loss=0.01513, audio_tagging_loss=0.009636, over 15471.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09035, pruned_loss=0.01267, audio_tagging_loss=0.008802, over 3049906.56 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:20:33,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3138253.3333333335, ans=0.0 2023-11-27 16:20:41,840 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470750 2023-11-27 16:21:01,671 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.26 vs. limit=10.0 2023-11-27 16:21:07,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138453.3333333335, ans=0.1 2023-11-27 16:21:13,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3138453.3333333335, ans=0.125 2023-11-27 16:21:15,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3138520.0, ans=0.125 2023-11-27 16:21:16,682 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1850, loss[loss=0.06762, simple_loss=0.09062, pruned_loss=0.01198, audio_tagging_loss=0.01033, over 15346.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09069, pruned_loss=0.01278, audio_tagging_loss=0.008729, over 3053951.98 frames. 
], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:21:16,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3138520.0, ans=0.1 2023-11-27 16:21:21,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3138520.0, ans=0.125 2023-11-27 16:21:40,198 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470800 2023-11-27 16:21:45,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3138653.3333333335, ans=0.125 2023-11-27 16:21:50,847 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.385e+01 8.620e+01 9.187e+01 9.919e+01 1.245e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-27 16:21:53,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3138720.0, ans=0.2 2023-11-27 16:22:07,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3138786.6666666665, ans=0.125 2023-11-27 16:22:14,809 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1900, loss[loss=0.08569, simple_loss=0.1134, pruned_loss=0.01948, audio_tagging_loss=0.009533, over 15074.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.09224, pruned_loss=0.01305, audio_tagging_loss=0.008701, over 3058054.20 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:22:38,557 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470850 2023-11-27 16:23:00,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.40 vs. limit=22.5 2023-11-27 16:23:12,682 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 1950, loss[loss=0.07064, simple_loss=0.1005, pruned_loss=0.01139, audio_tagging_loss=0.009026, over 16041.00 frames. ], tot_loss[loss=0.06764, simple_loss=0.09218, pruned_loss=0.01296, audio_tagging_loss=0.008588, over 3058327.55 frames. ], batch size: 62, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:23:35,695 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470900 2023-11-27 16:23:38,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.91 vs. limit=22.5 2023-11-27 16:23:41,118 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0 2023-11-27 16:23:46,453 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.170e+01 8.576e+01 9.160e+01 9.774e+01 1.352e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 16:23:47,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3139386.6666666665, ans=0.125 2023-11-27 16:23:55,206 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2023-11-27 16:24:10,837 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2000, loss[loss=0.04861, simple_loss=0.06521, pruned_loss=0.008398, audio_tagging_loss=0.0076, over 15568.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09111, pruned_loss=0.01275, audio_tagging_loss=0.008626, over 3053344.24 frames. 
], batch size: 58, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:24:18,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3139520.0, ans=0.0 2023-11-27 16:24:33,673 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 470950 2023-11-27 16:24:36,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3139653.3333333335, ans=0.0 2023-11-27 16:24:49,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3139720.0, ans=0.0 2023-11-27 16:24:52,005 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0 2023-11-27 16:24:56,074 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0 2023-11-27 16:25:05,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3139786.6666666665, ans=0.0 2023-11-27 16:25:07,719 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2050, loss[loss=0.05296, simple_loss=0.07186, pruned_loss=0.00991, audio_tagging_loss=0.007125, over 15948.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09088, pruned_loss=0.01278, audio_tagging_loss=0.008648, over 3049147.83 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:25:14,731 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5 2023-11-27 16:25:18,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3139920.0, ans=0.0 2023-11-27 16:25:31,909 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471000 2023-11-27 16:25:39,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3139986.6666666665, ans=0.0 2023-11-27 16:25:41,989 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 8.831e+01 9.330e+01 1.029e+02 1.229e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-27 16:25:42,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3140053.3333333335, ans=0.0 2023-11-27 16:25:44,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3140053.3333333335, ans=0.0 2023-11-27 16:25:46,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3140053.3333333335, ans=0.0 2023-11-27 16:26:03,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3140120.0, ans=0.0 2023-11-27 16:26:05,842 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2100, loss[loss=0.08195, simple_loss=0.116, pruned_loss=0.01572, audio_tagging_loss=0.008227, over 15195.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09106, pruned_loss=0.01291, audio_tagging_loss=0.008606, over 3051210.19 frames. 
], batch size: 55, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:26:07,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3140186.6666666665, ans=0.125 2023-11-27 16:26:18,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3140253.3333333335, ans=0.05 2023-11-27 16:26:29,091 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471050 2023-11-27 16:27:03,646 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2150, loss[loss=0.07534, simple_loss=0.1028, pruned_loss=0.01344, audio_tagging_loss=0.01048, over 15615.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09142, pruned_loss=0.01306, audio_tagging_loss=0.008603, over 3051307.58 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:27:27,048 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471100 2023-11-27 16:27:36,632 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.778e+01 9.339e+01 1.008e+02 1.312e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:27:42,448 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:27:44,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3140720.0, ans=0.1 2023-11-27 16:27:49,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3140786.6666666665, ans=0.125 2023-11-27 16:28:01,137 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2200, loss[loss=0.06608, simple_loss=0.08385, pruned_loss=0.01551, audio_tagging_loss=0.008645, over 14639.00 frames. ], tot_loss[loss=0.0676, simple_loss=0.09178, pruned_loss=0.01306, audio_tagging_loss=0.008649, over 3044026.11 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:28:03,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3140853.3333333335, ans=0.0 2023-11-27 16:28:13,410 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.569e-03 2023-11-27 16:28:24,799 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471150 2023-11-27 16:28:43,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3141053.3333333335, ans=0.1 2023-11-27 16:28:43,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3141053.3333333335, ans=0.0 2023-11-27 16:28:46,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3141120.0, ans=0.0 2023-11-27 16:28:48,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3141120.0, ans=0.125 2023-11-27 16:28:54,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3141120.0, ans=0.09899494936611666 2023-11-27 16:28:58,788 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2250, loss[loss=0.06931, simple_loss=0.09066, pruned_loss=0.01443, audio_tagging_loss=0.009554, over 15560.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.09143, pruned_loss=0.01297, audio_tagging_loss=0.008734, over 3044012.14 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:29:01,550 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0 2023-11-27 16:29:16,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=3141253.3333333335, ans=0.1 2023-11-27 16:29:21,723 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471200 2023-11-27 16:29:33,586 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.710e+01 9.342e+01 1.003e+02 1.212e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:29:34,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3141386.6666666665, ans=0.0 2023-11-27 16:29:45,540 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.62 vs. limit=15.0 2023-11-27 16:29:47,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3141453.3333333335, ans=0.1 2023-11-27 16:29:52,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3141453.3333333335, ans=0.125 2023-11-27 16:29:57,615 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2300, loss[loss=0.06591, simple_loss=0.08273, pruned_loss=0.01272, audio_tagging_loss=0.01182, over 15659.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.0914, pruned_loss=0.01297, audio_tagging_loss=0.008749, over 3046316.36 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:30:05,840 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.78 vs. 
limit=12.0 2023-11-27 16:30:19,816 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471250 2023-11-27 16:30:37,487 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:30:43,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3141786.6666666665, ans=0.1 2023-11-27 16:30:50,549 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.30 vs. limit=15.0 2023-11-27 16:30:51,108 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:30:51,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3141786.6666666665, ans=0.125 2023-11-27 16:30:54,402 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2350, loss[loss=0.06247, simple_loss=0.08172, pruned_loss=0.01184, audio_tagging_loss=0.00977, over 14429.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09017, pruned_loss=0.01284, audio_tagging_loss=0.008908, over 3038358.23 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:31:03,125 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.83 vs. limit=10.0 2023-11-27 16:31:15,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3141920.0, ans=0.0 2023-11-27 16:31:15,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3141920.0, ans=0.0 2023-11-27 16:31:16,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3141986.6666666665, ans=0.125 2023-11-27 16:31:17,164 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-27 16:31:18,070 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471300 2023-11-27 16:31:28,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-11-27 16:31:29,671 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.596e+01 9.412e+01 1.004e+02 1.275e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 16:31:33,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3142053.3333333335, ans=0.2 2023-11-27 16:31:33,683 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. 
limit=6.0 2023-11-27 16:31:36,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3142053.3333333335, ans=0.125 2023-11-27 16:31:37,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=3142053.3333333335, ans=0.2 2023-11-27 16:31:41,359 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2023-11-27 16:31:44,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3142120.0, ans=0.125 2023-11-27 16:31:52,506 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2400, loss[loss=0.06162, simple_loss=0.0707, pruned_loss=0.01364, audio_tagging_loss=0.01263, over 15219.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09018, pruned_loss=0.01274, audio_tagging_loss=0.008874, over 3035309.99 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:32:00,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3142186.6666666665, ans=0.1 2023-11-27 16:32:05,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3142253.3333333335, ans=0.125 2023-11-27 16:32:12,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3142253.3333333335, ans=0.0 2023-11-27 16:32:15,876 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471350 2023-11-27 16:32:26,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3142386.6666666665, ans=0.2 2023-11-27 16:32:27,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3142386.6666666665, ans=0.2 2023-11-27 16:32:48,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3142453.3333333335, ans=0.125 2023-11-27 16:32:50,643 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2450, loss[loss=0.06089, simple_loss=0.08542, pruned_loss=0.011, audio_tagging_loss=0.007169, over 14796.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.08982, pruned_loss=0.01251, audio_tagging_loss=0.00889, over 3036280.86 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:32:54,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3142520.0, ans=0.125 2023-11-27 16:32:56,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3142520.0, ans=0.125 2023-11-27 16:33:05,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. 
limit=15.0 2023-11-27 16:33:13,820 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471400 2023-11-27 16:33:16,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3142653.3333333335, ans=0.125 2023-11-27 16:33:25,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.627e+01 9.278e+01 9.943e+01 1.246e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-27 16:33:34,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3142720.0, ans=0.125 2023-11-27 16:33:35,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3142720.0, ans=0.125 2023-11-27 16:33:39,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3142786.6666666665, ans=0.0 2023-11-27 16:33:47,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3142853.3333333335, ans=0.125 2023-11-27 16:33:48,568 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2500, loss[loss=0.07834, simple_loss=0.1085, pruned_loss=0.01574, audio_tagging_loss=0.008361, over 14924.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09032, pruned_loss=0.01263, audio_tagging_loss=0.008926, over 3037431.69 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:34:10,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3142986.6666666665, ans=0.125 2023-11-27 16:34:11,477 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471450 2023-11-27 16:34:38,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3143120.0, ans=0.125 2023-11-27 16:34:44,856 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.76 vs. limit=15.0 2023-11-27 16:34:46,506 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2550, loss[loss=0.06232, simple_loss=0.08826, pruned_loss=0.01039, audio_tagging_loss=0.007802, over 15782.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09035, pruned_loss=0.01258, audio_tagging_loss=0.008896, over 3039047.84 frames. ], batch size: 60, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:34:51,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3143186.6666666665, ans=0.125 2023-11-27 16:34:51,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3143186.6666666665, ans=6.0 2023-11-27 16:34:52,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3143186.6666666665, ans=0.125 2023-11-27 16:35:06,540 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.91 vs. limit=10.0 2023-11-27 16:35:09,323 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471500 2023-11-27 16:35:17,144 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.44 vs. 
limit=12.0 2023-11-27 16:35:21,953 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.534e+01 9.124e+01 9.895e+01 1.510e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-27 16:35:44,621 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2600, loss[loss=0.04193, simple_loss=0.05505, pruned_loss=0.004794, audio_tagging_loss=0.009611, over 14939.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08986, pruned_loss=0.01241, audio_tagging_loss=0.008856, over 3042104.91 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:35:50,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3143520.0, ans=0.125 2023-11-27 16:36:07,255 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471550 2023-11-27 16:36:09,314 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0 2023-11-27 16:36:09,610 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=15.0 2023-11-27 16:36:37,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3143786.6666666665, ans=0.1 2023-11-27 16:36:41,776 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2650, loss[loss=0.03719, simple_loss=0.04554, pruned_loss=0.005482, audio_tagging_loss=0.008942, over 14513.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08959, pruned_loss=0.0124, audio_tagging_loss=0.008858, over 3035689.11 frames. ], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:36:46,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3143853.3333333335, ans=0.125 2023-11-27 16:37:05,510 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471600 2023-11-27 16:37:06,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3143986.6666666665, ans=0.125 2023-11-27 16:37:17,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3144053.3333333335, ans=0.07 2023-11-27 16:37:18,550 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.734e+01 8.789e+01 9.225e+01 1.011e+02 1.898e+02, threshold=1.845e+02, percent-clipped=1.0 2023-11-27 16:37:29,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3144120.0, ans=0.125 2023-11-27 16:37:41,113 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2700, loss[loss=0.05269, simple_loss=0.06496, pruned_loss=0.008881, audio_tagging_loss=0.01134, over 13518.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09007, pruned_loss=0.01243, audio_tagging_loss=0.008789, over 3037174.53 frames. 
], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:37:45,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3144186.6666666665, ans=0.0 2023-11-27 16:37:49,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3144186.6666666665, ans=0.0 2023-11-27 16:37:55,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3144253.3333333335, ans=0.125 2023-11-27 16:38:01,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3144253.3333333335, ans=0.2 2023-11-27 16:38:03,505 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471650 2023-11-27 16:38:23,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3144386.6666666665, ans=0.0 2023-11-27 16:38:38,650 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2750, loss[loss=0.06471, simple_loss=0.0907, pruned_loss=0.01064, audio_tagging_loss=0.008713, over 14462.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08956, pruned_loss=0.01226, audio_tagging_loss=0.00873, over 3034948.36 frames. ], batch size: 54, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:39:00,717 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471700 2023-11-27 16:39:14,194 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.654e+01 9.307e+01 9.925e+01 1.318e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 16:39:27,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3144786.6666666665, ans=0.0 2023-11-27 16:39:31,245 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:39:35,720 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2800, loss[loss=0.06606, simple_loss=0.09207, pruned_loss=0.01404, audio_tagging_loss=0.005978, over 14926.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.0894, pruned_loss=0.01227, audio_tagging_loss=0.008724, over 3033884.62 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:39:45,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3144920.0, ans=0.0 2023-11-27 16:39:59,047 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471750 2023-11-27 16:40:21,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3145120.0, ans=0.0 2023-11-27 16:40:33,062 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2850, loss[loss=0.06668, simple_loss=0.09, pruned_loss=0.01176, audio_tagging_loss=0.009913, over 14826.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08914, pruned_loss=0.01235, audio_tagging_loss=0.008722, over 3030417.28 frames. 
], batch size: 56, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:40:37,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3145186.6666666665, ans=0.125 2023-11-27 16:40:43,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3145186.6666666665, ans=0.0 2023-11-27 16:40:56,796 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471800 2023-11-27 16:41:10,170 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 8.689e+01 9.342e+01 1.021e+02 1.296e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-27 16:41:13,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3145386.6666666665, ans=0.0 2023-11-27 16:41:31,281 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2900, loss[loss=0.0787, simple_loss=0.1073, pruned_loss=0.01704, audio_tagging_loss=0.00801, over 14741.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09004, pruned_loss=0.01245, audio_tagging_loss=0.008658, over 3031477.53 frames. ], batch size: 56, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:41:33,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3145520.0, ans=0.0 2023-11-27 16:41:34,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3145520.0, ans=0.0 2023-11-27 16:41:47,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3145586.6666666665, ans=0.0 2023-11-27 16:41:47,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3145586.6666666665, ans=0.1 2023-11-27 16:41:54,039 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471850 2023-11-27 16:41:54,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3145653.3333333335, ans=0.0 2023-11-27 16:42:01,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3145653.3333333335, ans=0.0 2023-11-27 16:42:05,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3145720.0, ans=0.125 2023-11-27 16:42:16,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3145786.6666666665, ans=0.125 2023-11-27 16:42:28,693 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 2950, loss[loss=0.05072, simple_loss=0.06619, pruned_loss=0.009469, audio_tagging_loss=0.008154, over 14828.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08967, pruned_loss=0.01247, audio_tagging_loss=0.008734, over 3031212.11 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:42:52,180 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471900 2023-11-27 16:43:00,454 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. 
limit=15.0 2023-11-27 16:43:05,772 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.312e+01 8.644e+01 9.313e+01 9.896e+01 1.371e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 16:43:18,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3146120.0, ans=0.2 2023-11-27 16:43:19,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3146120.0, ans=0.0 2023-11-27 16:43:25,552 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3000, loss[loss=0.06294, simple_loss=0.08336, pruned_loss=0.009693, audio_tagging_loss=0.01156, over 14375.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09095, pruned_loss=0.01274, audio_tagging_loss=0.008797, over 3034195.91 frames. ], batch size: 55, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:43:25,553 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 16:44:00,568 INFO [train_asr.py:1267] (2/4) Epoch 40, validation: loss=0.0576, simple_loss=0.0507, pruned_loss=0.005183, audio_tagging_loss=0.02707, over 4681554.00 frames. 2023-11-27 16:44:00,569 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 16:44:22,687 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 471950 2023-11-27 16:44:56,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3146520.0, ans=0.07 2023-11-27 16:44:57,626 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3050, loss[loss=0.0935, simple_loss=0.1317, pruned_loss=0.02299, audio_tagging_loss=0.004663, over 14006.00 frames. ], tot_loss[loss=0.06755, simple_loss=0.09186, pruned_loss=0.0129, audio_tagging_loss=0.008718, over 3028910.33 frames. ], batch size: 52, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:45:08,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3146586.6666666665, ans=0.125 2023-11-27 16:45:08,465 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0 2023-11-27 16:45:10,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3146586.6666666665, ans=0.125 2023-11-27 16:45:20,557 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472000 2023-11-27 16:45:37,418 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.960e+01 9.776e+01 1.063e+02 1.311e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-27 16:45:37,479 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:45:44,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3146720.0, ans=0.125 2023-11-27 16:45:45,723 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.43 vs. 
limit=10.0 2023-11-27 16:45:48,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3146786.6666666665, ans=0.125 2023-11-27 16:45:57,821 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3100, loss[loss=0.06095, simple_loss=0.08626, pruned_loss=0.008795, audio_tagging_loss=0.009024, over 13987.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09084, pruned_loss=0.01262, audio_tagging_loss=0.00887, over 3026376.22 frames. ], batch size: 51, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:46:01,121 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2023-11-27 16:46:02,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3146853.3333333335, ans=0.125 2023-11-27 16:46:13,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3146920.0, ans=0.125 2023-11-27 16:46:21,466 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472050 2023-11-27 16:46:22,874 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=22.5 2023-11-27 16:46:25,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3146986.6666666665, ans=0.0 2023-11-27 16:46:36,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3147053.3333333335, ans=0.1 2023-11-27 16:46:36,914 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:46:55,366 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3150, loss[loss=0.05038, simple_loss=0.06479, pruned_loss=0.007568, audio_tagging_loss=0.01042, over 15004.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09106, pruned_loss=0.01268, audio_tagging_loss=0.008834, over 3034595.02 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:47:00,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3147186.6666666665, ans=0.0 2023-11-27 16:47:03,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3147186.6666666665, ans=0.09899494936611666 2023-11-27 16:47:18,652 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472100 2023-11-27 16:47:31,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3147386.6666666665, ans=0.2 2023-11-27 16:47:32,430 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.379e+01 8.709e+01 9.238e+01 9.861e+01 1.387e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 16:47:34,362 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.68 vs. limit=15.0 2023-11-27 16:47:52,701 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3200, loss[loss=0.04271, simple_loss=0.0536, pruned_loss=0.006913, audio_tagging_loss=0.008997, over 16162.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09154, pruned_loss=0.01274, audio_tagging_loss=0.00883, over 3034478.39 frames. 
], batch size: 62, lr: 1.70e-03, grad_scale: 32.0 2023-11-27 16:47:58,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3147520.0, ans=0.0 2023-11-27 16:48:15,790 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472150 2023-11-27 16:48:19,672 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2023-11-27 16:48:21,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3147653.3333333335, ans=0.1 2023-11-27 16:48:31,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3147720.0, ans=0.125 2023-11-27 16:48:47,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3147786.6666666665, ans=0.1 2023-11-27 16:48:50,356 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3250, loss[loss=0.07329, simple_loss=0.09317, pruned_loss=0.0147, audio_tagging_loss=0.01201, over 14092.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09081, pruned_loss=0.01261, audio_tagging_loss=0.008997, over 3038788.13 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:48:53,980 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.98 vs. limit=22.5 2023-11-27 16:49:12,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3147986.6666666665, ans=0.125 2023-11-27 16:49:14,113 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472200 2023-11-27 16:49:28,697 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.334e+01 8.717e+01 9.369e+01 9.960e+01 1.192e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-27 16:49:40,068 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.95 vs. limit=22.5 2023-11-27 16:49:41,378 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.28 vs. limit=15.0 2023-11-27 16:49:48,406 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3300, loss[loss=0.09175, simple_loss=0.1259, pruned_loss=0.01934, audio_tagging_loss=0.009472, over 15688.00 frames. ], tot_loss[loss=0.06748, simple_loss=0.09122, pruned_loss=0.01281, audio_tagging_loss=0.009065, over 3048850.03 frames. 
], batch size: 57, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:50:11,716 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472250 2023-11-27 16:50:11,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3148320.0, ans=0.0 2023-11-27 16:50:24,800 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:50:24,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3148386.6666666665, ans=0.125 2023-11-27 16:50:33,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3148453.3333333335, ans=0.1 2023-11-27 16:50:46,541 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3350, loss[loss=0.05401, simple_loss=0.07268, pruned_loss=0.008168, audio_tagging_loss=0.009503, over 16271.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09183, pruned_loss=0.01288, audio_tagging_loss=0.008883, over 3047804.51 frames. ], batch size: 61, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:50:52,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3148520.0, ans=0.2 2023-11-27 16:51:09,644 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472300 2023-11-27 16:51:10,044 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.01 vs. limit=22.5 2023-11-27 16:51:11,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3148653.3333333335, ans=0.2 2023-11-27 16:51:12,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3148653.3333333335, ans=0.0 2023-11-27 16:51:17,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3148653.3333333335, ans=0.95 2023-11-27 16:51:17,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3148653.3333333335, ans=0.2 2023-11-27 16:51:18,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3148653.3333333335, ans=0.125 2023-11-27 16:51:24,402 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.635e+01 9.292e+01 9.708e+01 1.105e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-27 16:51:43,855 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3400, loss[loss=0.08349, simple_loss=0.1141, pruned_loss=0.01885, audio_tagging_loss=0.007607, over 14151.00 frames. ], tot_loss[loss=0.06815, simple_loss=0.09233, pruned_loss=0.01309, audio_tagging_loss=0.00889, over 3041091.98 frames. ], batch size: 53, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:52:07,209 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472350 2023-11-27 16:52:38,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3149120.0, ans=0.2 2023-11-27 16:52:41,857 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3450, loss[loss=0.06833, simple_loss=0.09174, pruned_loss=0.0131, audio_tagging_loss=0.009366, over 15499.00 frames. 
], tot_loss[loss=0.06755, simple_loss=0.09171, pruned_loss=0.0129, audio_tagging_loss=0.008803, over 3047222.89 frames. ], batch size: 58, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:52:53,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3149253.3333333335, ans=0.125 2023-11-27 16:53:04,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3149320.0, ans=0.0 2023-11-27 16:53:05,244 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472400 2023-11-27 16:53:16,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3149386.6666666665, ans=0.1 2023-11-27 16:53:20,463 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.772e+01 9.450e+01 1.013e+02 1.492e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 16:53:24,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3149386.6666666665, ans=0.0 2023-11-27 16:53:35,591 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:53:39,855 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3500, loss[loss=0.08204, simple_loss=0.1128, pruned_loss=0.01641, audio_tagging_loss=0.009209, over 15851.00 frames. ], tot_loss[loss=0.06749, simple_loss=0.09203, pruned_loss=0.01279, audio_tagging_loss=0.008681, over 3047954.29 frames. ], batch size: 59, lr: 1.70e-03, grad_scale: 16.0 2023-11-27 16:53:55,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3149586.6666666665, ans=0.2 2023-11-27 16:53:58,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3149586.6666666665, ans=0.035 2023-11-27 16:54:02,584 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.90 vs. limit=15.0 2023-11-27 16:54:03,469 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472450 2023-11-27 16:54:13,257 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 16:54:34,536 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2023-11-27 16:54:37,432 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3550, loss[loss=0.04531, simple_loss=0.05912, pruned_loss=0.008132, audio_tagging_loss=0.007616, over 15535.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09071, pruned_loss=0.01261, audio_tagging_loss=0.008676, over 3045933.35 frames. 
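], batch size: 61, lr: 1.69e-03, grad_scale: 16.0

In the [optim.py:476] entries, the five numbers are the min/25%/50%/75%/max of recent gradient norms, and in every entry the threshold equals Clipping_scale times the logged median (for the entry above, 1.890e+02 = 2.0 * 9.450e+01). A hedged sketch of that bookkeeping follows; the window length is an assumption, and the real ScaledAdam optimizer in icefall does its clipping in a more involved, parameter-group-aware way:

```python
# Hypothetical reconstruction of the grad-norm statistics behind the log lines:
# quartiles over a window of recent norms, threshold = clipping_scale * median.
from collections import deque
import torch

class GradNormLogger:
    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)   # window size is a guess
        self.clipped = 0
        self.steps = 0

    def observe(self, grad_norm: float) -> float:
        self.norms.append(grad_norm)
        t = torch.tensor(list(self.norms))
        q = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()        # 2.0 * median, as logged
        self.steps += 1
        self.clipped += grad_norm > threshold
        print(f"Clipping_scale={self.scale}, grad-norm quartiles "
              + " ".join(f"{v:.3e}" for v in q.tolist())
              + f", threshold={threshold:.3e}, "
              + f"percent-clipped={100 * self.clipped / self.steps}")
        # Factor by which the gradients would be shrunk on this step.
        return min(1.0, threshold / max(grad_norm, 1e-20))
```

percent-clipped stays at 0.0 through this stretch because even the windowed maximum (around 1.5e+02 here) sits below the threshold.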
2023-11-27 16:54:54,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3149920.0, ans=0.125
2023-11-27 16:54:59,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3149986.6666666665, ans=0.025
2023-11-27 16:54:59,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3149986.6666666665, ans=0.04949747468305833
2023-11-27 16:55:00,524 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472500
2023-11-27 16:55:15,971 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.604e+01 9.006e+01 9.735e+01 1.232e+02, threshold=1.801e+02, percent-clipped=0.0
2023-11-27 16:55:35,491 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3600, loss[loss=0.07719, simple_loss=0.1055, pruned_loss=0.0148, audio_tagging_loss=0.009646, over 15608.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08984, pruned_loss=0.01242, audio_tagging_loss=0.008719, over 3045241.08 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0
2023-11-27 16:55:40,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0
2023-11-27 16:55:49,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3150253.3333333335, ans=0.125
2023-11-27 16:55:56,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3150253.3333333335, ans=0.04949747468305833
2023-11-27 16:55:58,144 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472550
2023-11-27 16:56:13,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3150386.6666666665, ans=0.015
2023-11-27 16:56:23,285 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=15.0
2023-11-27 16:56:33,328 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3650, loss[loss=0.09112, simple_loss=0.1227, pruned_loss=0.02273, audio_tagging_loss=0.007048, over 15562.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09071, pruned_loss=0.01276, audio_tagging_loss=0.008713, over 3047178.97 frames.
], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 16:56:38,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3150520.0, ans=0.125 2023-11-27 16:56:42,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3150520.0, ans=0.1 2023-11-27 16:56:47,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3150586.6666666665, ans=0.125 2023-11-27 16:56:56,720 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472600 2023-11-27 16:57:11,659 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.799e+01 9.366e+01 9.854e+01 1.150e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 16:57:24,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3150786.6666666665, ans=0.125 2023-11-27 16:57:26,484 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 16:57:27,847 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0 2023-11-27 16:57:29,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3150853.3333333335, ans=0.0 2023-11-27 16:57:30,577 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3700, loss[loss=0.0768, simple_loss=0.1029, pruned_loss=0.01518, audio_tagging_loss=0.01019, over 15990.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09055, pruned_loss=0.01272, audio_tagging_loss=0.00873, over 3049257.30 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 16:57:45,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3150920.0, ans=0.2 2023-11-27 16:57:53,930 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472650 2023-11-27 16:57:59,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3150986.6666666665, ans=0.125 2023-11-27 16:58:04,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3151053.3333333335, ans=0.0 2023-11-27 16:58:10,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3151053.3333333335, ans=0.1 2023-11-27 16:58:12,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3151053.3333333335, ans=0.0 2023-11-27 16:58:21,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3151120.0, ans=0.125 2023-11-27 16:58:28,641 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3750, loss[loss=0.07024, simple_loss=0.1047, pruned_loss=0.01211, audio_tagging_loss=0.005776, over 15175.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09016, pruned_loss=0.01251, audio_tagging_loss=0.008681, over 3051077.78 frames. 
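], batch size: 59, lr: 1.69e-03, grad_scale: 16.0

Every ScheduledFloat entry records one scalar regularization knob (a dropout probability, a skip rate, a balancer constant) evaluated at the current batch_count. The [scaling.py] object behaves like a float interpolated between (batch_count, value) breakpoints; a minimal sketch of the idea, with illustrative knots rather than the real schedules (at 3.15M batches every schedule here is long past its last knot, which is why each name keeps logging the same ans):

```python
import bisect

class ScheduledFloat:
    """Float-like value interpolated linearly between (batch_count, value) knots."""
    def __init__(self, *knots, name: str = ""):
        self.knots = sorted(knots)
        self.name = name

    def value(self, batch_count: float) -> float:
        xs = [x for x, _ in self.knots]
        if batch_count <= xs[0]:
            return self.knots[0][1]
        if batch_count >= xs[-1]:
            return self.knots[-1][1]        # the schedules in this log sit on this floor
        i = bisect.bisect_right(xs, batch_count)
        (x0, y0), (x1, y1) = self.knots[i - 1], self.knots[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Illustrative knots, not the real schedule for any name in this log.
sched = ScheduledFloat((0.0, 0.3), (20000.0, 0.1), name="feed_forward1.out_proj.dropout_p")
print(f"ScheduledFloat: name={sched.name}, batch_count=3147053.33, ans={sched.value(3147053.33)}")
```

Run well past the last knot, the print reproduces the shape of the log lines with ans=0.1, matching the dropout_p entries above.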
2023-11-27 16:58:51,246 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472700
2023-11-27 16:58:59,374 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=15.0
2023-11-27 16:59:07,642 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.997e+01 9.607e+01 1.030e+02 1.522e+02, threshold=1.921e+02, percent-clipped=0.0
2023-11-27 16:59:12,469 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 16:59:26,485 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3800, loss[loss=0.08139, simple_loss=0.1116, pruned_loss=0.01703, audio_tagging_loss=0.008537, over 14222.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09122, pruned_loss=0.01267, audio_tagging_loss=0.008696, over 3050448.43 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 16:59:31,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3151520.0, ans=0.07
2023-11-27 16:59:44,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3151586.6666666665, ans=10.0
2023-11-27 16:59:49,609 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472750
2023-11-27 16:59:52,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3151653.3333333335, ans=0.1
2023-11-27 17:00:05,662 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.34 vs. limit=15.0
2023-11-27 17:00:23,107 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3850, loss[loss=0.05765, simple_loss=0.0765, pruned_loss=0.01003, audio_tagging_loss=0.009369, over 14521.00 frames. ], tot_loss[loss=0.06727, simple_loss=0.09185, pruned_loss=0.01268, audio_tagging_loss=0.008663, over 3047089.52 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:00:46,431 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472800
2023-11-27 17:01:02,648 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.917e+01 9.426e+01 9.996e+01 1.241e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-27 17:01:21,833 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3900, loss[loss=0.07274, simple_loss=0.1001, pruned_loss=0.01511, audio_tagging_loss=0.007604, over 15929.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09149, pruned_loss=0.01248, audio_tagging_loss=0.008775, over 3046901.09 frames.
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:01:23,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3152186.6666666665, ans=0.2 2023-11-27 17:01:24,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3152186.6666666665, ans=0.2 2023-11-27 17:01:44,288 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472850 2023-11-27 17:01:45,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3152320.0, ans=0.1 2023-11-27 17:01:50,094 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0 2023-11-27 17:01:57,892 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.58 vs. limit=22.5 2023-11-27 17:02:18,846 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 3950, loss[loss=0.06266, simple_loss=0.08144, pruned_loss=0.0142, audio_tagging_loss=0.007733, over 14404.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09078, pruned_loss=0.01241, audio_tagging_loss=0.008915, over 3052967.50 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:02:25,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3152520.0, ans=0.125 2023-11-27 17:02:27,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3152520.0, ans=0.0 2023-11-27 17:02:30,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3152586.6666666665, ans=0.125 2023-11-27 17:02:41,787 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472900 2023-11-27 17:02:46,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3152653.3333333335, ans=0.0 2023-11-27 17:02:50,446 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:02:50,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3152653.3333333335, ans=0.0 2023-11-27 17:02:58,015 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.787e+01 9.336e+01 1.021e+02 1.304e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 17:03:10,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3152786.6666666665, ans=0.0 2023-11-27 17:03:16,288 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4000, loss[loss=0.06835, simple_loss=0.09355, pruned_loss=0.01086, audio_tagging_loss=0.01072, over 15666.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09094, pruned_loss=0.01248, audio_tagging_loss=0.008896, over 3053624.73 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:03:25,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3152853.3333333335, ans=0.025 2023-11-27 17:03:31,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3152920.0, ans=0.125 2023-11-27 17:03:36,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3152920.0, ans=0.125 2023-11-27 17:03:39,899 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 472950 2023-11-27 17:04:00,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3153053.3333333335, ans=0.125 2023-11-27 17:04:02,060 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0 2023-11-27 17:04:11,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3153120.0, ans=0.2 2023-11-27 17:04:12,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3153186.6666666665, ans=0.125 2023-11-27 17:04:13,842 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4050, loss[loss=0.0531, simple_loss=0.06417, pruned_loss=0.006322, audio_tagging_loss=0.01469, over 16017.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09114, pruned_loss=0.01259, audio_tagging_loss=0.008937, over 3045985.45 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:04:21,529 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 17:04:34,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3153253.3333333335, ans=0.0 2023-11-27 17:04:37,464 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473000 2023-11-27 17:04:48,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3153386.6666666665, ans=0.1 2023-11-27 17:04:53,032 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.948e+01 9.021e+01 9.575e+01 1.043e+02 1.402e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 17:04:54,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3153386.6666666665, ans=0.0 2023-11-27 17:05:12,224 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4100, loss[loss=0.06111, simple_loss=0.08782, pruned_loss=0.008715, audio_tagging_loss=0.008481, over 15448.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09108, pruned_loss=0.01253, audio_tagging_loss=0.00901, over 3041334.31 frames. 
], batch size: 58, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:05:21,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3153520.0, ans=0.125 2023-11-27 17:05:24,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.57 vs. limit=15.0 2023-11-27 17:05:33,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3153653.3333333335, ans=0.125 2023-11-27 17:05:34,788 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473050 2023-11-27 17:05:58,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=3153786.6666666665, ans=0.1 2023-11-27 17:06:09,972 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4150, loss[loss=0.07827, simple_loss=0.1107, pruned_loss=0.01524, audio_tagging_loss=0.007703, over 15095.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09159, pruned_loss=0.01275, audio_tagging_loss=0.008799, over 3041011.93 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:06:27,362 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.41 vs. limit=15.0 2023-11-27 17:06:31,169 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2023-11-27 17:06:32,970 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473100 2023-11-27 17:06:44,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=3154053.3333333335, ans=0.2 2023-11-27 17:06:48,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3154053.3333333335, ans=0.0 2023-11-27 17:06:50,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.25 vs. limit=15.0 2023-11-27 17:06:50,623 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.636e+01 9.410e+01 1.013e+02 1.260e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 17:06:55,194 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
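Number of tokens: 24

The grad_scale field in each batch summary is the fp16 dynamic loss scale, and it moves between values like 16.0 and 32.0 across this log because dynamic loss scaling halves the scale whenever a step overflows and doubles it again after a run of clean steps. A generic torch.cuda.amp sketch of the mechanism (not the icefall training loop itself; the constructor arguments are spelled out for illustration, not taken from the configuration):

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def train_step(model, optimizer, inputs, targets, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # skipped internally if grads hit inf/nan
    scaler.update()                # halve on overflow, double after clean runs
    return loss.detach(), scaler.get_scale()  # the latter is what grad_scale logs
```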
2023-11-27 17:06:56,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3154120.0, ans=0.125
2023-11-27 17:07:00,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3154120.0, ans=0.1
2023-11-27 17:07:05,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3154120.0, ans=0.0
2023-11-27 17:07:07,796 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4200, loss[loss=0.05211, simple_loss=0.06396, pruned_loss=0.01002, audio_tagging_loss=0.01011, over 13860.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.0907, pruned_loss=0.01262, audio_tagging_loss=0.008784, over 3043504.14 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:07:11,398 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.12 vs. limit=12.0
2023-11-27 17:07:13,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3154186.6666666665, ans=0.125
2023-11-27 17:07:18,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3154253.3333333335, ans=0.125
2023-11-27 17:07:23,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3154253.3333333335, ans=0.0
2023-11-27 17:07:28,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3154253.3333333335, ans=0.125
2023-11-27 17:07:31,560 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473150
2023-11-27 17:07:48,769 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.24 vs. limit=5.0
2023-11-27 17:08:04,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3154520.0, ans=0.125
2023-11-27 17:08:05,637 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4250, loss[loss=0.07112, simple_loss=0.09253, pruned_loss=0.01579, audio_tagging_loss=0.009066, over 15221.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09165, pruned_loss=0.01286, audio_tagging_loss=0.008688, over 3048665.04 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:08:08,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3154520.0, ans=0.125
2023-11-27 17:08:19,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3154586.6666666665, ans=0.1
2023-11-27 17:08:20,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3154586.6666666665, ans=0.0
2023-11-27 17:08:28,847 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473200
2023-11-27 17:08:46,607 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 8.728e+01 9.330e+01 9.912e+01 1.216e+02, threshold=1.866e+02, percent-clipped=0.0
2023-11-27 17:09:02,540 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.67 vs.
limit=15.0 2023-11-27 17:09:04,284 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4300, loss[loss=0.07601, simple_loss=0.1064, pruned_loss=0.01568, audio_tagging_loss=0.007115, over 14958.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09152, pruned_loss=0.01292, audio_tagging_loss=0.008672, over 3048250.99 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:09:10,164 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=12.0 2023-11-27 17:09:24,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3154920.0, ans=0.125 2023-11-27 17:09:27,284 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473250 2023-11-27 17:09:32,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3154986.6666666665, ans=0.1 2023-11-27 17:09:40,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3155053.3333333335, ans=0.125 2023-11-27 17:09:40,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3155053.3333333335, ans=0.07 2023-11-27 17:09:55,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3155120.0, ans=0.125 2023-11-27 17:09:55,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.48 vs. limit=22.5 2023-11-27 17:10:00,572 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4350, loss[loss=0.06692, simple_loss=0.08709, pruned_loss=0.01401, audio_tagging_loss=0.009359, over 15039.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09139, pruned_loss=0.01297, audio_tagging_loss=0.00864, over 3050312.37 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:10:04,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.65 vs. limit=15.0 2023-11-27 17:10:17,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3155253.3333333335, ans=0.125 2023-11-27 17:10:24,463 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473300 2023-11-27 17:10:24,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3155320.0, ans=0.2 2023-11-27 17:10:41,005 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.765e+01 9.493e+01 1.043e+02 1.484e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-27 17:10:50,171 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2023-11-27 17:10:58,267 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=12.0 2023-11-27 17:10:58,669 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4400, loss[loss=0.06668, simple_loss=0.09391, pruned_loss=0.01375, audio_tagging_loss=0.005982, over 16194.00 frames. ], tot_loss[loss=0.06759, simple_loss=0.09192, pruned_loss=0.01309, audio_tagging_loss=0.008536, over 3058603.97 frames. 
], batch size: 60, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:11:19,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2023-11-27 17:11:21,939 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473350 2023-11-27 17:11:51,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3155786.6666666665, ans=0.125 2023-11-27 17:11:57,080 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4450, loss[loss=0.05055, simple_loss=0.06284, pruned_loss=0.008118, audio_tagging_loss=0.01102, over 15368.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09203, pruned_loss=0.01305, audio_tagging_loss=0.008504, over 3053876.53 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:12:01,053 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2023-11-27 17:12:08,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3155920.0, ans=0.0 2023-11-27 17:12:19,365 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473400 2023-11-27 17:12:28,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.27 vs. limit=15.0 2023-11-27 17:12:38,301 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.835e+01 9.403e+01 1.018e+02 2.786e+02, threshold=1.881e+02, percent-clipped=1.0 2023-11-27 17:12:45,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3156120.0, ans=0.125 2023-11-27 17:12:54,396 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4500, loss[loss=0.07221, simple_loss=0.09632, pruned_loss=0.01358, audio_tagging_loss=0.01047, over 15942.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09091, pruned_loss=0.01269, audio_tagging_loss=0.008618, over 3058929.30 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:13:12,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2023-11-27 17:13:13,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3156253.3333333335, ans=0.1 2023-11-27 17:13:16,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3156320.0, ans=0.2 2023-11-27 17:13:17,174 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473450 2023-11-27 17:13:22,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3156320.0, ans=0.07 2023-11-27 17:13:28,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3156386.6666666665, ans=0.125 2023-11-27 17:13:49,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3156453.3333333335, ans=0.125 2023-11-27 17:13:52,307 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4550, loss[loss=0.06845, simple_loss=0.09366, pruned_loss=0.01512, audio_tagging_loss=0.006502, over 17141.00 frames. 
], tot_loss[loss=0.06689, simple_loss=0.09097, pruned_loss=0.01275, audio_tagging_loss=0.008657, over 3060050.93 frames. ], batch size: 64, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:13:52,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3156520.0, ans=0.125 2023-11-27 17:14:01,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3156520.0, ans=0.125 2023-11-27 17:14:01,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3156520.0, ans=0.125 2023-11-27 17:14:03,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3156586.6666666665, ans=0.09899494936611666 2023-11-27 17:14:09,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3156586.6666666665, ans=0.125 2023-11-27 17:14:12,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3156586.6666666665, ans=0.125 2023-11-27 17:14:15,576 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473500 2023-11-27 17:14:17,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3156653.3333333335, ans=0.125 2023-11-27 17:14:33,780 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.569e+01 9.256e+01 9.932e+01 4.356e+02, threshold=1.851e+02, percent-clipped=1.0 2023-11-27 17:14:39,254 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 17:14:39,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3156786.6666666665, ans=0.125 2023-11-27 17:14:49,599 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4600, loss[loss=0.07025, simple_loss=0.09836, pruned_loss=0.01449, audio_tagging_loss=0.006585, over 15809.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09061, pruned_loss=0.01257, audio_tagging_loss=0.008701, over 3051350.04 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:15:00,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3156920.0, ans=0.0 2023-11-27 17:15:00,674 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.43 vs. 
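limit=15.0

In the [train_asr.py:1235] records, loss[...] covers just the current batch (about 14-17k frames) while tot_loss[...] is a frame-weighted running average, which is why its frame count hovers near 3.05M and its values drift slowly. A sketch of one way to keep such statistics, assuming an exponentially decayed frame-weighted sum; icefall's own tracker differs in detail and the decay constant below is invented:

```python
# Hypothetical frame-weighted running tracker for the tot_loss[...] fields.
class RunningLoss:
    def __init__(self, decay: float = 0.999):      # decay is an assumption
        self.decay = decay
        self.frames = 0.0
        self.sums = {}                              # loss name -> weighted sum

    def update(self, losses: dict, num_frames: float) -> dict:
        self.frames = self.decay * self.frames + num_frames
        for name, value in losses.items():
            self.sums[name] = self.decay * self.sums.get(name, 0.0) + value * num_frames
        return {name: s / self.frames for name, s in self.sums.items()}

tracker = RunningLoss()
# One update with the batch 4600 numbers from the entry above:
tot = tracker.update({"loss": 0.07025, "simple_loss": 0.09836}, num_frames=15809)
```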
2023-11-27 17:15:03,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3156920.0, ans=0.125
2023-11-27 17:15:05,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3156920.0, ans=10.0
2023-11-27 17:15:12,808 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473550
2023-11-27 17:15:24,547 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0
2023-11-27 17:15:25,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.47 vs. limit=10.0
2023-11-27 17:15:26,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3157053.3333333335, ans=0.05
2023-11-27 17:15:47,486 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4650, loss[loss=0.05885, simple_loss=0.0753, pruned_loss=0.01088, audio_tagging_loss=0.01032, over 15044.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09043, pruned_loss=0.01261, audio_tagging_loss=0.008787, over 3058598.24 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:15:53,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3157186.6666666665, ans=0.0
2023-11-27 17:15:56,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3157186.6666666665, ans=0.125
2023-11-27 17:16:05,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.27 vs. limit=22.5
2023-11-27 17:16:10,317 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473600
2023-11-27 17:16:10,727 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0
2023-11-27 17:16:20,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.84 vs. limit=10.0
2023-11-27 17:16:29,573 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.658e+01 8.758e+01 9.328e+01 1.030e+02 1.229e+02, threshold=1.866e+02, percent-clipped=0.0
2023-11-27 17:16:29,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3157386.6666666665, ans=0.2
2023-11-27 17:16:42,672 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.39 vs. limit=15.0
2023-11-27 17:16:44,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3157520.0, ans=0.1
2023-11-27 17:16:45,830 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4700, loss[loss=0.08763, simple_loss=0.1192, pruned_loss=0.02008, audio_tagging_loss=0.007932, over 14946.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09019, pruned_loss=0.01268, audio_tagging_loss=0.008906, over 3060058.77 frames.
], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:17:08,397 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473650 2023-11-27 17:17:16,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3157653.3333333335, ans=0.125 2023-11-27 17:17:26,269 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=12.0 2023-11-27 17:17:29,046 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.10 vs. limit=10.0 2023-11-27 17:17:43,357 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4750, loss[loss=0.08159, simple_loss=0.1079, pruned_loss=0.01738, audio_tagging_loss=0.01026, over 14049.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09082, pruned_loss=0.01288, audio_tagging_loss=0.009, over 3053173.48 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:17:53,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.08 vs. limit=15.0 2023-11-27 17:18:06,428 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473700 2023-11-27 17:18:22,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3158053.3333333335, ans=0.125 2023-11-27 17:18:24,342 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 8.859e+01 9.575e+01 1.045e+02 1.210e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 17:18:40,205 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4800, loss[loss=0.07653, simple_loss=0.1067, pruned_loss=0.0156, audio_tagging_loss=0.007577, over 14976.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09043, pruned_loss=0.01283, audio_tagging_loss=0.008994, over 3048890.41 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:18:55,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3158253.3333333335, ans=0.0 2023-11-27 17:19:03,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3158320.0, ans=0.125 2023-11-27 17:19:03,901 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473750 2023-11-27 17:19:07,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3158320.0, ans=0.125 2023-11-27 17:19:21,012 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:19:24,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3158386.6666666665, ans=0.1 2023-11-27 17:19:38,165 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4850, loss[loss=0.05139, simple_loss=0.06122, pruned_loss=0.01005, audio_tagging_loss=0.01073, over 13813.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09013, pruned_loss=0.01289, audio_tagging_loss=0.009084, over 3041456.61 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:19:42,580 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.20 vs. 
limit=15.0 2023-11-27 17:19:51,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3158586.6666666665, ans=0.125 2023-11-27 17:19:57,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3158586.6666666665, ans=0.0 2023-11-27 17:20:01,591 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473800 2023-11-27 17:20:07,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3158653.3333333335, ans=0.125 2023-11-27 17:20:20,710 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 8.680e+01 9.364e+01 9.927e+01 1.620e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-27 17:20:22,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3158720.0, ans=0.125 2023-11-27 17:20:25,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3158786.6666666665, ans=0.125 2023-11-27 17:20:28,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3158786.6666666665, ans=0.0 2023-11-27 17:20:31,415 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:20:36,692 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4900, loss[loss=0.07812, simple_loss=0.1083, pruned_loss=0.019, audio_tagging_loss=0.004995, over 15418.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09036, pruned_loss=0.01292, audio_tagging_loss=0.008935, over 3041749.86 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:20:49,260 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.59 vs. limit=5.0 2023-11-27 17:20:56,540 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.78 vs. limit=10.0 2023-11-27 17:21:00,023 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473850 2023-11-27 17:21:04,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3158986.6666666665, ans=0.0 2023-11-27 17:21:08,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3158986.6666666665, ans=0.125 2023-11-27 17:21:23,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3159120.0, ans=0.125 2023-11-27 17:21:34,307 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 4950, loss[loss=0.05856, simple_loss=0.08223, pruned_loss=0.01006, audio_tagging_loss=0.007384, over 15564.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09078, pruned_loss=0.01283, audio_tagging_loss=0.008865, over 3039452.60 frames. 
], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:21:40,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=3159186.6666666665, ans=0.1 2023-11-27 17:21:57,368 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473900 2023-11-27 17:21:57,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3159320.0, ans=0.125 2023-11-27 17:22:07,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3159386.6666666665, ans=0.125 2023-11-27 17:22:09,393 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.41 vs. limit=6.0 2023-11-27 17:22:16,608 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.075e+01 8.677e+01 9.528e+01 1.024e+02 1.553e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-27 17:22:16,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3159386.6666666665, ans=0.2 2023-11-27 17:22:31,925 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5000, loss[loss=0.05752, simple_loss=0.07721, pruned_loss=0.0105, audio_tagging_loss=0.008413, over 15240.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09089, pruned_loss=0.01287, audio_tagging_loss=0.008641, over 3037716.20 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:22:55,046 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 473950 2023-11-27 17:22:55,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3159653.3333333335, ans=0.2 2023-11-27 17:22:55,380 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0 2023-11-27 17:23:02,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3159653.3333333335, ans=0.1 2023-11-27 17:23:13,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3159720.0, ans=0.125 2023-11-27 17:23:24,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3159786.6666666665, ans=0.125 2023-11-27 17:23:29,511 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5050, loss[loss=0.05988, simple_loss=0.07826, pruned_loss=0.01346, audio_tagging_loss=0.00729, over 14178.00 frames. ], tot_loss[loss=0.06747, simple_loss=0.09182, pruned_loss=0.01296, audio_tagging_loss=0.008601, over 3042186.77 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:23:33,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3159853.3333333335, ans=0.0 2023-11-27 17:23:37,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3159853.3333333335, ans=0.125 2023-11-27 17:23:44,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.23 vs. 
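limit=15.0

Across the whole log the four loss components combine linearly into the total: loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, to within the printed precision. The 0.5 and 1.0 scales are inferred from the logged numbers rather than read out of the code; a quick check against batch 5000 just above:

```python
# Sketch of the inferred loss combination; the scales are assumptions that
# fit every loss[...] record in this section.
def total_loss(simple_loss: float, pruned_loss: float, audio_tagging_loss: float,
               simple_scale: float = 0.5, at_scale: float = 1.0) -> float:
    return simple_scale * simple_loss + pruned_loss + at_scale * audio_tagging_loss

# Epoch 40, batch 5000 above: loss=0.05752, simple_loss=0.07721,
# pruned_loss=0.0105, audio_tagging_loss=0.008413
assert abs(total_loss(0.07721, 0.0105, 0.008413) - 0.05752) < 1e-4
```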
2023-11-27 17:23:52,195 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474000
2023-11-27 17:23:52,694 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0
2023-11-27 17:23:55,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3159986.6666666665, ans=0.0
2023-11-27 17:23:56,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3159986.6666666665, ans=0.1
2023-11-27 17:23:57,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3159986.6666666665, ans=0.2
2023-11-27 17:23:58,911 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.19 vs. limit=10.0
2023-11-27 17:24:05,193 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.51 vs. limit=15.0
2023-11-27 17:24:12,647 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.792e+01 8.599e+01 9.260e+01 9.891e+01 1.238e+02, threshold=1.852e+02, percent-clipped=0.0
2023-11-27 17:24:18,275 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0
2023-11-27 17:24:20,139 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 17:24:20,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3160120.0, ans=0.0
2023-11-27 17:24:27,728 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5100, loss[loss=0.05863, simple_loss=0.07093, pruned_loss=0.01113, audio_tagging_loss=0.01203, over 14802.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09107, pruned_loss=0.01286, audio_tagging_loss=0.008635, over 3048906.80 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:24:34,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3160186.6666666665, ans=10.0
2023-11-27 17:24:40,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3160253.3333333335, ans=0.2
2023-11-27 17:24:51,258 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474050
2023-11-27 17:25:21,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3160453.3333333335, ans=0.125
2023-11-27 17:25:24,990 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5150, loss[loss=0.06351, simple_loss=0.08802, pruned_loss=0.01267, audio_tagging_loss=0.006829, over 15945.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.0901, pruned_loss=0.0127, audio_tagging_loss=0.008619, over 3043412.61 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 17:25:27,699 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.45 vs.
limit=15.0 2023-11-27 17:25:37,022 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:25:47,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3160653.3333333335, ans=0.2 2023-11-27 17:25:48,583 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474100 2023-11-27 17:26:07,188 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.651e+01 9.333e+01 9.963e+01 1.109e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 17:26:22,465 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5200, loss[loss=0.05301, simple_loss=0.06032, pruned_loss=0.01112, audio_tagging_loss=0.01173, over 14817.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09043, pruned_loss=0.01269, audio_tagging_loss=0.008607, over 3043198.38 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:26:45,120 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474150 2023-11-27 17:26:54,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3160986.6666666665, ans=0.125 2023-11-27 17:26:56,146 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=15.0 2023-11-27 17:27:06,073 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.95 vs. limit=12.0 2023-11-27 17:27:10,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3161120.0, ans=0.1 2023-11-27 17:27:20,079 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5250, loss[loss=0.05621, simple_loss=0.07931, pruned_loss=0.01029, audio_tagging_loss=0.006271, over 15347.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.0902, pruned_loss=0.01276, audio_tagging_loss=0.008627, over 3043122.17 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:27:37,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3161253.3333333335, ans=0.0 2023-11-27 17:27:37,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3161253.3333333335, ans=0.2 2023-11-27 17:27:42,542 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474200 2023-11-27 17:27:44,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3161320.0, ans=0.125 2023-11-27 17:27:51,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3161320.0, ans=0.1 2023-11-27 17:28:02,463 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.07 vs. limit=12.0 2023-11-27 17:28:03,022 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.456e+01 8.718e+01 9.401e+01 1.041e+02 1.435e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 17:28:08,166 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.32 vs. 
limit=22.5 2023-11-27 17:28:15,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3161453.3333333335, ans=0.125 2023-11-27 17:28:17,163 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5300, loss[loss=0.06851, simple_loss=0.0897, pruned_loss=0.01473, audio_tagging_loss=0.008932, over 14042.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09024, pruned_loss=0.01276, audio_tagging_loss=0.008604, over 3044609.49 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:28:26,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3161520.0, ans=0.04949747468305833 2023-11-27 17:28:27,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3161586.6666666665, ans=0.125 2023-11-27 17:28:37,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3161586.6666666665, ans=0.0 2023-11-27 17:28:40,937 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474250 2023-11-27 17:28:42,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3161653.3333333335, ans=0.0 2023-11-27 17:28:48,859 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:28:53,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3161720.0, ans=0.2 2023-11-27 17:29:05,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3161786.6666666665, ans=0.04949747468305833 2023-11-27 17:29:06,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3161786.6666666665, ans=0.0 2023-11-27 17:29:14,709 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5350, loss[loss=0.07008, simple_loss=0.09213, pruned_loss=0.01392, audio_tagging_loss=0.01009, over 15416.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.09132, pruned_loss=0.01288, audio_tagging_loss=0.008532, over 3049084.13 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:29:22,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3161853.3333333335, ans=0.2 2023-11-27 17:29:31,929 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.51 vs. limit=10.0 2023-11-27 17:29:37,998 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474300 2023-11-27 17:29:48,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3162053.3333333335, ans=0.5 2023-11-27 17:29:48,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3162053.3333333335, ans=0.025 2023-11-27 17:29:57,552 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 8.549e+01 9.139e+01 9.970e+01 1.797e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-27 17:30:00,493 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.38 vs. 
limit=15.0 2023-11-27 17:30:13,039 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5400, loss[loss=0.0495, simple_loss=0.06111, pruned_loss=0.009212, audio_tagging_loss=0.009731, over 14159.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09087, pruned_loss=0.01279, audio_tagging_loss=0.008686, over 3049699.86 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:30:24,094 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:30:29,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3162253.3333333335, ans=0.2 2023-11-27 17:30:35,238 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474350 2023-11-27 17:31:09,435 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5450, loss[loss=0.06699, simple_loss=0.08938, pruned_loss=0.01287, audio_tagging_loss=0.009434, over 14560.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09145, pruned_loss=0.0129, audio_tagging_loss=0.008712, over 3059044.22 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:31:33,100 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474400 2023-11-27 17:31:43,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3162720.0, ans=0.125 2023-11-27 17:31:53,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.734e+01 9.322e+01 1.014e+02 1.420e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-27 17:32:07,530 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5500, loss[loss=0.05902, simple_loss=0.08204, pruned_loss=0.01059, audio_tagging_loss=0.007402, over 15368.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.0916, pruned_loss=0.01277, audio_tagging_loss=0.008774, over 3052704.06 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:32:25,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3162920.0, ans=0.035 2023-11-27 17:32:30,730 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474450 2023-11-27 17:32:39,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3162986.6666666665, ans=0.125 2023-11-27 17:32:43,892 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-11-27 17:32:44,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3163053.3333333335, ans=0.125 2023-11-27 17:32:53,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3163120.0, ans=0.125 2023-11-27 17:33:05,365 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5550, loss[loss=0.06951, simple_loss=0.0904, pruned_loss=0.01429, audio_tagging_loss=0.01002, over 13939.00 frames. ], tot_loss[loss=0.06811, simple_loss=0.09267, pruned_loss=0.01294, audio_tagging_loss=0.008837, over 3045477.25 frames. 
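The component losses in these lines recombine almost exactly into the reported totals under a fixed weighting. A quick check, assuming total = 0.5 * simple_loss + pruned_loss + audio_tagging_loss (the 0.5 simple-loss weight is inferred from the figures here, not read from the training code):

```python
# Recombining logged loss components under an assumed weighting of
# total = 0.5 * simple_loss + pruned_loss + audio_tagging_loss.
records = [
    # (reported_total, simple_loss, pruned_loss, audio_tagging_loss)
    (0.0495,  0.06111, 0.009212, 0.009731),  # epoch 40, batch 5400
    (0.06699, 0.08938, 0.012870, 0.009434),  # epoch 40, batch 5450
]
for total, simple, pruned, tagging in records:
    recombined = 0.5 * simple + pruned + tagging
    print(f"reported={total:.5f}  recombined={recombined:.5f}")
    # reported=0.04950  recombined=0.04950
    # reported=0.06699  recombined=0.06699
```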
], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:33:27,870 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474500 2023-11-27 17:33:30,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3163320.0, ans=0.125 2023-11-27 17:33:31,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3163320.0, ans=0.125 2023-11-27 17:33:32,762 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=22.5 2023-11-27 17:33:49,151 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.390e+01 8.657e+01 9.312e+01 9.840e+01 1.170e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-27 17:34:02,472 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5600, loss[loss=0.08077, simple_loss=0.103, pruned_loss=0.0199, audio_tagging_loss=0.00937, over 14957.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09075, pruned_loss=0.01269, audio_tagging_loss=0.009064, over 3045558.17 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:34:04,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3163520.0, ans=0.0 2023-11-27 17:34:06,268 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.68 vs. limit=22.5 2023-11-27 17:34:08,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3163520.0, ans=0.125 2023-11-27 17:34:09,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3163520.0, ans=0.1 2023-11-27 17:34:20,376 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.00 vs. limit=10.0 2023-11-27 17:34:25,492 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474550 2023-11-27 17:34:31,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3163653.3333333335, ans=0.0 2023-11-27 17:34:40,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3163720.0, ans=0.125 2023-11-27 17:34:47,601 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-27 17:34:47,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3163786.6666666665, ans=0.0 2023-11-27 17:34:54,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3163786.6666666665, ans=0.125 2023-11-27 17:34:54,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3163786.6666666665, ans=0.2 2023-11-27 17:34:59,983 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5650, loss[loss=0.06966, simple_loss=0.08928, pruned_loss=0.01354, audio_tagging_loss=0.01148, over 15218.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09041, pruned_loss=0.01252, audio_tagging_loss=0.009143, over 3046518.65 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:35:15,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3163920.0, ans=0.125 2023-11-27 17:35:15,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3163920.0, ans=0.125 2023-11-27 17:35:15,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3163920.0, ans=0.1 2023-11-27 17:35:23,679 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474600 2023-11-27 17:35:45,039 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 8.679e+01 9.217e+01 1.003e+02 1.541e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-27 17:35:58,297 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5700, loss[loss=0.06338, simple_loss=0.08615, pruned_loss=0.01189, audio_tagging_loss=0.008417, over 16442.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08948, pruned_loss=0.01227, audio_tagging_loss=0.009201, over 3054815.93 frames. ], batch size: 63, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:36:04,850 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.34 vs. limit=22.5 2023-11-27 17:36:15,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3164253.3333333335, ans=0.015 2023-11-27 17:36:19,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3164320.0, ans=0.125 2023-11-27 17:36:20,619 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474650 2023-11-27 17:36:24,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3164320.0, ans=0.125 2023-11-27 17:36:26,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3164320.0, ans=0.125 2023-11-27 17:36:55,054 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5750, loss[loss=0.07632, simple_loss=0.1014, pruned_loss=0.01575, audio_tagging_loss=0.009863, over 15675.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08906, pruned_loss=0.01227, audio_tagging_loss=0.009041, over 3054047.95 frames. 
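The WARNING above drops a one-second AudioSet cut whose placeholder transcript tokenizes to 24 tokens while only 23 encoder frames survive subsampling; a transducer cannot emit more tokens than it has frames, so such cuts must be excluded. A minimal sketch of this filter, with the subsampling arithmetic as a rough assumption chosen to reproduce the logged 100 -> 23 frame count:

```python
# Sketch of the cut filter implied by the WARNING above. The exact
# convolutional front-end arithmetic is an assumption; (n - 8) // 4
# reproduces the logged 100 -> 23 frames.
def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    frames_after = (num_frames - 8) // subsampling_factor
    return frames_after >= num_tokens

print(keep_cut(100, 24))   # False -> "Exclude cut ... from training"
print(keep_cut(1600, 24))  # True  -> a 16-second cut passes easily
```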
], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:36:57,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3164520.0, ans=0.0 2023-11-27 17:37:09,475 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.71 vs. limit=10.0 2023-11-27 17:37:17,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3164653.3333333335, ans=0.0 2023-11-27 17:37:18,035 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474700 2023-11-27 17:37:18,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3164653.3333333335, ans=0.125 2023-11-27 17:37:39,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 8.634e+01 9.303e+01 1.008e+02 1.326e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 17:37:47,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3164786.6666666665, ans=0.125 2023-11-27 17:37:52,572 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5800, loss[loss=0.06595, simple_loss=0.08652, pruned_loss=0.01492, audio_tagging_loss=0.007771, over 16057.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08866, pruned_loss=0.01216, audio_tagging_loss=0.008981, over 3048271.76 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:37:57,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3164853.3333333335, ans=0.1 2023-11-27 17:38:07,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3164920.0, ans=0.0 2023-11-27 17:38:11,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3164920.0, ans=10.0 2023-11-27 17:38:14,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3164986.6666666665, ans=0.0 2023-11-27 17:38:15,585 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474750 2023-11-27 17:38:23,189 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.45 vs. limit=10.0 2023-11-27 17:38:30,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3165053.3333333335, ans=0.1 2023-11-27 17:38:43,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3165120.0, ans=0.125 2023-11-27 17:38:44,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3165120.0, ans=0.0 2023-11-27 17:38:49,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3165186.6666666665, ans=0.07 2023-11-27 17:38:49,885 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5850, loss[loss=0.05371, simple_loss=0.06645, pruned_loss=0.007433, audio_tagging_loss=0.01305, over 15557.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08911, pruned_loss=0.0123, audio_tagging_loss=0.008955, over 3048011.27 frames. 
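The ScheduledFloat lines report hyper-parameters (skip rates, dropout probabilities, balancer bounds) evaluated on a schedule keyed by batch_count; the logged `ans` is the schedule's current value, and by batch_count ~3.16M they have all long since settled at their final values. A minimal piecewise-linear sketch, with invented breakpoints:

```python
# Minimal sketch of a piecewise-linear schedule like the
# "ScheduledFloat: name=..., batch_count=..., ans=..." lines.
def scheduled_float(batch_count: float, points: list) -> float:
    """points: [(batch, value), ...] sorted by batch; linear in between,
    clamped to the end values outside the range."""
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# e.g. a dropout_p decaying 0.3 -> 0.1 over the first 20k batches has been
# pinned at 0.1 for a long time by batch_count ~3.16M, as in the log.
print(scheduled_float(3164253.0, [(0.0, 0.3), (20000.0, 0.1)]))  # 0.1
```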
], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:38:52,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3165186.6666666665, ans=0.0 2023-11-27 17:38:55,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3165186.6666666665, ans=0.2 2023-11-27 17:38:56,577 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.99 vs. limit=15.0 2023-11-27 17:39:13,004 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474800 2023-11-27 17:39:35,139 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.752e+01 8.698e+01 9.361e+01 9.946e+01 1.172e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 17:39:44,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3165453.3333333335, ans=0.125 2023-11-27 17:39:48,452 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5900, loss[loss=0.08078, simple_loss=0.1128, pruned_loss=0.01812, audio_tagging_loss=0.006278, over 15120.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08927, pruned_loss=0.01248, audio_tagging_loss=0.00888, over 3040835.23 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:40:08,367 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.63 vs. limit=10.0 2023-11-27 17:40:11,568 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474850 2023-11-27 17:40:19,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3165653.3333333335, ans=0.125 2023-11-27 17:40:22,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.91 vs. limit=15.0 2023-11-27 17:40:46,220 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 5950, loss[loss=0.05356, simple_loss=0.06948, pruned_loss=0.01076, audio_tagging_loss=0.008062, over 15687.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08839, pruned_loss=0.01228, audio_tagging_loss=0.008849, over 3045865.80 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:40:49,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3165853.3333333335, ans=0.125 2023-11-27 17:40:55,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3165853.3333333335, ans=0.95 2023-11-27 17:40:59,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0 2023-11-27 17:41:09,172 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474900 2023-11-27 17:41:30,962 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.669e+01 9.306e+01 1.020e+02 1.374e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 17:41:43,436 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6000, loss[loss=0.0441, simple_loss=0.05608, pruned_loss=0.005635, audio_tagging_loss=0.01043, over 14942.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08771, pruned_loss=0.01209, audio_tagging_loss=0.008849, over 3039037.31 frames. 
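The optim.py lines report running grad-norm quartiles, and the logged threshold equals clipping_scale times the median: in the entry above, 2.0 * 9.361e+01 = 1.872e+02. A sketch under the assumption that the statistics are taken over a window of recent per-batch gradient norms:

```python
import torch

# Sketch of the "Clipping_scale=2.0, grad-norm quartiles ... threshold=..."
# lines. The window of recent norms and its size are assumptions.
def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    quartiles = torch.quantile(
        recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
    )
    threshold = clipping_scale * quartiles[2]          # 2.0 * median
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped

norms = 90.0 + 10.0 * torch.randn(500).abs()  # stand-in for recent grad norms
q, thr, pct = clipping_stats(norms)
print([f"{v:.3e}" for v in q.tolist()], f"threshold={thr:.3e}", f"clipped={pct:.1f}%")
```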
], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:41:43,436 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 17:42:05,881 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9008, 1.5449, 3.4944, 2.9794, 2.8221, 3.1102, 3.1264, 3.0686], device='cuda:2') 2023-11-27 17:42:18,065 INFO [train_asr.py:1267] (2/4) Epoch 40, validation: loss=0.05751, simple_loss=0.05064, pruned_loss=0.005151, audio_tagging_loss=0.02703, over 4681554.00 frames. 2023-11-27 17:42:18,066 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 17:42:20,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3166186.6666666665, ans=0.125 2023-11-27 17:42:40,803 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 474950 2023-11-27 17:42:48,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3166320.0, ans=0.2 2023-11-27 17:42:59,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3166386.6666666665, ans=0.125 2023-11-27 17:43:02,559 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 17:43:14,949 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6050, loss[loss=0.06334, simple_loss=0.08393, pruned_loss=0.01178, audio_tagging_loss=0.009596, over 15349.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08746, pruned_loss=0.01212, audio_tagging_loss=0.008795, over 3039942.77 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:43:21,429 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.07 vs. limit=22.5 2023-11-27 17:43:24,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0 2023-11-27 17:43:38,175 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475000 2023-11-27 17:43:45,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3166653.3333333335, ans=0.0 2023-11-27 17:43:47,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3166653.3333333335, ans=0.125 2023-11-27 17:43:50,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3166720.0, ans=0.1 2023-11-27 17:43:50,364 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.31 vs. 
limit=22.5 2023-11-27 17:44:01,699 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.500e+01 9.097e+01 9.950e+01 1.272e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-27 17:44:02,389 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=15.0 2023-11-27 17:44:12,682 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6100, loss[loss=0.07663, simple_loss=0.1106, pruned_loss=0.01567, audio_tagging_loss=0.005676, over 15758.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08774, pruned_loss=0.0121, audio_tagging_loss=0.008761, over 3043520.87 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:44:33,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3166920.0, ans=0.0 2023-11-27 17:44:35,833 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475050 2023-11-27 17:44:36,780 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2023-11-27 17:44:37,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3166986.6666666665, ans=0.5 2023-11-27 17:45:10,565 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6150, loss[loss=0.05782, simple_loss=0.07667, pruned_loss=0.01044, audio_tagging_loss=0.009048, over 15016.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.087, pruned_loss=0.01199, audio_tagging_loss=0.008987, over 3040636.14 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:45:14,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3167186.6666666665, ans=0.2 2023-11-27 17:45:15,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3167186.6666666665, ans=0.09899494936611666 2023-11-27 17:45:19,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3167186.6666666665, ans=0.0 2023-11-27 17:45:19,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3167186.6666666665, ans=0.125 2023-11-27 17:45:27,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3167253.3333333335, ans=0.125 2023-11-27 17:45:34,336 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475100 2023-11-27 17:45:40,994 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 17:45:49,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3167386.6666666665, ans=0.125 2023-11-27 17:45:56,718 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.777e+01 9.490e+01 1.001e+02 1.284e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 17:46:08,776 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6200, loss[loss=0.06952, simple_loss=0.09125, pruned_loss=0.01472, audio_tagging_loss=0.009176, over 16036.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.0876, pruned_loss=0.01207, audio_tagging_loss=0.008959, over 3041847.33 frames. 
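At regular intervals the script switches to a validation pass (the "Computing validation loss" block above, which also reports "Maximum memory allocated so far is 26096MB"). A schematic version of that step, with model, valid_loader and compute_loss as stand-ins for the real training-script objects:

```python
import torch

# Schematic mid-epoch validation pass with peak-memory reporting,
# mirroring the "Computing validation loss" / "Maximum memory allocated"
# lines. The helper objects are assumptions, not the real train_asr.py API.
def run_validation(model, valid_loader, compute_loss, device="cuda:2"):
    model.eval()
    total_loss, total_frames = 0.0, 0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch)
            total_loss += loss.item() * num_frames
            total_frames += num_frames
    model.train()
    max_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: loss={total_loss / total_frames:.5f}, "
          f"over {total_frames} frames; max mem {max_mb}MB")
```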
], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:46:12,582 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=15.0 2023-11-27 17:46:16,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3167520.0, ans=0.125 2023-11-27 17:46:31,931 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475150 2023-11-27 17:47:04,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3167853.3333333335, ans=10.0 2023-11-27 17:47:04,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3167853.3333333335, ans=0.1 2023-11-27 17:47:05,734 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6250, loss[loss=0.05763, simple_loss=0.06471, pruned_loss=0.01332, audio_tagging_loss=0.01196, over 14871.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08795, pruned_loss=0.0123, audio_tagging_loss=0.008998, over 3037844.95 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:47:07,383 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.27 vs. limit=22.5 2023-11-27 17:47:28,406 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475200 2023-11-27 17:47:52,283 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.766e+01 9.317e+01 1.001e+02 1.294e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 17:47:59,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3168120.0, ans=0.125 2023-11-27 17:48:03,992 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6300, loss[loss=0.06244, simple_loss=0.07613, pruned_loss=0.01471, audio_tagging_loss=0.009666, over 15505.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08852, pruned_loss=0.01238, audio_tagging_loss=0.009067, over 3044916.18 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:48:06,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.18 vs. limit=15.0 2023-11-27 17:48:10,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3168186.6666666665, ans=0.1 2023-11-27 17:48:16,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3168253.3333333335, ans=0.0 2023-11-27 17:48:27,737 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475250 2023-11-27 17:48:30,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3168320.0, ans=0.0 2023-11-27 17:48:31,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3168320.0, ans=0.02 2023-11-27 17:48:34,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.19 vs. 
limit=6.0 2023-11-27 17:48:42,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3168386.6666666665, ans=0.1 2023-11-27 17:48:46,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3168386.6666666665, ans=15.0 2023-11-27 17:48:54,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3168453.3333333335, ans=0.2 2023-11-27 17:49:01,771 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6350, loss[loss=0.07415, simple_loss=0.09703, pruned_loss=0.0186, audio_tagging_loss=0.007029, over 15100.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.0886, pruned_loss=0.0125, audio_tagging_loss=0.009111, over 3040552.69 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:49:04,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3168520.0, ans=0.0 2023-11-27 17:49:14,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3168586.6666666665, ans=0.125 2023-11-27 17:49:16,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3168586.6666666665, ans=0.125 2023-11-27 17:49:17,725 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2023-11-27 17:49:25,257 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475300 2023-11-27 17:49:26,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.19 vs. limit=10.0 2023-11-27 17:49:39,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3168720.0, ans=0.07 2023-11-27 17:49:47,724 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 8.715e+01 9.486e+01 1.017e+02 1.352e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-27 17:50:00,008 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6400, loss[loss=0.07103, simple_loss=0.09039, pruned_loss=0.016, audio_tagging_loss=0.009841, over 14886.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.08968, pruned_loss=0.01267, audio_tagging_loss=0.009069, over 3044887.63 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:50:11,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3168920.0, ans=0.0 2023-11-27 17:50:21,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3168986.6666666665, ans=0.125 2023-11-27 17:50:22,398 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475350 2023-11-27 17:50:57,174 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6450, loss[loss=0.05404, simple_loss=0.06594, pruned_loss=0.009851, audio_tagging_loss=0.01122, over 15447.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.08977, pruned_loss=0.01257, audio_tagging_loss=0.009116, over 3037563.28 frames. 
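The "Whitening: ... metric=X vs. limit=Y" lines track how decorrelated a module's output channels are; the whitener only intervenes when the metric exceeds its limit. One plausible whiteness statistic is the spread of the covariance eigenvalue spectrum, shown below; this is an illustrative proxy, not necessarily the exact formula in scaling.py:

```python
import torch

# Illustrative whiteness proxy: mean squared eigenvalue over squared mean
# eigenvalue of the feature covariance. 1.0 for a perfectly flat spectrum;
# grows as channels become correlated.
def whiteness_metric(x: torch.Tensor) -> float:
    # x: (frames, channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

white = torch.randn(2000, 256)
mixed = white @ torch.randn(256, 256) / 16.0  # mixing correlates channels
print(whiteness_metric(white), whiteness_metric(mixed))  # mixed is far larger
```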
], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:51:07,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3169253.3333333335, ans=0.125 2023-11-27 17:51:13,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3169253.3333333335, ans=0.0 2023-11-27 17:51:20,169 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475400 2023-11-27 17:51:24,001 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.60 vs. limit=10.0 2023-11-27 17:51:42,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3169453.3333333335, ans=0.125 2023-11-27 17:51:44,569 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.677e+01 8.833e+01 9.242e+01 1.006e+02 1.317e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-27 17:51:48,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3169453.3333333335, ans=0.125 2023-11-27 17:51:53,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3169520.0, ans=0.1 2023-11-27 17:51:54,622 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6500, loss[loss=0.05508, simple_loss=0.07696, pruned_loss=0.01014, audio_tagging_loss=0.006461, over 16479.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09026, pruned_loss=0.01267, audio_tagging_loss=0.009059, over 3034185.18 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:52:05,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3169586.6666666665, ans=0.2 2023-11-27 17:52:06,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=22.5 2023-11-27 17:52:17,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5 2023-11-27 17:52:18,396 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475450 2023-11-27 17:52:43,015 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=15.0 2023-11-27 17:52:53,643 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6550, loss[loss=0.07543, simple_loss=0.1057, pruned_loss=0.01475, audio_tagging_loss=0.007843, over 16051.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.08999, pruned_loss=0.01253, audio_tagging_loss=0.008861, over 3035094.89 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:53:01,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0 2023-11-27 17:53:09,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.02 vs. 
limit=12.0 2023-11-27 17:53:13,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3169920.0, ans=0.125 2023-11-27 17:53:16,464 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475500 2023-11-27 17:53:28,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3170053.3333333335, ans=0.125 2023-11-27 17:53:34,300 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=22.5 2023-11-27 17:53:40,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.596e+01 9.247e+01 9.962e+01 1.603e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 17:53:51,298 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6600, loss[loss=0.07713, simple_loss=0.1081, pruned_loss=0.01362, audio_tagging_loss=0.009457, over 15292.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09038, pruned_loss=0.01268, audio_tagging_loss=0.008874, over 3037702.31 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:53:51,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3170186.6666666665, ans=0.125 2023-11-27 17:54:12,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3170320.0, ans=0.1 2023-11-27 17:54:13,313 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.51 vs. limit=10.0 2023-11-27 17:54:13,332 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0 2023-11-27 17:54:13,807 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475550 2023-11-27 17:54:23,434 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.48 vs. limit=22.5 2023-11-27 17:54:48,460 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6650, loss[loss=0.07699, simple_loss=0.1109, pruned_loss=0.01495, audio_tagging_loss=0.006585, over 15417.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09027, pruned_loss=0.0127, audio_tagging_loss=0.008838, over 3034056.43 frames. 
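The bypass.scale_min entries (ans=0.2) belong to Zipformer-style bypass connections: a module's output is blended with its input through a learned per-channel scale clamped below by scale_min, so no layer can be bypassed entirely. A hedged sketch of that shape of module; details beyond the clamped blend are assumptions:

```python
import torch
import torch.nn as nn

# Sketch of a bypass connection behind the "bypass ... scale_min, ans=0.2"
# lines: output = x + scale * (f(x) - x), with the learned per-channel
# scale clamped to [scale_min, scale_max].
class Bypass(nn.Module):
    def __init__(self, channels: int, scale_min: float = 0.2, scale_max: float = 1.0):
        super().__init__()
        self.scale = nn.Parameter(torch.full((channels,), 0.5))
        self.scale_min, self.scale_max = scale_min, scale_max

    def forward(self, x: torch.Tensor, fx: torch.Tensor) -> torch.Tensor:
        s = self.scale.clamp(self.scale_min, self.scale_max)
        return x + s * (fx - x)

bypass = Bypass(256)
x = torch.randn(10, 256)
print(bypass(x, torch.tanh(x)).shape)  # torch.Size([10, 256])
```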
], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:54:56,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3170520.0, ans=0.125 2023-11-27 17:55:07,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3170586.6666666665, ans=0.0 2023-11-27 17:55:11,991 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475600 2023-11-27 17:55:12,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3170653.3333333335, ans=0.2 2023-11-27 17:55:14,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3170653.3333333335, ans=0.125 2023-11-27 17:55:25,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3170720.0, ans=0.125 2023-11-27 17:55:30,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3170720.0, ans=0.125 2023-11-27 17:55:36,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.804e+01 9.430e+01 1.026e+02 1.343e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-27 17:55:37,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3170786.6666666665, ans=0.125 2023-11-27 17:55:38,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3170786.6666666665, ans=0.1 2023-11-27 17:55:46,567 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6700, loss[loss=0.05642, simple_loss=0.07648, pruned_loss=0.008859, audio_tagging_loss=0.009318, over 14980.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09108, pruned_loss=0.01287, audio_tagging_loss=0.008703, over 3036834.31 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:55:46,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3170853.3333333335, ans=0.0 2023-11-27 17:56:04,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3170920.0, ans=0.125 2023-11-27 17:56:08,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3170986.6666666665, ans=0.015 2023-11-27 17:56:09,800 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475650 2023-11-27 17:56:12,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3170986.6666666665, ans=0.125 2023-11-27 17:56:20,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3171053.3333333335, ans=0.5 2023-11-27 17:56:44,909 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6750, loss[loss=0.07891, simple_loss=0.104, pruned_loss=0.0196, audio_tagging_loss=0.007293, over 15269.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.0905, pruned_loss=0.01278, audio_tagging_loss=0.008753, over 3034783.25 frames. 
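The balancer hyper-parameters logged throughout (prob, min_positive, max_positive, min_abs, max_abs) bound per-channel activation statistics. The real Balancer enforces these bounds through custom backward-pass gradient modifications; the forward-only helper below just computes the two statistics the names refer to, to show what is being controlled:

```python
import torch

# Forward-only illustration of the statistics a balancer constrains.
# This is not the scaling.py implementation, which works via custom autograd.
def balancer_stats(x: torch.Tensor):
    # x: (..., channels); reduce over leading dims for per-channel stats
    flat = x.reshape(-1, x.shape[-1])
    positive_fraction = (flat > 0).float().mean(dim=0)  # cf. min/max_positive
    mean_abs = flat.abs().mean(dim=0)                   # cf. min/max_abs
    return positive_fraction, mean_abs

acts = torch.randn(4, 100, 256)  # (batch, time, channels), invented shapes
pos, amp = balancer_stats(acts)
print(pos.mean().item(), amp.mean().item())  # ~0.5 and ~0.8 for unit Gaussians
```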
], batch size: 58, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:56:46,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3171186.6666666665, ans=10.0 2023-11-27 17:56:48,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3171186.6666666665, ans=0.0 2023-11-27 17:56:48,693 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.13 vs. limit=10.0 2023-11-27 17:56:52,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3171186.6666666665, ans=0.125 2023-11-27 17:56:57,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3171253.3333333335, ans=0.1 2023-11-27 17:57:07,558 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475700 2023-11-27 17:57:27,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3171386.6666666665, ans=0.2 2023-11-27 17:57:32,205 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.698e+01 8.726e+01 9.253e+01 9.869e+01 1.204e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-27 17:57:39,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3171453.3333333335, ans=0.125 2023-11-27 17:57:40,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3171453.3333333335, ans=0.125 2023-11-27 17:57:42,253 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6800, loss[loss=0.0795, simple_loss=0.1131, pruned_loss=0.0158, audio_tagging_loss=0.007137, over 15455.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.0897, pruned_loss=0.01248, audio_tagging_loss=0.00884, over 3032720.29 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0 2023-11-27 17:57:43,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3171520.0, ans=0.0 2023-11-27 17:58:05,065 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475750 2023-11-27 17:58:28,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3171786.6666666665, ans=0.125 2023-11-27 17:58:35,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3171786.6666666665, ans=0.04949747468305833 2023-11-27 17:58:36,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.82 vs. limit=15.0 2023-11-27 17:58:39,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3171853.3333333335, ans=6.0 2023-11-27 17:58:40,105 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6850, loss[loss=0.06423, simple_loss=0.09263, pruned_loss=0.01224, audio_tagging_loss=0.00567, over 16161.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08984, pruned_loss=0.01261, audio_tagging_loss=0.0087, over 3035485.98 frames. 
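The attn_weights_entropy tensor printed in the validation block further up is a per-head sharpness diagnostic: entropy is low when a head attends to few positions and approaches log(seq_len) when attention is uniform. A sketch of how such entropies can be computed from softmax attention weights (shapes invented):

```python
import torch

# Per-head entropy of attention weights, cf. the attn_weights_entropy
# tensor logged during validation.
def attn_entropy(attn_weights: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    # attn_weights: (heads, query, key), each row summing to 1
    ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return ent.mean(dim=-1)  # average over query positions -> one value per head

weights = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_entropy(weights))  # one entropy per head, as in the logged tensor
```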
], batch size: 61, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:58:40,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3171853.3333333335, ans=0.2 2023-11-27 17:58:48,537 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.29 vs. limit=15.0 2023-11-27 17:58:51,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3171920.0, ans=0.125 2023-11-27 17:59:03,430 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475800 2023-11-27 17:59:17,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3172053.3333333335, ans=0.125 2023-11-27 17:59:18,809 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0 2023-11-27 17:59:21,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3172053.3333333335, ans=0.2 2023-11-27 17:59:28,683 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.901e+01 9.541e+01 1.005e+02 1.351e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 17:59:34,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3172120.0, ans=0.125 2023-11-27 17:59:37,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.30 vs. limit=15.0 2023-11-27 17:59:38,180 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6900, loss[loss=0.05996, simple_loss=0.08438, pruned_loss=0.01176, audio_tagging_loss=0.006012, over 14990.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08954, pruned_loss=0.01248, audio_tagging_loss=0.008625, over 3028408.63 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 17:59:45,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3172186.6666666665, ans=0.125 2023-11-27 18:00:01,154 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475850 2023-11-27 18:00:23,957 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.96 vs. limit=10.0 2023-11-27 18:00:25,631 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:00:35,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2023-11-27 18:00:36,154 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 6950, loss[loss=0.0583, simple_loss=0.07878, pruned_loss=0.008554, audio_tagging_loss=0.01035, over 14234.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08908, pruned_loss=0.01237, audio_tagging_loss=0.008619, over 3029731.69 frames. 
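One reading of the "WithLoss: ... loss-sum=0.000e+00" lines is a wrapper that accumulates an auxiliary penalty on a sub-module's output and periodically logs the running sum, zero meaning the penalty is currently inactive. This is a guess at the mechanism, not the actual scaling.py implementation:

```python
import torch
import torch.nn as nn

# Hypothetical auxiliary-loss wrapper matching the shape of the
# "WithLoss: name=..., loss-sum=..." log lines.
class WithAuxLoss(nn.Module):
    def __init__(self, module: nn.Module, penalty):
        super().__init__()
        self.module, self.penalty = module, penalty
        self.loss_sum = 0.0

    def forward(self, x):
        y = self.module(x)
        self.loss_sum += float(self.penalty(y).detach())
        return y

wrapped = WithAuxLoss(nn.Linear(16, 16), penalty=lambda y: 0.0 * y.pow(2).mean())
wrapped(torch.randn(2, 16))
print(f"loss-sum={wrapped.loss_sum:.3e}")  # 0.000e+00, as in the log
```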
], batch size: 53, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:00:59,154 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475900 2023-11-27 18:01:10,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3172720.0, ans=0.05 2023-11-27 18:01:14,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3172720.0, ans=0.0 2023-11-27 18:01:24,335 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.751e+01 9.118e+01 9.607e+01 1.229e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-27 18:01:30,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3172786.6666666665, ans=0.0 2023-11-27 18:01:33,720 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7000, loss[loss=0.06836, simple_loss=0.08592, pruned_loss=0.01492, audio_tagging_loss=0.01049, over 14851.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08923, pruned_loss=0.0125, audio_tagging_loss=0.008716, over 3035488.02 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:01:36,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3172853.3333333335, ans=0.125 2023-11-27 18:01:40,071 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-27 18:01:47,528 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:01:50,553 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.35 vs. limit=6.0 2023-11-27 18:01:56,650 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 475950 2023-11-27 18:02:03,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3172986.6666666665, ans=0.125 2023-11-27 18:02:30,896 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7050, loss[loss=0.04609, simple_loss=0.06873, pruned_loss=0.005365, audio_tagging_loss=0.006361, over 14461.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08873, pruned_loss=0.01239, audio_tagging_loss=0.008779, over 3034004.51 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:02:32,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.44 vs. 
limit=12.0 2023-11-27 18:02:43,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3173253.3333333335, ans=0.125 2023-11-27 18:02:46,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3173253.3333333335, ans=0.1 2023-11-27 18:02:47,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3173253.3333333335, ans=0.125 2023-11-27 18:02:54,199 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476000 2023-11-27 18:03:01,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3173320.0, ans=0.07 2023-11-27 18:03:21,881 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.448e+01 8.705e+01 9.232e+01 9.917e+01 1.412e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 18:03:27,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3173453.3333333335, ans=0.125 2023-11-27 18:03:27,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3173453.3333333335, ans=0.125 2023-11-27 18:03:31,277 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7100, loss[loss=0.06129, simple_loss=0.08942, pruned_loss=0.009694, audio_tagging_loss=0.006892, over 14411.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09033, pruned_loss=0.01267, audio_tagging_loss=0.00881, over 3040734.19 frames. ], batch size: 54, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:03:54,139 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476050 2023-11-27 18:04:09,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3173720.0, ans=0.5 2023-11-27 18:04:28,664 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7150, loss[loss=0.06243, simple_loss=0.07663, pruned_loss=0.009628, audio_tagging_loss=0.01449, over 15769.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09007, pruned_loss=0.0126, audio_tagging_loss=0.008904, over 3049017.04 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 16.0 2023-11-27 18:04:33,677 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:04:34,069 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.43 vs. 
2023-11-27 18:04:47,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3173920.0, ans=0.125
2023-11-27 18:04:51,729 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476100
2023-11-27 18:04:55,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3173986.6666666665, ans=0.125
2023-11-27 18:05:03,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3174053.3333333335, ans=0.0
2023-11-27 18:05:17,250 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.847e+01 9.283e+01 1.002e+02 1.688e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-27 18:05:17,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3174120.0, ans=0.1
2023-11-27 18:05:21,343 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.21 vs. limit=15.0
2023-11-27 18:05:25,998 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7200, loss[loss=0.07953, simple_loss=0.1112, pruned_loss=0.01659, audio_tagging_loss=0.007313, over 15780.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.08983, pruned_loss=0.01264, audio_tagging_loss=0.008973, over 3043668.42 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0
2023-11-27 18:05:31,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3174186.6666666665, ans=0.125
2023-11-27 18:05:35,850 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0
2023-11-27 18:05:37,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3174253.3333333335, ans=0.125
2023-11-27 18:05:49,100 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476150
2023-11-27 18:06:17,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3174453.3333333335, ans=0.0
2023-11-27 18:06:17,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3174453.3333333335, ans=0.125
2023-11-27 18:06:22,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.05 vs. limit=10.0
2023-11-27 18:06:23,148 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7250, loss[loss=0.05206, simple_loss=0.06649, pruned_loss=0.007756, audio_tagging_loss=0.01106, over 14062.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.08997, pruned_loss=0.01266, audio_tagging_loss=0.009041, over 3048781.00 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 32.0
2023-11-27 18:06:28,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3174520.0, ans=0.07
2023-11-27 18:06:42,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3174586.6666666665, ans=0.0
2023-11-27 18:06:45,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3174653.3333333335, ans=0.0
2023-11-27 18:06:46,771 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476200
2023-11-27 18:06:53,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3174653.3333333335, ans=0.09899494936611666
2023-11-27 18:07:06,902 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0
2023-11-27 18:07:11,710 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.649e+01 9.273e+01 9.853e+01 1.291e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-27 18:07:11,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3174786.6666666665, ans=0.125
2023-11-27 18:07:21,664 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7300, loss[loss=0.05495, simple_loss=0.07808, pruned_loss=0.008944, audio_tagging_loss=0.006964, over 16104.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09033, pruned_loss=0.01253, audio_tagging_loss=0.008925, over 3042349.96 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 32.0
2023-11-27 18:07:21,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3174853.3333333335, ans=0.0
2023-11-27 18:07:25,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3174853.3333333335, ans=0.125
2023-11-27 18:07:44,987 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476250
2023-11-27 18:07:45,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3174986.6666666665, ans=0.0
2023-11-27 18:07:53,133 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0
2023-11-27 18:07:58,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3175053.3333333335, ans=0.125
2023-11-27 18:08:08,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3175120.0, ans=0.125
2023-11-27 18:08:17,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3175120.0, ans=0.0
2023-11-27 18:08:19,042 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7350, loss[loss=0.07419, simple_loss=0.1048, pruned_loss=0.01522, audio_tagging_loss=0.006557, over 14779.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09024, pruned_loss=0.01266, audio_tagging_loss=0.008843, over 3042873.47 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
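The scaling.py:213 lines each record a ScheduledFloat: a module hyperparameter (skip rates, balancer probabilities, dropout) whose current value "ans" is a function of batch_count rather than a constant. A minimal sketch of such a schedule, with made-up breakpoints; the real schedules in scaling.py differ per parameter:

    def scheduled_float(batch_count, points):
        """Piecewise-linear schedule over batch_count; points = [(count, value), ...]."""
        (x0, y0) = points[0]
        if batch_count <= x0:
            return y0
        for (x1, y1) in points[1:]:
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            x0, y0 = x1, y1
        return y0  # flat after the last breakpoint

    # e.g. a skip rate that decays from 0.2 to 0.0 over the first 20k batches
    # (illustrative breakpoints only), queried at the batch_count logged above:
    print(scheduled_float(3174653.0, [(0.0, 0.2), (20000.0, 0.0)]))  # -> 0.0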
2023-11-27 18:08:19,442 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.26 vs. limit=10.0
2023-11-27 18:08:26,801 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.02 vs. limit=22.5
2023-11-27 18:08:30,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3175253.3333333335, ans=0.125
2023-11-27 18:08:41,497 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476300
2023-11-27 18:09:08,063 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.708e+01 9.249e+01 1.003e+02 1.493e+02, threshold=1.850e+02, percent-clipped=0.0
2023-11-27 18:09:15,709 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7400, loss[loss=0.05641, simple_loss=0.0853, pruned_loss=0.006948, audio_tagging_loss=0.006818, over 13957.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.0905, pruned_loss=0.01264, audio_tagging_loss=0.008816, over 3046683.95 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:09:30,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3175586.6666666665, ans=0.05
2023-11-27 18:09:33,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3175586.6666666665, ans=0.0
2023-11-27 18:09:39,320 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476350
2023-11-27 18:09:43,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3175653.3333333335, ans=0.125
2023-11-27 18:09:48,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3175653.3333333335, ans=0.0
2023-11-27 18:10:06,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3175786.6666666665, ans=0.0
2023-11-27 18:10:12,903 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7450, loss[loss=0.07861, simple_loss=0.1211, pruned_loss=0.01118, audio_tagging_loss=0.006862, over 16123.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09052, pruned_loss=0.01259, audio_tagging_loss=0.008737, over 3048300.66 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:10:14,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3175853.3333333335, ans=0.0
2023-11-27 18:10:36,490 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476400
2023-11-27 18:10:59,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3176120.0, ans=0.125
2023-11-27 18:11:00,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3176120.0, ans=0.1
2023-11-27 18:11:02,891 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.662e+01 9.277e+01 9.892e+01 1.175e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-27 18:11:04,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3176120.0, ans=0.2
2023-11-27 18:11:11,068 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7500, loss[loss=0.0724, simple_loss=0.08504, pruned_loss=0.01849, audio_tagging_loss=0.01139, over 14504.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09037, pruned_loss=0.01259, audio_tagging_loss=0.00872, over 3041384.41 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:11:22,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3176253.3333333335, ans=0.1
2023-11-27 18:11:33,533 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476450
2023-11-27 18:11:36,447 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0
2023-11-27 18:11:38,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.98 vs. limit=6.0
2023-11-27 18:11:47,150 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0
2023-11-27 18:11:54,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3176386.6666666665, ans=0.0
2023-11-27 18:11:58,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3176453.3333333335, ans=0.2
2023-11-27 18:12:08,306 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7550, loss[loss=0.0669, simple_loss=0.09541, pruned_loss=0.01148, audio_tagging_loss=0.007718, over 16996.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09034, pruned_loss=0.01249, audio_tagging_loss=0.008664, over 3049927.07 frames. ], batch size: 62, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:12:13,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3176520.0, ans=0.05
2023-11-27 18:12:18,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3176586.6666666665, ans=0.125
2023-11-27 18:12:26,911 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=22.5
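The scaling.py:1022 lines compare a whitening metric against a limit for various activations (attention keys, feed-forward outputs, conv modules). The metric measures how far the activation covariance is from being proportional to the identity: roughly 1.0 for perfectly "white" features, larger as the covariance concentrates, with a penalty applied only when it exceeds the logged limit. A sketch of one such metric; the exact normalization and grouping in scaling.py may differ:

    import torch

    def whitening_metric(x):  # x: (num_frames, num_channels), a single group
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        d = cov.shape[0]
        # 1.0 when cov is proportional to the identity; grows with correlation
        return (cov * cov).sum() / (d * cov.diag().mean() ** 2)

    x = torch.randn(10000, 128)                         # nearly white activations
    print(whitening_metric(x))                          # ~1.0, well under e.g. limit=6.0
    print(whitening_metric(x @ torch.randn(128, 128)))  # correlated -> larger metric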
2023-11-27 18:12:31,246 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476500
2023-11-27 18:12:51,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3176720.0, ans=0.125
2023-11-27 18:12:53,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3176786.6666666665, ans=0.125
2023-11-27 18:12:57,544 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 8.727e+01 9.587e+01 1.045e+02 1.317e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-27 18:13:05,311 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7600, loss[loss=0.05871, simple_loss=0.07321, pruned_loss=0.01077, audio_tagging_loss=0.01133, over 16016.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08966, pruned_loss=0.01247, audio_tagging_loss=0.008658, over 3058383.40 frames. ], batch size: 60, lr: 1.69e-03, grad_scale: 32.0
2023-11-27 18:13:07,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3176853.3333333335, ans=0.0
2023-11-27 18:13:09,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3176853.3333333335, ans=0.125
2023-11-27 18:13:23,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3176920.0, ans=0.125
2023-11-27 18:13:23,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3176920.0, ans=0.1
2023-11-27 18:13:25,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3176920.0, ans=0.2
2023-11-27 18:13:28,807 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476550
2023-11-27 18:13:34,167 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=22.5
2023-11-27 18:13:38,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3176986.6666666665, ans=0.1
2023-11-27 18:13:58,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3177120.0, ans=0.125
2023-11-27 18:14:03,563 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7650, loss[loss=0.05795, simple_loss=0.0759, pruned_loss=0.01254, audio_tagging_loss=0.00746, over 14335.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09057, pruned_loss=0.01264, audio_tagging_loss=0.00859, over 3051452.08 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 32.0
2023-11-27 18:14:04,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3177186.6666666665, ans=0.125
2023-11-27 18:14:26,006 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476600
2023-11-27 18:14:52,991 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.590e+01 9.167e+01 9.909e+01 1.245e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-27 18:15:01,113 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7700, loss[loss=0.06557, simple_loss=0.07709, pruned_loss=0.01477, audio_tagging_loss=0.01225, over 14920.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09027, pruned_loss=0.01256, audio_tagging_loss=0.008628, over 3052602.18 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 32.0
2023-11-27 18:15:09,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=15.0
2023-11-27 18:15:23,664 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476650
2023-11-27 18:15:26,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3177653.3333333335, ans=0.125
2023-11-27 18:15:31,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3177653.3333333335, ans=0.2
2023-11-27 18:15:33,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3177653.3333333335, ans=0.1
2023-11-27 18:15:35,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3177720.0, ans=0.0
2023-11-27 18:15:57,814 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7750, loss[loss=0.0633, simple_loss=0.08751, pruned_loss=0.01079, audio_tagging_loss=0.00875, over 15474.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09101, pruned_loss=0.01265, audio_tagging_loss=0.008729, over 3051664.19 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 32.0
2023-11-27 18:16:21,019 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476700
2023-11-27 18:16:29,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3177986.6666666665, ans=0.0
2023-11-27 18:16:45,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3178120.0, ans=0.125
2023-11-27 18:16:48,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.657e+01 9.352e+01 9.918e+01 1.309e+02, threshold=1.870e+02, percent-clipped=0.0
2023-11-27 18:16:54,647 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7800, loss[loss=0.06861, simple_loss=0.09306, pruned_loss=0.0112, audio_tagging_loss=0.01088, over 16389.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09114, pruned_loss=0.01258, audio_tagging_loss=0.008714, over 3053312.81 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:17:07,037 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=12.0
2023-11-27 18:17:12,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3178253.3333333335, ans=0.125
2023-11-27 18:17:18,155 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476750
2023-11-27 18:17:38,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3178386.6666666665, ans=0.0
2023-11-27 18:17:53,007 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7850, loss[loss=0.05608, simple_loss=0.0774, pruned_loss=0.007135, audio_tagging_loss=0.01024, over 14592.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09006, pruned_loss=0.01235, audio_tagging_loss=0.008832, over 3053044.79 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:17:53,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3178520.0, ans=0.025
2023-11-27 18:18:15,343 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476800
2023-11-27 18:18:18,123 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 18:18:44,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.789e+01 9.389e+01 9.930e+01 1.229e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-27 18:18:50,074 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7900, loss[loss=0.06813, simple_loss=0.1012, pruned_loss=0.01004, audio_tagging_loss=0.007502, over 15450.00 frames. ], tot_loss[loss=0.06739, simple_loss=0.0916, pruned_loss=0.01273, audio_tagging_loss=0.008864, over 3056910.75 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 8.0
2023-11-27 18:18:52,828 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.65 vs. limit=15.0
2023-11-27 18:19:05,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3178920.0, ans=0.0
2023-11-27 18:19:13,021 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476850
2023-11-27 18:19:16,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3178986.6666666665, ans=0.125
2023-11-27 18:19:30,061 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=22.5
2023-11-27 18:19:35,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3179120.0, ans=0.1
2023-11-27 18:19:47,735 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 7950, loss[loss=0.06818, simple_loss=0.1013, pruned_loss=0.01095, audio_tagging_loss=0.006586, over 14452.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09157, pruned_loss=0.0127, audio_tagging_loss=0.008849, over 3048617.72 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0
2023-11-27 18:20:05,966 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 18:20:11,371 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476900
2023-11-27 18:20:22,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3179386.6666666665, ans=0.0
2023-11-27 18:20:23,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3179386.6666666665, ans=0.07
2023-11-27 18:20:30,075 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0
2023-11-27 18:20:39,448 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.819e+01 9.346e+01 1.021e+02 1.484e+02, threshold=1.869e+02, percent-clipped=0.0
2023-11-27 18:20:41,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3179453.3333333335, ans=0.2
2023-11-27 18:20:43,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3179520.0, ans=0.0
2023-11-27 18:20:44,338 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.46 vs. limit=22.5
2023-11-27 18:20:44,948 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8000, loss[loss=0.06348, simple_loss=0.08782, pruned_loss=0.01009, audio_tagging_loss=0.009476, over 15841.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.08974, pruned_loss=0.01243, audio_tagging_loss=0.008996, over 3050277.35 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:20:55,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3179520.0, ans=0.125
2023-11-27 18:21:05,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3179586.6666666665, ans=0.09899494936611666
2023-11-27 18:21:08,489 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 476950
2023-11-27 18:21:42,473 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8050, loss[loss=0.07219, simple_loss=0.101, pruned_loss=0.01393, audio_tagging_loss=0.00777, over 15458.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08939, pruned_loss=0.01239, audio_tagging_loss=0.009074, over 3048842.14 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:21:46,297 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.42 vs. limit=12.0
2023-11-27 18:22:02,381 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.60 vs. limit=15.0
2023-11-27 18:22:03,623 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=12.0
2023-11-27 18:22:05,354 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477000
2023-11-27 18:22:06,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3179986.6666666665, ans=0.1
2023-11-27 18:22:06,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3179986.6666666665, ans=0.0
2023-11-27 18:22:06,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3179986.6666666665, ans=0.0
2023-11-27 18:22:17,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3180053.3333333335, ans=0.0
2023-11-27 18:22:19,483 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.43 vs. limit=15.0
2023-11-27 18:22:25,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3180053.3333333335, ans=0.125
2023-11-27 18:22:33,494 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.70 vs. limit=12.0
2023-11-27 18:22:35,015 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 8.539e+01 9.239e+01 9.821e+01 1.214e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-27 18:22:39,970 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8100, loss[loss=0.08227, simple_loss=0.1173, pruned_loss=0.01478, audio_tagging_loss=0.00885, over 15452.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09008, pruned_loss=0.0125, audio_tagging_loss=0.008933, over 3049019.41 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0
2023-11-27 18:22:55,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3180253.3333333335, ans=0.2
2023-11-27 18:22:59,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3180253.3333333335, ans=0.125
2023-11-27 18:23:03,618 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477050
2023-11-27 18:23:09,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3180320.0, ans=0.1
2023-11-27 18:23:23,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3180386.6666666665, ans=0.1
2023-11-27 18:23:26,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3180453.3333333335, ans=0.125
2023-11-27 18:23:36,913 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8150, loss[loss=0.06604, simple_loss=0.09474, pruned_loss=0.01169, audio_tagging_loss=0.006978, over 15514.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09079, pruned_loss=0.01255, audio_tagging_loss=0.008752, over 3052323.01 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0
2023-11-27 18:24:00,068 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477100
2023-11-27 18:24:06,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3180653.3333333335, ans=0.125
2023-11-27 18:24:29,739 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.482e+01 9.156e+01 9.778e+01 1.274e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-27 18:24:34,779 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8200, loss[loss=0.06647, simple_loss=0.08765, pruned_loss=0.01359, audio_tagging_loss=0.009048, over 14991.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09039, pruned_loss=0.01247, audio_tagging_loss=0.008684, over 3049811.87 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 8.0
2023-11-27 18:24:39,145 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
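The WARNING entries above exclude AudioSet placeholder cuts whose transcripts are dummy text: after the convolutional front-end the cut has fewer frames than it has tokens, so the transducer loss has no valid alignment. The logged 100 -> 23 frame count is consistent with a ((T - 7) // 2 + 1) // 2 front-end relation (assumed here from the numbers, not read from the recipe):

    num_frames = 100                        # before subsampling, from the warning
    T = ((num_frames - 7) // 2 + 1) // 2    # assumed front-end subsampling relation
    print(T)                                # 23, matching "after subsampling"
    num_tokens = 24
    print(T < num_tokens)                   # True: fewer frames than tokens -> excluded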
2023-11-27 18:24:49,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3180920.0, ans=0.125
2023-11-27 18:24:56,935 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477150
2023-11-27 18:24:59,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3180986.6666666665, ans=0.0
2023-11-27 18:25:12,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=12.0
2023-11-27 18:25:20,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3181120.0, ans=0.125
2023-11-27 18:25:30,345 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=15.0
2023-11-27 18:25:31,876 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8250, loss[loss=0.07077, simple_loss=0.09194, pruned_loss=0.01504, audio_tagging_loss=0.009766, over 15545.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09063, pruned_loss=0.01274, audio_tagging_loss=0.008743, over 3045498.85 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 8.0
2023-11-27 18:25:37,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3181186.6666666665, ans=0.125
2023-11-27 18:25:54,886 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477200
2023-11-27 18:25:59,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3181320.0, ans=0.1
2023-11-27 18:26:01,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3181320.0, ans=0.0
2023-11-27 18:26:24,475 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.573e+01 9.130e+01 1.008e+02 1.998e+02, threshold=1.826e+02, percent-clipped=1.0
2023-11-27 18:26:26,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3181453.3333333335, ans=0.125
2023-11-27 18:26:28,639 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.30 vs. limit=10.0
2023-11-27 18:26:29,287 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8300, loss[loss=0.0582, simple_loss=0.08487, pruned_loss=0.007048, audio_tagging_loss=0.008719, over 15720.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09049, pruned_loss=0.01269, audio_tagging_loss=0.008703, over 3047459.97 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 8.0
2023-11-27 18:26:30,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3181520.0, ans=0.125
2023-11-27 18:26:52,221 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477250
2023-11-27 18:26:52,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3181653.3333333335, ans=0.125
2023-11-27 18:27:16,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3181786.6666666665, ans=0.125
2023-11-27 18:27:19,888 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0
2023-11-27 18:27:26,436 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8350, loss[loss=0.05747, simple_loss=0.07814, pruned_loss=0.01058, audio_tagging_loss=0.007824, over 15374.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09035, pruned_loss=0.01265, audio_tagging_loss=0.008687, over 3048998.24 frames. ], batch size: 58, lr: 1.69e-03, grad_scale: 8.0
2023-11-27 18:27:31,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0
2023-11-27 18:27:35,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3181853.3333333335, ans=0.2
2023-11-27 18:27:41,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3181920.0, ans=0.2
2023-11-27 18:27:47,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3181920.0, ans=0.04949747468305833
2023-11-27 18:27:49,244 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477300
2023-11-27 18:27:55,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3181986.6666666665, ans=0.125
2023-11-27 18:28:05,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3182053.3333333335, ans=0.125
2023-11-27 18:28:08,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3182053.3333333335, ans=0.125
2023-11-27 18:28:19,116 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.748e+01 9.541e+01 1.013e+02 1.320e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-27 18:28:23,397 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8400, loss[loss=0.07279, simple_loss=0.1025, pruned_loss=0.01339, audio_tagging_loss=0.00813, over 15133.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08994, pruned_loss=0.01258, audio_tagging_loss=0.008763, over 3052004.43 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:28:38,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3182253.3333333335, ans=0.125
2023-11-27 18:28:39,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3182253.3333333335, ans=0.2
2023-11-27 18:28:43,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3182253.3333333335, ans=0.07
2023-11-27 18:28:44,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3182253.3333333335, ans=0.125
2023-11-27 18:28:46,221 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477350
2023-11-27 18:28:50,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3182320.0, ans=0.1
2023-11-27 18:29:20,938 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8450, loss[loss=0.07946, simple_loss=0.1119, pruned_loss=0.01267, audio_tagging_loss=0.01083, over 14697.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08986, pruned_loss=0.01253, audio_tagging_loss=0.008768, over 3054675.96 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:29:22,556 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.30 vs. limit=15.0
2023-11-27 18:29:24,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=3182520.0, ans=0.02
2023-11-27 18:29:26,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3182520.0, ans=0.2
2023-11-27 18:29:34,101 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=22.5
2023-11-27 18:29:43,513 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477400
2023-11-27 18:29:46,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3182653.3333333335, ans=0.2
2023-11-27 18:30:01,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.45 vs. limit=22.5
2023-11-27 18:30:13,791 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 8.652e+01 9.208e+01 1.012e+02 1.151e+02, threshold=1.842e+02, percent-clipped=0.0
2023-11-27 18:30:14,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3182786.6666666665, ans=0.125
2023-11-27 18:30:18,878 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8500, loss[loss=0.06107, simple_loss=0.08089, pruned_loss=0.01006, audio_tagging_loss=0.01056, over 14782.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.08975, pruned_loss=0.01245, audio_tagging_loss=0.008796, over 3056594.26 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:30:38,126 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.01 vs. limit=22.5
2023-11-27 18:30:42,030 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477450
2023-11-27 18:30:54,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3183053.3333333335, ans=0.035
2023-11-27 18:31:00,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3183053.3333333335, ans=0.125
2023-11-27 18:31:00,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3183053.3333333335, ans=0.025
2023-11-27 18:31:01,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3183053.3333333335, ans=0.1
2023-11-27 18:31:10,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3183120.0, ans=0.125
2023-11-27 18:31:12,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3183120.0, ans=0.125
2023-11-27 18:31:16,558 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8550, loss[loss=0.06089, simple_loss=0.08327, pruned_loss=0.01022, audio_tagging_loss=0.009035, over 16821.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09025, pruned_loss=0.0126, audio_tagging_loss=0.008811, over 3063237.47 frames. ], batch size: 65, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:31:19,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3183186.6666666665, ans=0.2
2023-11-27 18:31:24,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=3183186.6666666665, ans=12.0
2023-11-27 18:31:29,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3183253.3333333335, ans=0.2
2023-11-27 18:31:39,299 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477500
2023-11-27 18:31:45,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3183320.0, ans=0.125
2023-11-27 18:31:46,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3183320.0, ans=0.125
2023-11-27 18:32:09,085 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.725e+01 9.304e+01 1.021e+02 1.373e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-27 18:32:13,979 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8600, loss[loss=0.07744, simple_loss=0.09013, pruned_loss=0.01822, audio_tagging_loss=0.01415, over 14020.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09139, pruned_loss=0.01286, audio_tagging_loss=0.00879, over 3065776.28 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:32:22,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3183520.0, ans=0.0
2023-11-27 18:32:27,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3183586.6666666665, ans=0.05
2023-11-27 18:32:35,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3183653.3333333335, ans=0.2
2023-11-27 18:32:36,443 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477550
2023-11-27 18:32:47,685 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-27 18:32:50,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3183720.0, ans=0.1
2023-11-27 18:32:51,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0
2023-11-27 18:33:06,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3183786.6666666665, ans=0.125
2023-11-27 18:33:11,387 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8650, loss[loss=0.07638, simple_loss=0.103, pruned_loss=0.01813, audio_tagging_loss=0.006762, over 14485.00 frames. ], tot_loss[loss=0.06781, simple_loss=0.09197, pruned_loss=0.01296, audio_tagging_loss=0.008864, over 3063685.33 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:33:27,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3183920.0, ans=0.125
2023-11-27 18:33:34,131 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477600
2023-11-27 18:33:47,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3184053.3333333335, ans=0.2
2023-11-27 18:33:48,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3184053.3333333335, ans=0.2
2023-11-27 18:34:04,085 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.946e+01 9.500e+01 1.005e+02 1.406e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-27 18:34:08,451 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8700, loss[loss=0.06237, simple_loss=0.08793, pruned_loss=0.009393, audio_tagging_loss=0.009014, over 14399.00 frames. ], tot_loss[loss=0.06786, simple_loss=0.09204, pruned_loss=0.01299, audio_tagging_loss=0.008855, over 3064720.25 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:34:17,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3184186.6666666665, ans=0.125
2023-11-27 18:34:22,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3184253.3333333335, ans=10.0
2023-11-27 18:34:27,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3184253.3333333335, ans=0.125
2023-11-27 18:34:31,877 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477650
2023-11-27 18:34:45,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.79 vs. limit=15.0
2023-11-27 18:34:48,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3184386.6666666665, ans=0.0
2023-11-27 18:34:55,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3184453.3333333335, ans=0.125
2023-11-27 18:35:05,920 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8750, loss[loss=0.08422, simple_loss=0.1201, pruned_loss=0.01918, audio_tagging_loss=0.004996, over 15987.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.09205, pruned_loss=0.01303, audio_tagging_loss=0.008878, over 3064981.51 frames. ], batch size: 61, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:35:28,786 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477700
2023-11-27 18:35:32,398 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0
2023-11-27 18:35:54,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3184786.6666666665, ans=0.125
2023-11-27 18:35:58,763 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.523e+01 8.815e+01 9.414e+01 9.987e+01 1.374e+02, threshold=1.883e+02, percent-clipped=0.0
2023-11-27 18:36:03,873 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8800, loss[loss=0.07114, simple_loss=0.09092, pruned_loss=0.01385, audio_tagging_loss=0.01183, over 14295.00 frames. ], tot_loss[loss=0.06794, simple_loss=0.09204, pruned_loss=0.01295, audio_tagging_loss=0.008976, over 3066757.92 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 32.0
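The lr column decays very slowly at this point in training (1.69e-03 here, dipping to 1.68e-03 a few hundred batches later). This is the shape of icefall's Eden schedule, which damps the base LR by both batch and epoch counts; a sketch with illustrative settings (the base_lr, lr_batches, and lr_epochs values below are assumptions that happen to land near the logged LR, not necessarily this run's exact configuration):

    def eden_lr(base_lr, batch, epoch, lr_batches, lr_epochs):
        return (base_lr
                * ((batch / lr_batches) ** 2 + 1) ** -0.25
                * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

    # Illustrative: base_lr=0.045, lr_batches=7500, lr_epochs=3.5 at epoch 40,
    # ~476k batches gives ~1.7e-03, the same order as the logged lr: 1.69e-03.
    print(eden_lr(0.045, 476_000, 40, 7500, 3.5))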
2023-11-27 18:36:11,593 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 18:36:11,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3184853.3333333335, ans=0.2
2023-11-27 18:36:17,207 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-27 18:36:18,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3184920.0, ans=0.0
2023-11-27 18:36:26,070 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477750
2023-11-27 18:36:28,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3184986.6666666665, ans=0.125
2023-11-27 18:36:43,819 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0
2023-11-27 18:36:54,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3185120.0, ans=0.125
2023-11-27 18:37:00,076 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8850, loss[loss=0.05383, simple_loss=0.07422, pruned_loss=0.00877, audio_tagging_loss=0.007953, over 14208.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09096, pruned_loss=0.01269, audio_tagging_loss=0.009012, over 3061735.33 frames. ], batch size: 55, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:37:05,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3185186.6666666665, ans=0.125
2023-11-27 18:37:14,839 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 18:37:23,572 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477800
2023-11-27 18:37:25,154 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.46 vs. limit=22.5
2023-11-27 18:37:32,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3185320.0, ans=0.125
2023-11-27 18:37:40,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3185386.6666666665, ans=0.125
2023-11-27 18:37:41,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3185386.6666666665, ans=0.125
2023-11-27 18:37:52,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3185453.3333333335, ans=0.0
2023-11-27 18:37:54,287 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.632e+01 9.430e+01 1.040e+02 1.292e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-27 18:37:57,542 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8900, loss[loss=0.07072, simple_loss=0.0984, pruned_loss=0.01534, audio_tagging_loss=0.006189, over 14405.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09076, pruned_loss=0.01264, audio_tagging_loss=0.008911, over 3067509.27 frames. ], batch size: 53, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:38:07,224 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0
2023-11-27 18:38:20,412 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477850
2023-11-27 18:38:20,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3185653.3333333335, ans=0.1
2023-11-27 18:38:54,429 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 8950, loss[loss=0.06338, simple_loss=0.07526, pruned_loss=0.01488, audio_tagging_loss=0.01087, over 14938.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09144, pruned_loss=0.01279, audio_tagging_loss=0.00868, over 3063432.43 frames. ], batch size: 57, lr: 1.69e-03, grad_scale: 16.0
2023-11-27 18:38:56,063 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.95 vs. limit=15.0
2023-11-27 18:39:04,249 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0
2023-11-27 18:39:06,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3185920.0, ans=0.125
2023-11-27 18:39:06,609 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0
2023-11-27 18:39:16,877 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477900
2023-11-27 18:39:19,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3185986.6666666665, ans=0.125
2023-11-27 18:39:36,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3186053.3333333335, ans=0.04949747468305833
2023-11-27 18:39:42,333 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0
2023-11-27 18:39:49,550 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.925e+01 9.376e+01 9.837e+01 1.193e+02, threshold=1.875e+02, percent-clipped=0.0
2023-11-27 18:39:51,774 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9000, loss[loss=0.06544, simple_loss=0.09021, pruned_loss=0.01511, audio_tagging_loss=0.005229, over 14696.00 frames. ], tot_loss[loss=0.06746, simple_loss=0.09194, pruned_loss=0.01289, audio_tagging_loss=0.008593, over 3064430.41 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 8.0
2023-11-27 18:39:51,775 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-27 18:40:13,342 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.9213, 2.9512, 2.7348, 2.8387, 3.3519, 3.3058, 3.0576, 3.6001], device='cuda:2')
2023-11-27 18:40:27,266 INFO [train_asr.py:1267] (2/4) Epoch 40, validation: loss=0.05837, simple_loss=0.05058, pruned_loss=0.005173, audio_tagging_loss=0.02791, over 4681554.00 frames.
2023-11-27 18:40:27,267 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB
2023-11-27 18:40:28,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3186186.6666666665, ans=0.125
2023-11-27 18:40:50,072 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 477950
2023-11-27 18:41:07,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3186386.6666666665, ans=0.0
2023-11-27 18:41:25,253 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9050, loss[loss=0.0612, simple_loss=0.08433, pruned_loss=0.01036, audio_tagging_loss=0.008678, over 15303.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.0914, pruned_loss=0.01271, audio_tagging_loss=0.008532, over 3063933.90 frames. ], batch size: 59, lr: 1.69e-03, grad_scale: 4.0
2023-11-27 18:41:27,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3186520.0, ans=0.0
2023-11-27 18:41:38,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3186586.6666666665, ans=0.125
2023-11-27 18:41:43,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3186586.6666666665, ans=0.125
2023-11-27 18:41:46,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3186653.3333333335, ans=0.125
2023-11-27 18:41:47,790 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478000
2023-11-27 18:42:21,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 8.889e+01 9.370e+01 1.013e+02 1.191e+02, threshold=1.874e+02, percent-clipped=0.0
2023-11-27 18:42:22,733 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9100, loss[loss=0.0435, simple_loss=0.05305, pruned_loss=0.007471, audio_tagging_loss=0.009505, over 14635.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09109, pruned_loss=0.01248, audio_tagging_loss=0.008568, over 3054759.68 frames. ], batch size: 56, lr: 1.69e-03, grad_scale: 8.0
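During the validation pass, zipformer.py dumps diagnostics such as the attn_weights_entropy tensor above: one entropy value per attention head, indicating how concentrated each head's attention distribution is (higher = more diffuse). A sketch of what such a per-head entropy computation looks like; the shapes below are illustrative, not the module's actual internals:

    import torch

    # attn: (num_heads, num_queries, num_keys); each row is a softmax distribution
    attn = torch.softmax(torch.randn(8, 50, 50), dim=-1)
    entropy = -(attn * attn.clamp_min(1e-20).log()).sum(dim=-1).mean(dim=-1)
    print(entropy)  # one value per head, analogous to the logged tensor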
2023-11-27 18:42:29,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3186853.3333333335, ans=0.2
2023-11-27 18:42:45,842 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478050
2023-11-27 18:42:45,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3186986.6666666665, ans=0.125
2023-11-27 18:42:51,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3186986.6666666665, ans=0.1
2023-11-27 18:43:20,520 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9150, loss[loss=0.05319, simple_loss=0.07005, pruned_loss=0.007493, audio_tagging_loss=0.01067, over 15168.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.0902, pruned_loss=0.01242, audio_tagging_loss=0.008622, over 3047773.01 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 18:43:20,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3187186.6666666665, ans=0.125
2023-11-27 18:43:23,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3187186.6666666665, ans=0.07
2023-11-27 18:43:24,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3187186.6666666665, ans=0.125
2023-11-27 18:43:28,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3187186.6666666665, ans=0.0
2023-11-27 18:43:31,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3187253.3333333335, ans=0.125
2023-11-27 18:43:43,408 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.88 vs. limit=15.0
2023-11-27 18:43:44,068 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478100
2023-11-27 18:44:17,234 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.460e+01 8.554e+01 9.287e+01 9.975e+01 1.548e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-27 18:44:18,384 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9200, loss[loss=0.06303, simple_loss=0.08847, pruned_loss=0.01111, audio_tagging_loss=0.007688, over 15085.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09014, pruned_loss=0.01238, audio_tagging_loss=0.008592, over 3051328.01 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0
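The grad_scale column is the AMP loss-scale, and it moves the way dynamic loss scaling normally does: halved when a step overflows in fp16 and doubled back after a run of clean steps (visible above as 8.0 -> 4.0 around batch 9050, then recovery to 8.0 and 16.0 by batch 9200). A sketch of that update rule, using PyTorch GradScaler-style constants; whether this recipe changes growth_interval is not shown in the log:

    class LossScale:
        def __init__(self, scale=16.0, growth_interval=2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self.good_steps = 0

        def update(self, found_inf: bool):
            if found_inf:                 # overflow: back off, e.g. 8.0 -> 4.0
                self.scale *= 0.5
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps >= self.growth_interval:
                    self.scale *= 2.0     # recover, e.g. 4.0 -> 8.0 -> 16.0
                    self.good_steps = 0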
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:44:28,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=3187520.0, ans=22.5 2023-11-27 18:44:33,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3187586.6666666665, ans=0.125 2023-11-27 18:44:40,995 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478150 2023-11-27 18:44:41,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3187653.3333333335, ans=0.0 2023-11-27 18:45:10,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3187786.6666666665, ans=0.0 2023-11-27 18:45:15,782 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9250, loss[loss=0.09782, simple_loss=0.1254, pruned_loss=0.02707, audio_tagging_loss=0.008026, over 15192.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09019, pruned_loss=0.01253, audio_tagging_loss=0.008637, over 3050501.30 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:45:16,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3187853.3333333335, ans=0.125 2023-11-27 18:45:29,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3187920.0, ans=0.0 2023-11-27 18:45:38,984 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478200 2023-11-27 18:45:59,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3188053.3333333335, ans=0.125 2023-11-27 18:46:11,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3188120.0, ans=0.2 2023-11-27 18:46:12,566 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.841e+01 9.296e+01 9.979e+01 1.330e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-27 18:46:13,715 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9300, loss[loss=0.09493, simple_loss=0.1392, pruned_loss=0.01995, audio_tagging_loss=0.005392, over 16135.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09009, pruned_loss=0.01249, audio_tagging_loss=0.008616, over 3054386.79 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:46:24,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3188253.3333333335, ans=0.125 2023-11-27 18:46:31,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3188253.3333333335, ans=10.0 2023-11-27 18:46:34,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3188253.3333333335, ans=0.125 2023-11-27 18:46:36,962 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.66 vs. 
limit=15.0 2023-11-27 18:46:37,499 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478250 2023-11-27 18:46:40,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3188320.0, ans=0.0 2023-11-27 18:46:48,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3188386.6666666665, ans=0.0 2023-11-27 18:47:03,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3188453.3333333335, ans=0.125 2023-11-27 18:47:11,349 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9350, loss[loss=0.0525, simple_loss=0.0694, pruned_loss=0.007191, audio_tagging_loss=0.01061, over 14589.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08963, pruned_loss=0.01246, audio_tagging_loss=0.008707, over 3046984.09 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:47:14,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3188520.0, ans=0.125 2023-11-27 18:47:16,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2023-11-27 18:47:28,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3188586.6666666665, ans=0.0 2023-11-27 18:47:34,504 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478300 2023-11-27 18:47:34,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3188653.3333333335, ans=0.0 2023-11-27 18:47:54,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3188720.0, ans=0.125 2023-11-27 18:47:59,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.28 vs. limit=10.0 2023-11-27 18:48:05,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3188786.6666666665, ans=0.125 2023-11-27 18:48:08,494 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.625e+01 9.314e+01 1.018e+02 1.859e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-27 18:48:09,688 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9400, loss[loss=0.0752, simple_loss=0.1057, pruned_loss=0.01448, audio_tagging_loss=0.007887, over 15732.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09015, pruned_loss=0.01279, audio_tagging_loss=0.008768, over 3044724.44 frames. 
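In every Clipping_scale line of this stretch the reported threshold is twice the middle quartile of the grad-norm distribution (here 2.0 * 9.314e+01 = 1.863e+02), so the clipping threshold appears to be Clipping_scale times a running median of recent gradient norms, and percent-clipped=0.0 says no recent batch exceeded it. A minimal sketch of that rule, assuming a plain history buffer (the real optim.py bookkeeping is an assumption here, not quoted):

import statistics

# Hypothetical sketch: clip at clipping_scale * median of recent grad norms.
class MedianClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.clipping_scale = clipping_scale
        self.history = history
        self.norms = []

    def clip(self, grad_norm: float) -> float:
        self.norms.append(grad_norm)
        del self.norms[:-self.history]             # keep a bounded window
        threshold = self.clipping_scale * statistics.median(self.norms)
        return min(grad_norm, threshold)           # norm actually applied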
], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:48:32,738 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478350 2023-11-27 18:48:32,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3188986.6666666665, ans=0.2 2023-11-27 18:48:39,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3188986.6666666665, ans=0.125 2023-11-27 18:48:42,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3188986.6666666665, ans=0.125 2023-11-27 18:49:07,274 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9450, loss[loss=0.07426, simple_loss=0.1066, pruned_loss=0.01397, audio_tagging_loss=0.006977, over 15301.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08952, pruned_loss=0.01256, audio_tagging_loss=0.008786, over 3050346.09 frames. ], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:49:08,436 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:49:26,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.40 vs. limit=15.0 2023-11-27 18:49:30,409 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478400 2023-11-27 18:49:37,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3189320.0, ans=0.125 2023-11-27 18:49:44,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3189386.6666666665, ans=0.125 2023-11-27 18:49:54,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3189453.3333333335, ans=0.125 2023-11-27 18:49:55,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3189453.3333333335, ans=0.125 2023-11-27 18:50:01,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3189453.3333333335, ans=0.2 2023-11-27 18:50:04,747 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.634e+01 9.375e+01 9.974e+01 1.335e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-27 18:50:04,773 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9500, loss[loss=0.07541, simple_loss=0.09709, pruned_loss=0.01928, audio_tagging_loss=0.007586, over 16209.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08971, pruned_loss=0.01249, audio_tagging_loss=0.008842, over 3047073.14 frames. 
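The ScheduledFloat lines record scalar hyperparameters (balancer probabilities, skip rates, dropout values) whose current value, printed as ans, is a function of batch_count. A piecewise-linear schedule clamped at its endpoints reproduces the typical behavior; the breakpoints below are purely illustrative, not the values of any parameter named above:

# Hypothetical sketch of a batch-count-scheduled scalar.
def scheduled_float(batch_count: float,
                    points=((0.0, 0.3), (20000.0, 0.125))) -> float:
    (x0, y0), (x1, y1) = points
    if batch_count <= x0:
        return y0
    if batch_count >= x1:
        return y1
    t = (batch_count - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)   # linear interpolation between breakpoints

At batch counts above three million, as logged here, any such schedule would long since be clamped at its final value, which is consistent with the same ans values (0.125, 0.1, 0.2, 0.0) repeating across these lines.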
], batch size: 59, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 18:50:05,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3189520.0, ans=0.125 2023-11-27 18:50:07,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3189520.0, ans=0.125 2023-11-27 18:50:18,056 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=15.0 2023-11-27 18:50:28,185 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478450 2023-11-27 18:50:28,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3189653.3333333335, ans=0.0 2023-11-27 18:50:46,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3189720.0, ans=0.125 2023-11-27 18:51:02,302 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9550, loss[loss=0.0501, simple_loss=0.06107, pruned_loss=0.008855, audio_tagging_loss=0.01071, over 14086.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08908, pruned_loss=0.01233, audio_tagging_loss=0.008885, over 3042631.02 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 8.0 2023-11-27 18:51:10,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3189853.3333333335, ans=0.125 2023-11-27 18:51:10,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3189853.3333333335, ans=0.125 2023-11-27 18:51:19,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3189920.0, ans=0.0 2023-11-27 18:51:26,103 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478500 2023-11-27 18:51:48,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=22.5 2023-11-27 18:51:59,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3190186.6666666665, ans=0.0 2023-11-27 18:51:59,905 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.341e+01 9.036e+01 9.952e+01 1.407e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-27 18:51:59,931 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9600, loss[loss=0.05952, simple_loss=0.08018, pruned_loss=0.009877, audio_tagging_loss=0.009553, over 15090.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08866, pruned_loss=0.0122, audio_tagging_loss=0.008926, over 3042546.97 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:52:03,794 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:52:23,566 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478550 2023-11-27 18:52:44,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3190386.6666666665, ans=0.0 2023-11-27 18:52:54,323 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.75 vs. 
limit=22.5 2023-11-27 18:52:58,202 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9650, loss[loss=0.07661, simple_loss=0.1094, pruned_loss=0.01314, audio_tagging_loss=0.008789, over 15256.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08917, pruned_loss=0.01237, audio_tagging_loss=0.008847, over 3038830.29 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:53:06,944 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2023-11-27 18:53:08,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3190586.6666666665, ans=0.125 2023-11-27 18:53:13,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3190586.6666666665, ans=0.1 2023-11-27 18:53:19,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3190653.3333333335, ans=0.125 2023-11-27 18:53:20,762 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478600 2023-11-27 18:53:49,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3190786.6666666665, ans=0.0 2023-11-27 18:53:55,995 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 8.654e+01 9.418e+01 1.007e+02 1.330e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-27 18:53:56,021 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9700, loss[loss=0.1014, simple_loss=0.1372, pruned_loss=0.02488, audio_tagging_loss=0.007918, over 15404.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08984, pruned_loss=0.01232, audio_tagging_loss=0.008656, over 3040829.61 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:54:05,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3190853.3333333335, ans=0.125 2023-11-27 18:54:06,671 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0 2023-11-27 18:54:11,548 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.43 vs. limit=10.0 2023-11-27 18:54:18,994 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478650 2023-11-27 18:54:25,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3190986.6666666665, ans=0.0 2023-11-27 18:54:26,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3190986.6666666665, ans=0.125 2023-11-27 18:54:52,951 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9750, loss[loss=0.07892, simple_loss=0.1116, pruned_loss=0.015, audio_tagging_loss=0.0081, over 14273.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09042, pruned_loss=0.01246, audio_tagging_loss=0.008594, over 3046033.60 frames. 
], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:54:54,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3191186.6666666665, ans=0.125 2023-11-27 18:55:01,316 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.41 vs. limit=22.5 2023-11-27 18:55:14,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3191253.3333333335, ans=0.125 2023-11-27 18:55:16,946 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478700 2023-11-27 18:55:23,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3191320.0, ans=0.125 2023-11-27 18:55:29,184 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:55:32,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3191386.6666666665, ans=0.125 2023-11-27 18:55:34,985 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2023-11-27 18:55:51,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.742e+01 9.201e+01 9.783e+01 1.182e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-27 18:55:51,128 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9800, loss[loss=0.06484, simple_loss=0.08848, pruned_loss=0.01247, audio_tagging_loss=0.008129, over 15461.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09105, pruned_loss=0.01253, audio_tagging_loss=0.008555, over 3047279.96 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:55:57,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3191520.0, ans=0.1 2023-11-27 18:56:13,877 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478750 2023-11-27 18:56:42,160 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=15.0 2023-11-27 18:56:43,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0 2023-11-27 18:56:45,436 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 18:56:48,639 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9850, loss[loss=0.07609, simple_loss=0.1162, pruned_loss=0.01228, audio_tagging_loss=0.005716, over 15693.00 frames. ], tot_loss[loss=0.06707, simple_loss=0.09185, pruned_loss=0.0127, audio_tagging_loss=0.008444, over 3046295.26 frames. 
], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:57:11,559 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478800 2023-11-27 18:57:12,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3191986.6666666665, ans=0.125 2023-11-27 18:57:21,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3191986.6666666665, ans=0.125 2023-11-27 18:57:44,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3192186.6666666665, ans=0.0 2023-11-27 18:57:45,659 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.871e+01 9.511e+01 1.009e+02 1.336e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 18:57:45,684 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9900, loss[loss=0.05044, simple_loss=0.06799, pruned_loss=0.007723, audio_tagging_loss=0.00872, over 15047.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.0916, pruned_loss=0.01266, audio_tagging_loss=0.008485, over 3047786.84 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:57:52,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2023-11-27 18:57:58,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3192253.3333333335, ans=0.1 2023-11-27 18:58:09,251 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478850 2023-11-27 18:58:14,671 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.98 vs. limit=15.0 2023-11-27 18:58:17,869 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=12.0 2023-11-27 18:58:43,970 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 9950, loss[loss=0.0556, simple_loss=0.07977, pruned_loss=0.008903, audio_tagging_loss=0.006807, over 16269.00 frames. ], tot_loss[loss=0.06711, simple_loss=0.09179, pruned_loss=0.01273, audio_tagging_loss=0.008486, over 3054438.34 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 18:58:55,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3192586.6666666665, ans=0.125 2023-11-27 18:58:59,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3192586.6666666665, ans=0.125 2023-11-27 18:59:06,622 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478900 2023-11-27 18:59:35,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3192786.6666666665, ans=0.125 2023-11-27 18:59:41,469 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.506e+01 9.259e+01 9.823e+01 1.115e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-27 18:59:41,496 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10000, loss[loss=0.06779, simple_loss=0.1031, pruned_loss=0.00836, audio_tagging_loss=0.007904, over 15433.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09096, pruned_loss=0.01249, audio_tagging_loss=0.008493, over 3051399.75 frames. 
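The grad_scale field in the progress lines moves among powers of two (4.0, 8.0, 16.0, 32.0 over this stretch), the signature of dynamic loss scaling in mixed-precision training. The generic torch.cuda.amp pattern below produces exactly this behavior; it is the standard recipe, not a quote of the actual training step:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def fp16_step(model, criterion, optimizer, features, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(features), targets)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # skipped internally if grads overflowed
    scaler.update()                 # halves the scale on overflow, grows it
                                    # again after a run of clean steps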
], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 18:59:46,152 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 18:59:47,150 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:00:04,085 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 478950 2023-11-27 19:00:17,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3193053.3333333335, ans=0.125 2023-11-27 19:00:26,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3193120.0, ans=0.125 2023-11-27 19:00:33,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3193120.0, ans=0.1 2023-11-27 19:00:33,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3193120.0, ans=0.125 2023-11-27 19:00:35,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=15.0 2023-11-27 19:00:38,024 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10050, loss[loss=0.06215, simple_loss=0.08158, pruned_loss=0.01138, audio_tagging_loss=0.009975, over 15334.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09081, pruned_loss=0.01255, audio_tagging_loss=0.008552, over 3050497.91 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:00:47,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3193186.6666666665, ans=0.0 2023-11-27 19:01:01,568 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479000 2023-11-27 19:01:11,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3193320.0, ans=0.0 2023-11-27 19:01:35,903 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10100, loss[loss=0.0639, simple_loss=0.07265, pruned_loss=0.01439, audio_tagging_loss=0.01318, over 14856.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09066, pruned_loss=0.01262, audio_tagging_loss=0.008659, over 3046830.09 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:01:36,987 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 8.749e+01 9.301e+01 1.017e+02 1.197e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 19:01:55,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3193586.6666666665, ans=0.125 2023-11-27 19:01:59,524 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479050 2023-11-27 19:02:14,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.12 vs. limit=15.0 2023-11-27 19:02:22,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3193786.6666666665, ans=0.125 2023-11-27 19:02:25,394 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:02:27,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=22.5 2023-11-27 19:02:28,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3193786.6666666665, ans=0.05 2023-11-27 19:02:30,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3193786.6666666665, ans=0.125 2023-11-27 19:02:33,754 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10150, loss[loss=0.05433, simple_loss=0.07708, pruned_loss=0.008809, audio_tagging_loss=0.006983, over 15812.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09054, pruned_loss=0.01266, audio_tagging_loss=0.008745, over 3046106.39 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:02:41,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.02 vs. limit=15.0 2023-11-27 19:02:42,742 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2023-11-27 19:02:47,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3193920.0, ans=0.0 2023-11-27 19:02:55,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=15.0 2023-11-27 19:02:56,489 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479100 2023-11-27 19:03:02,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3193986.6666666665, ans=0.125 2023-11-27 19:03:03,367 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:03:16,060 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=15.0 2023-11-27 19:03:24,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3194120.0, ans=0.125 2023-11-27 19:03:31,551 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10200, loss[loss=0.07964, simple_loss=0.1043, pruned_loss=0.01647, audio_tagging_loss=0.01103, over 14790.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09067, pruned_loss=0.01269, audio_tagging_loss=0.008833, over 3043538.50 frames. 
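Each Exclude cut warning follows the same pattern: an AudioSet clip (IDs under unbalanced/) carrying the placeholder transcript is reduced from 100 input frames to 23 by the convolutional subsampling, which is fewer than its 24 BPE tokens, and a transducer cannot emit more tokens than it has frames, so the cut is dropped. A sketch of the implied filter; the length formula reproduces the logged 100 -> 23, while the function itself is an assumption rather than a quote from train_asr.py:

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Conv2dSubsampling-style output length: ((T - 7) // 2 + 1) // 2,
    # which maps 100 input frames to 23.
    frames_after = ((num_frames - 7) // 2 + 1) // 2
    # A transducer alignment needs at least one frame per emitted token.
    return frames_after >= num_tokens

assert not keep_cut(100, 24)   # the cuts excluded above: 23 < 24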
], batch size: 55, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:03:32,626 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.681e+01 9.288e+01 9.961e+01 1.325e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-27 19:03:34,406 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=15.0 2023-11-27 19:03:36,567 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2023-11-27 19:03:54,396 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479150 2023-11-27 19:03:57,176 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:04:14,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3194386.6666666665, ans=0.125 2023-11-27 19:04:28,889 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10250, loss[loss=0.08569, simple_loss=0.1192, pruned_loss=0.01967, audio_tagging_loss=0.006399, over 15525.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.09157, pruned_loss=0.0128, audio_tagging_loss=0.008767, over 3047174.46 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:04:52,737 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479200 2023-11-27 19:04:53,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3194653.3333333335, ans=0.0 2023-11-27 19:05:12,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3194720.0, ans=0.0 2023-11-27 19:05:13,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3194720.0, ans=0.125 2023-11-27 19:05:27,478 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10300, loss[loss=0.05716, simple_loss=0.07957, pruned_loss=0.008312, audio_tagging_loss=0.009059, over 15240.00 frames. ], tot_loss[loss=0.06781, simple_loss=0.09221, pruned_loss=0.01296, audio_tagging_loss=0.008746, over 3048437.71 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:05:28,537 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.569e+01 8.814e+01 9.491e+01 9.959e+01 1.329e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 19:05:28,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3194853.3333333335, ans=0.025 2023-11-27 19:05:28,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3194853.3333333335, ans=0.125 2023-11-27 19:05:31,919 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.34 vs. 
limit=15.0 2023-11-27 19:05:46,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3194920.0, ans=0.125 2023-11-27 19:05:48,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3194920.0, ans=0.125 2023-11-27 19:05:49,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3194986.6666666665, ans=0.125 2023-11-27 19:05:50,036 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479250 2023-11-27 19:05:56,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3194986.6666666665, ans=0.5 2023-11-27 19:06:03,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3195053.3333333335, ans=0.0 2023-11-27 19:06:05,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3195053.3333333335, ans=0.125 2023-11-27 19:06:14,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3195120.0, ans=0.125 2023-11-27 19:06:24,329 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10350, loss[loss=0.07523, simple_loss=0.09975, pruned_loss=0.01468, audio_tagging_loss=0.01068, over 15151.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.0916, pruned_loss=0.01295, audio_tagging_loss=0.008934, over 3045318.62 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:06:26,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3195186.6666666665, ans=0.125 2023-11-27 19:06:28,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3195186.6666666665, ans=0.125 2023-11-27 19:06:40,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3195253.3333333335, ans=0.125 2023-11-27 19:06:47,562 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479300 2023-11-27 19:06:52,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3195320.0, ans=0.0 2023-11-27 19:07:02,407 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=15.0 2023-11-27 19:07:16,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3195453.3333333335, ans=0.1 2023-11-27 19:07:21,729 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10400, loss[loss=0.07246, simple_loss=0.1063, pruned_loss=0.01171, audio_tagging_loss=0.007614, over 14627.00 frames. ], tot_loss[loss=0.06768, simple_loss=0.09146, pruned_loss=0.01298, audio_tagging_loss=0.008976, over 3041213.62 frames. 
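The Whitening lines compare a metric against a limit (6.0, 10.0, 12.0, 15.0, 22.5 depending on the module). The metric behaves like a measure of how far the activation covariance is from a multiple of the identity: 1.0 for perfectly white features, larger as the eigenvalue spectrum spreads, with a penalty presumably applied only once the limit is exceeded. A plausible reconstruction, under the assumption that the metric is the mean squared eigenvalue over the squared mean eigenvalue (the real scaling.py computation may normalize differently):

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations for one group
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    mean_sq_eig = (cov ** 2).sum() / cov.shape[0]   # mean of eigenvalue^2
    sq_mean_eig = torch.diagonal(cov).mean() ** 2   # (mean eigenvalue)^2
    return mean_sq_eig / sq_mean_eig                # >= 1.0 by Cauchy-Schwarz

On that reading, metric=2.34 vs. limit=15.0 just above means that module's output covariance is comfortably within the tolerated anisotropy, while values such as 13.60 elsewhere in this stretch are approaching it.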
], batch size: 53, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:07:24,456 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.346e+01 8.831e+01 9.287e+01 1.004e+02 1.358e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 19:07:45,847 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479350 2023-11-27 19:08:11,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3195786.6666666665, ans=0.0 2023-11-27 19:08:19,753 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10450, loss[loss=0.06593, simple_loss=0.08796, pruned_loss=0.0104, audio_tagging_loss=0.01155, over 15260.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09036, pruned_loss=0.01275, audio_tagging_loss=0.008971, over 3035881.81 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:08:21,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3195853.3333333335, ans=0.0 2023-11-27 19:08:23,617 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0 2023-11-27 19:08:33,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3195920.0, ans=10.0 2023-11-27 19:08:43,070 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479400 2023-11-27 19:09:18,580 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10500, loss[loss=0.05294, simple_loss=0.06367, pruned_loss=0.007765, audio_tagging_loss=0.01333, over 15357.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.08975, pruned_loss=0.01265, audio_tagging_loss=0.008895, over 3040223.77 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:09:20,764 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.582e+01 9.246e+01 1.004e+02 1.274e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 19:09:32,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3196253.3333333335, ans=0.0 2023-11-27 19:09:41,849 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479450 2023-11-27 19:10:04,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3196453.3333333335, ans=0.125 2023-11-27 19:10:16,046 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10550, loss[loss=0.082, simple_loss=0.1058, pruned_loss=0.01895, audio_tagging_loss=0.01013, over 14577.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09076, pruned_loss=0.01292, audio_tagging_loss=0.008748, over 3041691.98 frames. 
], batch size: 54, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:10:17,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3196520.0, ans=0.125 2023-11-27 19:10:18,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3196520.0, ans=0.1 2023-11-27 19:10:29,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3196586.6666666665, ans=0.0 2023-11-27 19:10:32,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3196586.6666666665, ans=0.125 2023-11-27 19:10:36,237 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0 2023-11-27 19:10:39,725 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479500 2023-11-27 19:11:01,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3196786.6666666665, ans=0.125 2023-11-27 19:11:08,159 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.15 vs. limit=15.0 2023-11-27 19:11:13,701 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10600, loss[loss=0.04114, simple_loss=0.04726, pruned_loss=0.00821, audio_tagging_loss=0.009295, over 13387.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09061, pruned_loss=0.01294, audio_tagging_loss=0.00866, over 3042270.63 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:11:15,875 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.628e+01 9.441e+01 1.014e+02 1.251e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-27 19:11:18,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3196853.3333333335, ans=0.2 2023-11-27 19:11:24,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3196920.0, ans=0.125 2023-11-27 19:11:27,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3196920.0, ans=0.125 2023-11-27 19:11:32,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3196920.0, ans=0.1 2023-11-27 19:11:36,973 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479550 2023-11-27 19:11:52,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3197053.3333333335, ans=0.1 2023-11-27 19:11:54,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3197053.3333333335, ans=0.1 2023-11-27 19:12:05,557 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0 2023-11-27 19:12:11,329 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10650, loss[loss=0.06142, simple_loss=0.07844, pruned_loss=0.0136, audio_tagging_loss=0.008601, over 15358.00 frames. 
], tot_loss[loss=0.06683, simple_loss=0.09081, pruned_loss=0.01281, audio_tagging_loss=0.008614, over 3041600.53 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:12:13,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3197186.6666666665, ans=0.125 2023-11-27 19:12:26,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.39 vs. limit=22.5 2023-11-27 19:12:34,478 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479600 2023-11-27 19:12:35,955 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2023-11-27 19:12:42,028 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=12.0 2023-11-27 19:12:44,173 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5 2023-11-27 19:12:44,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3197386.6666666665, ans=0.125 2023-11-27 19:12:50,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=15.0 2023-11-27 19:12:59,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3197453.3333333335, ans=0.95 2023-11-27 19:13:09,080 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10700, loss[loss=0.06849, simple_loss=0.09144, pruned_loss=0.01235, audio_tagging_loss=0.01042, over 14466.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08937, pruned_loss=0.01247, audio_tagging_loss=0.008688, over 3039707.37 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:13:11,193 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.717e+01 9.252e+01 9.839e+01 1.176e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 19:13:25,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3197586.6666666665, ans=0.025 2023-11-27 19:13:27,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3197586.6666666665, ans=0.125 2023-11-27 19:13:32,691 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479650 2023-11-27 19:13:36,356 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.40 vs. 
limit=15.0 2023-11-27 19:13:43,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3197720.0, ans=0.1 2023-11-27 19:13:44,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3197720.0, ans=0.125 2023-11-27 19:13:56,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3197786.6666666665, ans=0.125 2023-11-27 19:14:06,955 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10750, loss[loss=0.06573, simple_loss=0.08021, pruned_loss=0.01846, audio_tagging_loss=0.007168, over 15008.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09001, pruned_loss=0.01248, audio_tagging_loss=0.008635, over 3035091.13 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:14:22,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3197920.0, ans=0.0 2023-11-27 19:14:23,153 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:14:29,484 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479700 2023-11-27 19:14:35,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3197986.6666666665, ans=0.05 2023-11-27 19:14:40,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3198053.3333333335, ans=0.2 2023-11-27 19:14:51,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=12.0 2023-11-27 19:15:04,568 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10800, loss[loss=0.05877, simple_loss=0.07004, pruned_loss=0.01378, audio_tagging_loss=0.009968, over 13139.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09041, pruned_loss=0.01257, audio_tagging_loss=0.008629, over 3042456.81 frames. ], batch size: 53, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:15:06,816 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 8.494e+01 9.274e+01 9.978e+01 1.190e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 19:15:09,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3198186.6666666665, ans=0.04949747468305833 2023-11-27 19:15:13,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3198186.6666666665, ans=0.0 2023-11-27 19:15:15,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3198253.3333333335, ans=0.125 2023-11-27 19:15:27,734 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479750 2023-11-27 19:15:54,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3198453.3333333335, ans=0.0 2023-11-27 19:15:57,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3198453.3333333335, ans=0.125 2023-11-27 19:15:59,942 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.02 vs. 
limit=15.0 2023-11-27 19:16:02,282 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10850, loss[loss=0.0571, simple_loss=0.0728, pruned_loss=0.01088, audio_tagging_loss=0.009821, over 15585.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09024, pruned_loss=0.01248, audio_tagging_loss=0.008738, over 3037267.31 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:16:03,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3198520.0, ans=0.0 2023-11-27 19:16:04,895 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-27 19:16:19,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3198586.6666666665, ans=0.125 2023-11-27 19:16:25,182 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479800 2023-11-27 19:16:35,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3198720.0, ans=0.1 2023-11-27 19:16:56,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3198786.6666666665, ans=0.125 2023-11-27 19:16:57,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3198786.6666666665, ans=0.0 2023-11-27 19:17:00,004 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10900, loss[loss=0.06017, simple_loss=0.07992, pruned_loss=0.01237, audio_tagging_loss=0.007838, over 14987.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09038, pruned_loss=0.01258, audio_tagging_loss=0.008772, over 3042617.80 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0 2023-11-27 19:17:00,034 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:17:00,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3198853.3333333335, ans=0.125 2023-11-27 19:17:02,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.922e+01 9.500e+01 1.014e+02 1.176e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-27 19:17:07,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3198853.3333333335, ans=0.125 2023-11-27 19:17:11,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3198920.0, ans=0.5 2023-11-27 19:17:22,610 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479850 2023-11-27 19:17:42,766 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.48 vs. 
limit=22.5 2023-11-27 19:17:47,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3199120.0, ans=0.0 2023-11-27 19:17:48,882 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.93 vs. limit=22.5 2023-11-27 19:17:57,551 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 10950, loss[loss=0.05624, simple_loss=0.06578, pruned_loss=0.01108, audio_tagging_loss=0.01227, over 14231.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08977, pruned_loss=0.01261, audio_tagging_loss=0.008827, over 3039489.16 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:18:03,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3199186.6666666665, ans=0.0 2023-11-27 19:18:04,930 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.12 vs. limit=22.5 2023-11-27 19:18:20,281 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479900 2023-11-27 19:18:30,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3199320.0, ans=0.125 2023-11-27 19:18:34,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3199386.6666666665, ans=0.0 2023-11-27 19:18:37,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3199386.6666666665, ans=0.125 2023-11-27 19:18:43,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3199453.3333333335, ans=0.125 2023-11-27 19:18:53,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3199520.0, ans=0.2 2023-11-27 19:18:54,493 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11000, loss[loss=0.08222, simple_loss=0.1136, pruned_loss=0.01551, audio_tagging_loss=0.009902, over 16919.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09092, pruned_loss=0.01287, audio_tagging_loss=0.008806, over 3039537.92 frames. ], batch size: 63, lr: 1.68e-03, grad_scale: 16.0 2023-11-27 19:18:57,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.669e+01 9.375e+01 1.024e+02 1.386e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-27 19:19:05,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3199586.6666666665, ans=0.2 2023-11-27 19:19:07,835 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
2023-11-27 19:19:18,250 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 479950
2023-11-27 19:19:27,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3199653.3333333335, ans=0.04949747468305833
2023-11-27 19:19:38,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3199720.0, ans=0.125
2023-11-27 19:19:43,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3199786.6666666665, ans=0.125
2023-11-27 19:19:51,837 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11050, loss[loss=0.08386, simple_loss=0.1173, pruned_loss=0.01824, audio_tagging_loss=0.00696, over 15146.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09125, pruned_loss=0.01274, audio_tagging_loss=0.008862, over 3040250.74 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 19:20:15,019 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480000
2023-11-27 19:20:24,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3199986.6666666665, ans=0.0
2023-11-27 19:20:27,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3200053.3333333335, ans=0.1
2023-11-27 19:20:51,387 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11100, loss[loss=0.05457, simple_loss=0.07255, pruned_loss=0.00807, audio_tagging_loss=0.01023, over 14535.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09151, pruned_loss=0.01283, audio_tagging_loss=0.008864, over 3043154.01 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 19:20:51,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3200186.6666666665, ans=0.125
2023-11-27 19:20:52,151 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.86 vs. limit=15.0
2023-11-27 19:20:56,276 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.813e+01 9.363e+01 1.015e+02 1.283e+02, threshold=1.873e+02, percent-clipped=0.0
2023-11-27 19:21:05,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3200253.3333333335, ans=0.125
2023-11-27 19:21:13,911 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480050
2023-11-27 19:21:34,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3200386.6666666665, ans=0.125
2023-11-27 19:21:40,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3200453.3333333335, ans=0.1
2023-11-27 19:21:47,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3200453.3333333335, ans=0.2
2023-11-27 19:21:49,151 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11150, loss[loss=0.04775, simple_loss=0.05557, pruned_loss=0.01027, audio_tagging_loss=0.009698, over 14248.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09061, pruned_loss=0.01273, audio_tagging_loss=0.009096, over 3040061.97 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 19:21:49,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3200520.0, ans=0.125
2023-11-27 19:21:54,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3200520.0, ans=0.0
2023-11-27 19:21:55,003 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0
2023-11-27 19:22:12,461 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480100
2023-11-27 19:22:12,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3200653.3333333335, ans=0.04949747468305833
2023-11-27 19:22:37,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3200786.6666666665, ans=0.0
2023-11-27 19:22:37,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3200786.6666666665, ans=0.09899494936611666
2023-11-27 19:22:46,467 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11200, loss[loss=0.0667, simple_loss=0.08574, pruned_loss=0.0132, audio_tagging_loss=0.01063, over 15893.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09039, pruned_loss=0.01248, audio_tagging_loss=0.009177, over 3045847.48 frames. ], batch size: 60, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:22:51,510 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.792e+01 8.755e+01 9.520e+01 1.002e+02 1.290e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-27 19:23:07,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3200920.0, ans=6.0
2023-11-27 19:23:07,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0
2023-11-27 19:23:10,194 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480150
2023-11-27 19:23:11,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3200986.6666666665, ans=0.125
2023-11-27 19:23:12,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3200986.6666666665, ans=0.0
2023-11-27 19:23:19,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3200986.6666666665, ans=0.0
2023-11-27 19:23:44,367 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11250, loss[loss=0.0675, simple_loss=0.09133, pruned_loss=0.01396, audio_tagging_loss=0.007874, over 15614.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09015, pruned_loss=0.01237, audio_tagging_loss=0.00909, over 3048490.99 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:23:49,028 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0
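The periodic optim.py:476 entries summarise recent gradient norms as five quantiles (min/25%/median/75%/max), and in every entry the clipping threshold is exactly Clipping_scale times the median: 2.0 x 9.520e+01 = 1.904e+02 in the entry just above. A sketch of that median-based rule (the function name and the "recent norms" bookkeeping are assumptions, not optim.py verbatim):

```python
import torch

def clipping_threshold(recent_norms: torch.Tensor,
                       clipping_scale: float = 2.0) -> float:
    """Median-based gradient-clipping threshold, as the log suggests:
    threshold = clipping_scale * median of recent per-step grad norms.
    percent-clipped=0.0 means no step in the window exceeded it.
    """
    quartiles = torch.quantile(
        recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
    )
    return clipping_scale * quartiles[2].item()
```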
2023-11-27 19:23:51,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3201186.6666666665, ans=0.0
2023-11-27 19:23:51,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3201186.6666666665, ans=0.0
2023-11-27 19:24:04,432 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0
2023-11-27 19:24:07,127 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480200
2023-11-27 19:24:28,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3201386.6666666665, ans=0.125
2023-11-27 19:24:33,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3201453.3333333335, ans=0.0
2023-11-27 19:24:42,361 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11300, loss[loss=0.06114, simple_loss=0.0922, pruned_loss=0.009594, audio_tagging_loss=0.005445, over 16323.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09078, pruned_loss=0.0126, audio_tagging_loss=0.008795, over 3045711.60 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 19:24:45,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3201520.0, ans=0.125
2023-11-27 19:24:47,317 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.07 vs. limit=10.0
2023-11-27 19:24:47,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.989e+01 8.774e+01 9.523e+01 1.010e+02 1.222e+02, threshold=1.905e+02, percent-clipped=0.0
2023-11-27 19:24:51,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3201520.0, ans=0.125
2023-11-27 19:24:56,464 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.44 vs. limit=15.0
2023-11-27 19:25:05,807 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480250
2023-11-27 19:25:07,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3201653.3333333335, ans=0.2
2023-11-27 19:25:10,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3201653.3333333335, ans=0.0
2023-11-27 19:25:11,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3201653.3333333335, ans=0.125
2023-11-27 19:25:31,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3201786.6666666665, ans=0.0
2023-11-27 19:25:38,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3201853.3333333335, ans=0.125
2023-11-27 19:25:39,715 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11350, loss[loss=0.05487, simple_loss=0.0762, pruned_loss=0.01031, audio_tagging_loss=0.006461, over 15152.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09103, pruned_loss=0.01274, audio_tagging_loss=0.008604, over 3046367.73 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 19:25:41,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3201853.3333333335, ans=10.0
2023-11-27 19:25:43,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3201853.3333333335, ans=0.1
2023-11-27 19:25:56,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3201920.0, ans=0.125
2023-11-27 19:26:01,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3201920.0, ans=0.125
2023-11-27 19:26:03,199 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480300
2023-11-27 19:26:13,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3202053.3333333335, ans=0.125
2023-11-27 19:26:16,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3202053.3333333335, ans=0.5
2023-11-27 19:26:20,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3202053.3333333335, ans=0.0
2023-11-27 19:26:37,700 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11400, loss[loss=0.05782, simple_loss=0.07731, pruned_loss=0.009829, audio_tagging_loss=0.009337, over 13393.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09136, pruned_loss=0.01276, audio_tagging_loss=0.00853, over 3041120.10 frames. ], batch size: 52, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 19:26:43,650 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.795e+01 9.431e+01 1.004e+02 1.426e+02, threshold=1.886e+02, percent-clipped=0.0
2023-11-27 19:26:47,472 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=12.0
2023-11-27 19:26:49,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3202253.3333333335, ans=0.125
2023-11-27 19:27:00,033 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480350
2023-11-27 19:27:27,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3202453.3333333335, ans=0.125
2023-11-27 19:27:31,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3202453.3333333335, ans=0.125
2023-11-27 19:27:35,244 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11450, loss[loss=0.05111, simple_loss=0.07304, pruned_loss=0.007025, audio_tagging_loss=0.00756, over 16134.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09092, pruned_loss=0.01274, audio_tagging_loss=0.008556, over 3035707.56 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 8.0
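The scaling.py:213 entries track ScheduledFloat hyper-parameters: per-module constants such as skip rates, balancer probabilities and dropout that are annealed as a piecewise-linear function of the global batch count (the ans= field is the value currently in effect). A minimal sketch of that kind of schedule, with the interface assumed from the log rather than copied from scaling.py:

```python
def scheduled_float(batch_count: float,
                    points: list[tuple[float, float]]) -> float:
    """Piecewise-linear schedule over batch_count.

    `points` is a sorted list of (batch_count, value) breakpoints;
    outside the covered range the endpoint values hold. Values such as
    conv_skip_rate decay to 0.0 early in training, which is why the
    ans=0.0 entries above are already flat at this batch count.
    """
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
    raise AssertionError("unreachable for sorted breakpoints")

# e.g. a skip rate that decays from 0.1 to 0.0 over the first 20k batches:
assert scheduled_float(3201853.0, [(0.0, 0.1), (20000.0, 0.0)]) == 0.0
```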
2023-11-27 19:27:42,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3202520.0, ans=0.0
2023-11-27 19:27:44,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3202520.0, ans=0.0
2023-11-27 19:27:48,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0
2023-11-27 19:27:52,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3202586.6666666665, ans=0.0
2023-11-27 19:27:57,810 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480400
2023-11-27 19:27:59,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3202653.3333333335, ans=0.125
2023-11-27 19:28:26,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3202786.6666666665, ans=0.125
2023-11-27 19:28:32,566 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11500, loss[loss=0.05188, simple_loss=0.07294, pruned_loss=0.006541, audio_tagging_loss=0.008868, over 14549.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08965, pruned_loss=0.01256, audio_tagging_loss=0.008682, over 3036268.51 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 19:28:38,526 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 9.056e+01 9.508e+01 1.039e+02 1.307e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-27 19:28:39,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3202853.3333333335, ans=0.125
2023-11-27 19:28:56,691 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480450
2023-11-27 19:29:16,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3203053.3333333335, ans=0.125
2023-11-27 19:29:30,420 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11550, loss[loss=0.06078, simple_loss=0.08957, pruned_loss=0.01033, audio_tagging_loss=0.005662, over 15388.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08977, pruned_loss=0.01249, audio_tagging_loss=0.008675, over 3041784.26 frames. ], batch size: 56, lr: 1.68e-03, grad_scale: 8.0
2023-11-27 19:29:40,747 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.89 vs. limit=10.0
2023-11-27 19:29:41,959 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.25 vs. limit=22.5
2023-11-27 19:29:42,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3203253.3333333335, ans=0.2
2023-11-27 19:29:49,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3203253.3333333335, ans=0.125
2023-11-27 19:29:53,562 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480500
2023-11-27 19:29:57,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3203320.0, ans=0.125
2023-11-27 19:30:06,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3203386.6666666665, ans=0.125
2023-11-27 19:30:09,459 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-27 19:30:18,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3203453.3333333335, ans=0.07
2023-11-27 19:30:28,613 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11600, loss[loss=0.08657, simple_loss=0.1181, pruned_loss=0.01946, audio_tagging_loss=0.008061, over 16012.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.0907, pruned_loss=0.01264, audio_tagging_loss=0.008662, over 3038792.36 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:30:33,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.753e+01 9.625e+01 1.023e+02 1.677e+02, threshold=1.925e+02, percent-clipped=0.0
2023-11-27 19:30:36,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3203520.0, ans=0.0
2023-11-27 19:30:42,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3203586.6666666665, ans=0.0
2023-11-27 19:30:50,948 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480550
2023-11-27 19:31:01,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3203720.0, ans=0.0
2023-11-27 19:31:06,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3203720.0, ans=0.125
2023-11-27 19:31:21,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3203786.6666666665, ans=0.125
2023-11-27 19:31:24,674 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11650, loss[loss=0.06329, simple_loss=0.0908, pruned_loss=0.01079, audio_tagging_loss=0.0071, over 15178.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09088, pruned_loss=0.01269, audio_tagging_loss=0.008738, over 3046820.39 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0
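The scaling.py:1022 Whitening entries report how far a module's output covariance is from being "white" (all eigenvalues equal); the metric is compared against a scheduled limit, and a corrective penalty only kicks in when the metric exceeds it, so an entry like "metric=17.25 vs. limit=22.5" above means no penalty was applied. A hedged sketch of one eigenvalue-dispersion measure with these properties; this is an assumed stand-in, not the verbatim Whiten module:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Dispersion of the feature covariance's eigenvalues.

    x: (num_frames, num_channels). Returns mean(lambda^2) / mean(lambda)^2,
    computed via traces so no eigendecomposition is needed. The value is
    1.0 when the covariance is already white and grows as the spectrum
    becomes lopsided; a penalty would be applied only when it exceeds
    the logged limit.
    """
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]            # (C, C) covariance estimate
    d = cov.shape[0]
    mean_sq = torch.trace(cov @ cov) / d    # mean of eigenvalues squared
    sq_mean = (torch.trace(cov) / d) ** 2   # square of mean eigenvalue
    return (mean_sq / sq_mean).item()
```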
2023-11-27 19:31:31,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3203853.3333333335, ans=0.125
2023-11-27 19:31:34,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3203853.3333333335, ans=0.2
2023-11-27 19:31:47,477 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480600
2023-11-27 19:32:17,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3204120.0, ans=0.0
2023-11-27 19:32:22,255 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11700, loss[loss=0.07549, simple_loss=0.1078, pruned_loss=0.01464, audio_tagging_loss=0.006952, over 16249.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09025, pruned_loss=0.01254, audio_tagging_loss=0.008814, over 3044722.91 frames. ], batch size: 59, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:32:28,153 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.724e+01 9.258e+01 1.003e+02 1.518e+02, threshold=1.852e+02, percent-clipped=0.0
2023-11-27 19:32:28,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3204186.6666666665, ans=0.0
2023-11-27 19:32:28,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3204186.6666666665, ans=0.125
2023-11-27 19:32:44,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3204320.0, ans=0.0
2023-11-27 19:32:45,815 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480650
2023-11-27 19:32:45,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3204320.0, ans=0.125
2023-11-27 19:32:51,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3204320.0, ans=0.0
2023-11-27 19:32:59,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3204386.6666666665, ans=0.0
2023-11-27 19:33:07,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3204453.3333333335, ans=0.07
2023-11-27 19:33:08,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3204453.3333333335, ans=0.125
2023-11-27 19:33:20,272 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11750, loss[loss=0.05605, simple_loss=0.06725, pruned_loss=0.01228, audio_tagging_loss=0.01015, over 14353.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08931, pruned_loss=0.01236, audio_tagging_loss=0.008819, over 3045378.03 frames. ], batch size: 54, lr: 1.68e-03, grad_scale: 16.0
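The grad_scale field in the batch summaries moves between 8.0, 16.0 and 32.0 because the run trains with fp16: a dynamic loss scaler halves the scale when a step produces inf/nan gradients and doubles it again after a run of clean steps. A sketch of that standard rule (the growth interval is an assumed constant; the log only shows the scale moving, not why):

```python
class DynamicLossScale:
    """Textbook AMP loss scaling: halve on overflow, double after
    `growth_interval` consecutive clean steps. Illustrative only; the
    run uses a real GradScaler whose state is logged as grad_scale.
    """

    def __init__(self, scale: float = 32.0, growth_interval: int = 2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale /= 2.0          # e.g. 32.0 -> 16.0 -> 8.0
            self._clean_steps = 0
        else:
            self._clean_steps += 1
            if self._clean_steps == self.growth_interval:
                self.scale *= 2.0      # e.g. 8.0 -> 16.0 -> 32.0
                self._clean_steps = 0
```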
2023-11-27 19:33:20,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3204520.0, ans=0.0
2023-11-27 19:33:34,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3204586.6666666665, ans=0.04949747468305833
2023-11-27 19:33:36,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3204586.6666666665, ans=0.125
2023-11-27 19:33:40,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3204586.6666666665, ans=0.1
2023-11-27 19:33:43,490 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480700
2023-11-27 19:34:02,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0
2023-11-27 19:34:12,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3204786.6666666665, ans=0.125
2023-11-27 19:34:18,044 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11800, loss[loss=0.05704, simple_loss=0.08283, pruned_loss=0.008936, audio_tagging_loss=0.006683, over 14799.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09023, pruned_loss=0.01242, audio_tagging_loss=0.008868, over 3041235.63 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:34:23,474 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.512e+01 9.140e+01 9.806e+01 1.375e+02, threshold=1.828e+02, percent-clipped=0.0
2023-11-27 19:34:40,888 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480750
2023-11-27 19:34:44,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3204986.6666666665, ans=0.1
2023-11-27 19:35:15,435 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11850, loss[loss=0.04139, simple_loss=0.05252, pruned_loss=0.003647, audio_tagging_loss=0.01148, over 15419.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08964, pruned_loss=0.01217, audio_tagging_loss=0.008969, over 3040148.69 frames. ], batch size: 62, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:35:16,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3205186.6666666665, ans=0.2
2023-11-27 19:35:23,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3205186.6666666665, ans=0.125
2023-11-27 19:35:39,084 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480800
2023-11-27 19:35:44,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3205320.0, ans=0.125
2023-11-27 19:35:50,110 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.68 vs. limit=22.5
2023-11-27 19:35:56,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3205386.6666666665, ans=0.2
2023-11-27 19:36:13,826 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11900, loss[loss=0.07388, simple_loss=0.102, pruned_loss=0.01577, audio_tagging_loss=0.007099, over 15750.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08965, pruned_loss=0.01229, audio_tagging_loss=0.009055, over 3049245.85 frames. ], batch size: 61, lr: 1.68e-03, grad_scale: 16.0
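Each train_asr.py:1235 summary reports the combined objective next to its parts, and the numbers are consistent with loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, i.e. the run's simple_loss scale of 0.5 and audio-tagging scale of 1.0. A quick check against the batch 11900 tot_loss fields above:

```python
def combined_loss(simple: float, pruned: float, audio_tagging: float,
                  simple_scale: float = 0.5, at_scale: float = 1.0) -> float:
    """Recombine the per-batch losses the way the summaries suggest."""
    return simple_scale * simple + pruned + at_scale * audio_tagging

# tot_loss fields of "Epoch 40, batch 11900" above:
# 0.5 * 0.08965 + 0.01229 + 0.009055 = 0.06617
assert abs(combined_loss(0.08965, 0.01229, 0.009055) - 0.06617) < 5e-5
```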
2023-11-27 19:36:14,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3205520.0, ans=0.1
2023-11-27 19:36:19,241 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.961e+01 9.645e+01 1.029e+02 1.669e+02, threshold=1.929e+02, percent-clipped=0.0
2023-11-27 19:36:20,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3205520.0, ans=0.2
2023-11-27 19:36:27,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3205586.6666666665, ans=0.1
2023-11-27 19:36:37,021 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480850
2023-11-27 19:36:52,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3205720.0, ans=0.125
2023-11-27 19:36:54,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3205720.0, ans=0.025
2023-11-27 19:36:58,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3205786.6666666665, ans=0.125
2023-11-27 19:37:11,380 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 11950, loss[loss=0.07204, simple_loss=0.09885, pruned_loss=0.01476, audio_tagging_loss=0.00786, over 15599.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.08967, pruned_loss=0.0125, audio_tagging_loss=0.009146, over 3042438.76 frames. ], batch size: 57, lr: 1.68e-03, grad_scale: 16.0
2023-11-27 19:37:25,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3205920.0, ans=0.0
2023-11-27 19:37:27,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3205920.0, ans=0.1
2023-11-27 19:37:34,356 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480900
2023-11-27 19:37:36,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3205986.6666666665, ans=0.125
2023-11-27 19:37:44,022 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=15.0
2023-11-27 19:37:57,855 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.63 vs. limit=15.0
2023-11-27 19:38:07,387 INFO [train_asr.py:1235] (2/4) Epoch 40, batch 12000, loss[loss=0.06998, simple_loss=0.09786, pruned_loss=0.01426, audio_tagging_loss=0.006784, over 15231.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08886, pruned_loss=0.01242, audio_tagging_loss=0.009193, over 3038498.29 frames. ], batch size: 58, lr: 1.68e-03, grad_scale: 32.0
2023-11-27 19:38:07,388 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-27 19:38:41,941 INFO [train_asr.py:1267] (2/4) Epoch 40, validation: loss=0.05781, simple_loss=0.05069, pruned_loss=0.005234, audio_tagging_loss=0.02723, over 4681554.00 frames.
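Validation fires here at batch 12000, in line with the run's 3000-batch valid_interval, and again at batch 0 of each epoch (see the Epoch 41 block below). A trivial sketch of that cadence; the helper name is illustrative, not train_asr.py's:

```python
def should_validate(batch_idx: int, valid_interval: int = 3000) -> bool:
    """Validate at batch 0 and every valid_interval batches thereafter,
    matching the 'Computing validation loss' entries at batches 0 and 12000."""
    return batch_idx % valid_interval == 0
```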
2023-11-27 19:38:41,941 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB
2023-11-27 19:38:46,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3206186.6666666665, ans=0.125
2023-11-27 19:38:47,251 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.817e+01 8.885e+01 9.490e+01 1.034e+02 1.237e+02, threshold=1.898e+02, percent-clipped=0.0
2023-11-27 19:39:02,895 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 480950
2023-11-27 19:39:26,315 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 0, loss[loss=0.08139, simple_loss=0.09751, pruned_loss=0.01448, audio_tagging_loss=0.01815, over 15658.00 frames. ], tot_loss[loss=0.08139, simple_loss=0.09751, pruned_loss=0.01448, audio_tagging_loss=0.01815, over 15658.00 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 19:39:26,315 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-27 19:40:00,220 INFO [train_asr.py:1267] (2/4) Epoch 41, validation: loss=0.05782, simple_loss=0.05064, pruned_loss=0.005197, audio_tagging_loss=0.0273, over 4681554.00 frames.
2023-11-27 19:40:00,221 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB
2023-11-27 19:40:00,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3206360.0, ans=0.0
2023-11-27 19:40:18,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3206426.6666666665, ans=0.0
2023-11-27 19:40:21,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3206426.6666666665, ans=0.2
2023-11-27 19:40:33,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3206560.0, ans=0.0
2023-11-27 19:40:50,934 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481000
2023-11-27 19:40:57,761 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 50, loss[loss=0.07748, simple_loss=0.1032, pruned_loss=0.01155, audio_tagging_loss=0.01435, over 16993.00 frames. ], tot_loss[loss=0.07512, simple_loss=0.09139, pruned_loss=0.01272, audio_tagging_loss=0.01671, over 695104.00 frames. ], batch size: 61, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 19:40:59,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3206693.3333333335, ans=0.125
2023-11-27 19:41:03,239 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0
2023-11-27 19:41:18,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3206760.0, ans=0.125
2023-11-27 19:41:26,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3206826.6666666665, ans=0.125
2023-11-27 19:41:29,534 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0
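The learning rate steps from 1.68e-03 to 1.66e-03 exactly at the epoch boundary, which matches the Eden schedule used by these Zipformer recipes with the run's base_lr of 0.045, lr_batches of 7500 and lr_epochs of 3.5: lr = base_lr * ((step/lr_batches)^2 + 1)^(-1/4) * ((epoch/lr_epochs)^2 + 1)^(-1/4). A numerical check, treating "epoch" as the number of completed epochs (an assumption that reproduces the logged values):

```python
def eden_lr(step: int, epoch: int, base_lr: float = 0.045,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden learning-rate schedule: decays in both steps and epochs."""
    return (base_lr
            * ((step / lr_batches) ** 2 + 1) ** -0.25
            * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

print(round(eden_lr(480000, 39), 5))  # 0.00168 -> logged as lr: 1.68e-03
print(round(eden_lr(481000, 40), 5))  # 0.00166 -> logged as lr: 1.66e-03
```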
2023-11-27 19:41:31,553 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.740e+01 9.368e+01 1.003e+02 1.103e+02 1.548e+02, threshold=2.006e+02, percent-clipped=0.0
2023-11-27 19:41:36,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3206893.3333333335, ans=0.125
2023-11-27 19:41:48,719 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481050
2023-11-27 19:41:53,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3206960.0, ans=0.015
2023-11-27 19:41:54,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=3206960.0, ans=15.0
2023-11-27 19:41:55,739 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 100, loss[loss=0.07473, simple_loss=0.0932, pruned_loss=0.01391, audio_tagging_loss=0.01422, over 16143.00 frames. ], tot_loss[loss=0.07473, simple_loss=0.09202, pruned_loss=0.0128, audio_tagging_loss=0.01593, over 1217138.16 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0
2023-11-27 19:41:58,169 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-27 19:42:00,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3207026.6666666665, ans=0.1
2023-11-27 19:42:04,029 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0
2023-11-27 19:42:06,144 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.33 vs. limit=8.0
2023-11-27 19:42:06,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3207093.3333333335, ans=0.125
2023-11-27 19:42:07,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3207093.3333333335, ans=0.1
2023-11-27 19:42:11,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3207093.3333333335, ans=0.2
2023-11-27 19:42:36,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3207226.6666666665, ans=0.1
2023-11-27 19:42:46,574 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481100
2023-11-27 19:42:53,753 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 150, loss[loss=0.05913, simple_loss=0.07137, pruned_loss=0.01118, audio_tagging_loss=0.01226, over 15235.00 frames. ], tot_loss[loss=0.07228, simple_loss=0.09092, pruned_loss=0.01254, audio_tagging_loss=0.01427, over 1626624.70 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 19:42:57,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3207360.0, ans=0.125
2023-11-27 19:43:03,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3207360.0, ans=0.04949747468305833
2023-11-27 19:43:10,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=3207426.6666666665, ans=15.0
2023-11-27 19:43:12,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3207426.6666666665, ans=0.0
2023-11-27 19:43:28,180 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.150e+01 9.036e+01 9.587e+01 1.014e+02 1.345e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-27 19:43:35,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3207560.0, ans=0.05
2023-11-27 19:43:42,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3207626.6666666665, ans=0.125
2023-11-27 19:43:44,372 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481150
2023-11-27 19:43:51,436 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 200, loss[loss=0.09047, simple_loss=0.1253, pruned_loss=0.01887, audio_tagging_loss=0.008936, over 16133.00 frames. ], tot_loss[loss=0.07133, simple_loss=0.09185, pruned_loss=0.01272, audio_tagging_loss=0.01269, over 1951486.00 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 19:43:52,097 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.17 vs. limit=15.0
2023-11-27 19:43:56,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3207693.3333333335, ans=0.1
2023-11-27 19:44:17,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3207826.6666666665, ans=0.125
2023-11-27 19:44:24,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3207893.3333333335, ans=0.0
2023-11-27 19:44:42,397 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481200
2023-11-27 19:44:42,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=22.5
2023-11-27 19:44:49,822 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 250, loss[loss=0.06413, simple_loss=0.08808, pruned_loss=0.01159, audio_tagging_loss=0.008503, over 15658.00 frames. ], tot_loss[loss=0.06981, simple_loss=0.0913, pruned_loss=0.01264, audio_tagging_loss=0.01151, over 2194284.28 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 19:44:52,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3208026.6666666665, ans=0.125
2023-11-27 19:45:23,708 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 9.216e+01 9.866e+01 1.064e+02 1.717e+02, threshold=1.973e+02, percent-clipped=0.0
2023-11-27 19:45:31,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.39 vs. limit=15.0
2023-11-27 19:45:40,408 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481250
2023-11-27 19:45:40,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3208293.3333333335, ans=0.125
2023-11-27 19:45:47,466 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 300, loss[loss=0.08093, simple_loss=0.1193, pruned_loss=0.01589, audio_tagging_loss=0.005402, over 16218.00 frames. ], tot_loss[loss=0.06929, simple_loss=0.092, pruned_loss=0.0127, audio_tagging_loss=0.0106, over 2387601.17 frames. ], batch size: 57, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 19:46:03,362 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.50 vs. limit=22.5
2023-11-27 19:46:12,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3208493.3333333335, ans=0.125
2023-11-27 19:46:26,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3208560.0, ans=0.2
2023-11-27 19:46:38,122 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481300
2023-11-27 19:46:41,999 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0
2023-11-27 19:46:44,689 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 350, loss[loss=0.05916, simple_loss=0.08652, pruned_loss=0.007685, audio_tagging_loss=0.008218, over 14711.00 frames. ], tot_loss[loss=0.06817, simple_loss=0.09116, pruned_loss=0.01263, audio_tagging_loss=0.009967, over 2532202.83 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0
2023-11-27 19:46:46,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3208693.3333333335, ans=0.125
2023-11-27 19:46:54,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3208693.3333333335, ans=0.2
2023-11-27 19:46:56,054 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.80 vs. limit=6.0
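After the epoch restart, the tot_loss[...] fields are not single-batch numbers but decayed running sums: the "over N frames" count climbs from 695104.00 at batch 50 through 2194284.28 at batch 250 and then plateaus near 3.0e6, which is what an exponentially decayed accumulator with the run's reset_interval of 200 would produce (steady state of roughly 200 x frames-per-batch). A sketch of that bookkeeping, inferred from the logged numbers rather than quoted from train_asr.py:

```python
def update_tot_loss(tot: dict, batch: dict, reset_interval: int = 200) -> dict:
    """Exponentially decayed accumulator behind the tot_loss fields.

    Each batch, the running sums (loss components and frame count alike)
    decay by (1 - 1/reset_interval) before the new batch's sums are added,
    so early in an epoch the reported averages are dominated by the first
    batches' higher audio_tagging_loss and then settle.
    """
    decay = 1.0 - 1.0 / reset_interval
    return {k: tot.get(k, 0.0) * decay + batch[k] for k in batch}
```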
limit=6.0 2023-11-27 19:47:10,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3208826.6666666665, ans=0.125 2023-11-27 19:47:19,469 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.707e+01 9.326e+01 9.986e+01 1.163e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 19:47:36,070 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481350 2023-11-27 19:47:39,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3208960.0, ans=0.125 2023-11-27 19:47:43,118 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 400, loss[loss=0.07846, simple_loss=0.1165, pruned_loss=0.0148, audio_tagging_loss=0.005393, over 15072.00 frames. ], tot_loss[loss=0.06814, simple_loss=0.09172, pruned_loss=0.01267, audio_tagging_loss=0.009614, over 2647886.35 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 19:47:43,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3209026.6666666665, ans=0.125 2023-11-27 19:47:44,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3209026.6666666665, ans=0.125 2023-11-27 19:47:50,210 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.05 vs. limit=22.5 2023-11-27 19:47:50,223 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.91 vs. limit=22.5 2023-11-27 19:47:50,425 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.33 vs. limit=15.0 2023-11-27 19:48:00,735 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.08 vs. limit=6.0 2023-11-27 19:48:02,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3209093.3333333335, ans=0.1 2023-11-27 19:48:03,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3209093.3333333335, ans=0.125 2023-11-27 19:48:10,809 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2023-11-27 19:48:33,298 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481400 2023-11-27 19:48:33,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.39 vs. limit=10.0 2023-11-27 19:48:40,652 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 450, loss[loss=0.0683, simple_loss=0.09307, pruned_loss=0.01323, audio_tagging_loss=0.008534, over 14962.00 frames. ], tot_loss[loss=0.06744, simple_loss=0.09089, pruned_loss=0.01264, audio_tagging_loss=0.009353, over 2730306.05 frames. 
], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:49:15,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3209560.0, ans=6.0 2023-11-27 19:49:16,483 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.566e+01 9.069e+01 9.742e+01 1.634e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-27 19:49:16,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3209560.0, ans=0.0 2023-11-27 19:49:27,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3209626.6666666665, ans=0.125 2023-11-27 19:49:31,312 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481450 2023-11-27 19:49:37,846 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 500, loss[loss=0.06453, simple_loss=0.09219, pruned_loss=0.01149, audio_tagging_loss=0.006936, over 16086.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09013, pruned_loss=0.01241, audio_tagging_loss=0.009232, over 2801773.37 frames. ], batch size: 61, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:49:42,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3209693.3333333335, ans=0.125 2023-11-27 19:49:52,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3209760.0, ans=15.0 2023-11-27 19:49:56,549 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.61 vs. limit=6.0 2023-11-27 19:50:04,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3209826.6666666665, ans=0.1 2023-11-27 19:50:09,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3209826.6666666665, ans=0.125 2023-11-27 19:50:24,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3209960.0, ans=0.125 2023-11-27 19:50:28,324 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481500 2023-11-27 19:50:33,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3209960.0, ans=0.125 2023-11-27 19:50:36,102 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 550, loss[loss=0.05853, simple_loss=0.07723, pruned_loss=0.01236, audio_tagging_loss=0.00755, over 15113.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09027, pruned_loss=0.01246, audio_tagging_loss=0.009147, over 2854530.48 frames. ], batch size: 60, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:50:57,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3210093.3333333335, ans=15.0 2023-11-27 19:51:11,648 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.210e+01 8.776e+01 9.400e+01 1.030e+02 1.375e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 19:51:20,322 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. 
limit=6.0 2023-11-27 19:51:24,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3210293.3333333335, ans=0.1 2023-11-27 19:51:27,016 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481550 2023-11-27 19:51:33,513 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 600, loss[loss=0.06595, simple_loss=0.08663, pruned_loss=0.01233, audio_tagging_loss=0.0103, over 14665.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09059, pruned_loss=0.01243, audio_tagging_loss=0.00915, over 2893316.44 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:51:37,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3210360.0, ans=10.0 2023-11-27 19:51:45,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3210426.6666666665, ans=0.1 2023-11-27 19:51:56,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3210493.3333333335, ans=0.125 2023-11-27 19:51:57,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3210493.3333333335, ans=0.0 2023-11-27 19:52:01,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3210493.3333333335, ans=0.125 2023-11-27 19:52:04,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3210493.3333333335, ans=0.0 2023-11-27 19:52:25,136 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481600 2023-11-27 19:52:32,053 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 650, loss[loss=0.09651, simple_loss=0.1298, pruned_loss=0.02473, audio_tagging_loss=0.006882, over 15241.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09112, pruned_loss=0.01253, audio_tagging_loss=0.009104, over 2924099.26 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:53:09,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.855e+01 9.490e+01 1.019e+02 1.294e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-27 19:53:22,835 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481650 2023-11-27 19:53:29,902 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 700, loss[loss=0.06406, simple_loss=0.0941, pruned_loss=0.008461, audio_tagging_loss=0.008546, over 15316.00 frames. ], tot_loss[loss=0.06736, simple_loss=0.0916, pruned_loss=0.01258, audio_tagging_loss=0.008978, over 2953707.17 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 8.0 2023-11-27 19:53:36,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2023-11-27 19:53:50,801 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.39 vs. 
limit=22.5 2023-11-27 19:54:01,301 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:54:04,529 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:54:12,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3211226.6666666665, ans=0.0 2023-11-27 19:54:17,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0 2023-11-27 19:54:20,595 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481700 2023-11-27 19:54:25,698 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:54:27,687 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 750, loss[loss=0.05021, simple_loss=0.07147, pruned_loss=0.00852, audio_tagging_loss=0.005953, over 13905.00 frames. ], tot_loss[loss=0.06777, simple_loss=0.09211, pruned_loss=0.01278, audio_tagging_loss=0.008941, over 2979336.89 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 8.0 2023-11-27 19:55:04,491 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.938e+01 9.552e+01 1.040e+02 1.357e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-27 19:55:05,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3211560.0, ans=0.125 2023-11-27 19:55:08,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3211560.0, ans=0.07 2023-11-27 19:55:19,156 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481750 2023-11-27 19:55:22,001 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=12.0 2023-11-27 19:55:25,730 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 800, loss[loss=0.07242, simple_loss=0.09961, pruned_loss=0.01195, audio_tagging_loss=0.01067, over 15611.00 frames. ], tot_loss[loss=0.06806, simple_loss=0.09258, pruned_loss=0.01289, audio_tagging_loss=0.008881, over 2994127.36 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:55:42,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3211760.0, ans=0.0 2023-11-27 19:56:16,539 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481800 2023-11-27 19:56:22,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3212026.6666666665, ans=0.125 2023-11-27 19:56:23,305 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 850, loss[loss=0.06315, simple_loss=0.07795, pruned_loss=0.01516, audio_tagging_loss=0.009011, over 14983.00 frames. ], tot_loss[loss=0.06816, simple_loss=0.0927, pruned_loss=0.01296, audio_tagging_loss=0.008856, over 3003546.68 frames. 
], batch size: 57, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:56:27,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3212026.6666666665, ans=0.2 2023-11-27 19:56:35,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3212093.3333333335, ans=0.1 2023-11-27 19:56:42,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3212093.3333333335, ans=0.125 2023-11-27 19:56:45,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3212093.3333333335, ans=0.2 2023-11-27 19:56:56,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3212160.0, ans=0.1 2023-11-27 19:57:00,640 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.782e+01 9.230e+01 1.009e+02 1.508e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 19:57:10,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3212293.3333333335, ans=0.125 2023-11-27 19:57:14,849 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481850 2023-11-27 19:57:21,337 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 900, loss[loss=0.07361, simple_loss=0.1004, pruned_loss=0.01468, audio_tagging_loss=0.008726, over 16599.00 frames. ], tot_loss[loss=0.06799, simple_loss=0.09231, pruned_loss=0.01287, audio_tagging_loss=0.008964, over 3017963.22 frames. ], batch size: 61, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:57:34,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3212426.6666666665, ans=0.1 2023-11-27 19:57:37,010 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 19:57:37,250 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=22.5 2023-11-27 19:57:50,567 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=22.5 2023-11-27 19:58:12,787 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481900 2023-11-27 19:58:19,349 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 950, loss[loss=0.06986, simple_loss=0.09067, pruned_loss=0.01608, audio_tagging_loss=0.008448, over 15258.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.09186, pruned_loss=0.01287, audio_tagging_loss=0.008928, over 3025998.05 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:58:21,300 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=15.0 2023-11-27 19:58:25,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. 
limit=15.0 2023-11-27 19:58:27,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3212693.3333333335, ans=0.125 2023-11-27 19:58:31,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3212760.0, ans=0.1 2023-11-27 19:58:56,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 9.001e+01 9.830e+01 1.057e+02 1.367e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-27 19:59:10,353 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 481950 2023-11-27 19:59:11,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3212960.0, ans=0.1 2023-11-27 19:59:13,044 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0 2023-11-27 19:59:16,913 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1000, loss[loss=0.05633, simple_loss=0.06905, pruned_loss=0.01108, audio_tagging_loss=0.01073, over 13839.00 frames. ], tot_loss[loss=0.06718, simple_loss=0.09129, pruned_loss=0.01271, audio_tagging_loss=0.008826, over 3027294.79 frames. ], batch size: 52, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 19:59:22,289 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.57 vs. limit=22.5 2023-11-27 19:59:30,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3213093.3333333335, ans=0.1 2023-11-27 19:59:44,567 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 19:59:49,288 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=15.0 2023-11-27 20:00:08,207 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482000 2023-11-27 20:00:09,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3213293.3333333335, ans=0.125 2023-11-27 20:00:15,082 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1050, loss[loss=0.0628, simple_loss=0.09333, pruned_loss=0.009234, audio_tagging_loss=0.0069, over 14352.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09067, pruned_loss=0.01263, audio_tagging_loss=0.008707, over 3028671.59 frames. ], batch size: 53, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:00:29,964 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. 
limit=6.0 2023-11-27 20:00:30,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3213426.6666666665, ans=0.125 2023-11-27 20:00:30,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3213426.6666666665, ans=0.125 2023-11-27 20:00:49,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3213560.0, ans=0.0 2023-11-27 20:00:51,867 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.496e+01 9.150e+01 1.002e+02 1.300e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-27 20:01:05,878 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482050 2023-11-27 20:01:13,678 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1100, loss[loss=0.06868, simple_loss=0.09362, pruned_loss=0.01441, audio_tagging_loss=0.007462, over 14957.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09018, pruned_loss=0.01259, audio_tagging_loss=0.00867, over 3036210.53 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:01:18,138 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:01:26,312 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.89 vs. limit=12.0 2023-11-27 20:01:29,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3213760.0, ans=0.09899494936611666 2023-11-27 20:01:33,454 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.02 vs. limit=10.0 2023-11-27 20:01:34,606 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.04 vs. limit=15.0 2023-11-27 20:01:58,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3213960.0, ans=0.125 2023-11-27 20:02:04,175 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482100 2023-11-27 20:02:10,928 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1150, loss[loss=0.05581, simple_loss=0.07584, pruned_loss=0.01024, audio_tagging_loss=0.007653, over 14742.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08973, pruned_loss=0.01243, audio_tagging_loss=0.008704, over 3035337.62 frames. 
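The scaling.py:213 entries throughout this log each report a ScheduledFloat: a regularization hyperparameter (a dropout p, a balancer prob, a bypass scale_min, a skip rate) whose current value (`ans`) is looked up against the global batch_count. As a rough sketch of the idea only, not the recipe's actual class, such a schedule can be modeled as piecewise-linear interpolation over (batch_count, value) breakpoints; the breakpoints below are invented for illustration.

```python
# Sketch only: a piecewise-linear float schedule keyed on batch count,
# approximating what the "ScheduledFloat: name=..., ans=..." entries report.
# The breakpoints below are invented for the example.
import bisect

class PiecewiseLinearFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value_at(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# e.g. a dropout p decaying from 0.3 to 0.1 over the first 20k batches:
dropout_p = PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value_at(3212160.0))  # far past the last breakpoint -> 0.1
```

Past its final breakpoint such a schedule simply saturates, which would explain why each named parameter logs the same `ans` value over and over at this late stage of training.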
], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:02:16,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3214026.6666666665, ans=0.0 2023-11-27 20:02:33,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3214160.0, ans=0.125 2023-11-27 20:02:48,773 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 8.534e+01 9.220e+01 9.959e+01 1.599e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-27 20:03:02,483 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482150 2023-11-27 20:03:09,504 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1200, loss[loss=0.07157, simple_loss=0.09771, pruned_loss=0.01389, audio_tagging_loss=0.008825, over 15235.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08964, pruned_loss=0.01251, audio_tagging_loss=0.008677, over 3030794.54 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:03:21,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3214426.6666666665, ans=0.0 2023-11-27 20:03:22,927 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.93 vs. limit=15.0 2023-11-27 20:03:24,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3214426.6666666665, ans=0.125 2023-11-27 20:03:27,294 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0 2023-11-27 20:03:40,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3214493.3333333335, ans=0.0 2023-11-27 20:04:00,209 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482200 2023-11-27 20:04:07,632 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1250, loss[loss=0.08392, simple_loss=0.1202, pruned_loss=0.01757, audio_tagging_loss=0.006262, over 16016.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08919, pruned_loss=0.01247, audio_tagging_loss=0.008687, over 3036985.91 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:04:24,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3214760.0, ans=0.125 2023-11-27 20:04:35,765 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0 2023-11-27 20:04:44,173 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 8.597e+01 9.538e+01 1.019e+02 1.522e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-27 20:04:58,096 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482250 2023-11-27 20:05:05,191 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1300, loss[loss=0.07423, simple_loss=0.1056, pruned_loss=0.01329, audio_tagging_loss=0.008128, over 14303.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08895, pruned_loss=0.01234, audio_tagging_loss=0.008667, over 3031654.48 frames. 
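Each optim.py:476 line summarizes the recent distribution of gradient norms as five numbers (min, 25%, median, 75%, max) plus a clipping threshold. In this stretch of the log the threshold always equals 2.0 times the logged median, e.g. 2 x 9.220e+01 = 1.844e+02 in the entry above, consistent with Clipping_scale=2.0. A minimal sketch of that bookkeeping, assuming a plain window of recent norms (the production optimizer is certainly more involved):

```python
# Minimal sketch, not the actual optimizer: keep a window of recent gradient
# norms, report quartile-style summaries, and clip against a threshold of
# clipping_scale * median, which matches the numbers logged above.
from collections import deque
import statistics

class GradNormTracker:
    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_seen = 0
        self.num_clipped = 0

    def update(self, grad_norm: float) -> float:
        """Record one grad norm; return the scale (<= 1.0) to apply to grads."""
        self.norms.append(grad_norm)
        self.num_seen += 1
        threshold = self.clipping_scale * statistics.median(self.norms)
        if grad_norm > threshold:
            self.num_clipped += 1
            return threshold / grad_norm
        return 1.0

    def summary(self) -> str:
        q1, med, q3 = statistics.quantiles(self.norms, n=4)
        threshold = self.clipping_scale * med
        pct = 100.0 * self.num_clipped / max(1, self.num_seen)
        return (f"grad-norm quartiles {min(self.norms):.3e} {q1:.3e} "
                f"{med:.3e} {q3:.3e} {max(self.norms):.3e}, "
                f"threshold={threshold:.3e}, percent-clipped={pct:.1f}")

tracker = GradNormTracker()
for g in (73.8, 87.8, 92.3, 100.9, 150.8):  # shaped like the entry above
    tracker.update(g)
print(tracker.summary())  # threshold = 2.0 * 92.3 = 1.846e+02, none clipped
```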
], batch size: 52, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:05:12,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3215026.6666666665, ans=0.0 2023-11-27 20:05:15,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3215093.3333333335, ans=10.0 2023-11-27 20:05:40,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3215226.6666666665, ans=0.0 2023-11-27 20:05:41,179 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0 2023-11-27 20:05:41,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3215226.6666666665, ans=0.0 2023-11-27 20:05:55,749 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482300 2023-11-27 20:06:03,056 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1350, loss[loss=0.05947, simple_loss=0.08241, pruned_loss=0.01006, audio_tagging_loss=0.00821, over 16366.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08891, pruned_loss=0.01231, audio_tagging_loss=0.008665, over 3038289.99 frames. ], batch size: 62, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:06:08,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3215360.0, ans=0.125 2023-11-27 20:06:22,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3215426.6666666665, ans=0.125 2023-11-27 20:06:23,951 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.70 vs. limit=6.0 2023-11-27 20:06:26,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0 2023-11-27 20:06:40,513 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 8.606e+01 9.244e+01 9.716e+01 1.166e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 20:06:40,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3215560.0, ans=0.125 2023-11-27 20:06:47,026 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:06:53,627 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482350 2023-11-27 20:06:54,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3215626.6666666665, ans=0.0 2023-11-27 20:07:00,786 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1400, loss[loss=0.07197, simple_loss=0.09951, pruned_loss=0.01542, audio_tagging_loss=0.006799, over 14524.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08985, pruned_loss=0.01251, audio_tagging_loss=0.008712, over 3036202.19 frames. 
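The WARNING above, repeated for other AudioSet placeholder cuts later in this log, drops any cut whose encoder output would be shorter than its token sequence: 100 input frames shrink to 23 frames after the 4x subsampling front end, fewer than the 24 BPE tokens of the dummy transcript, and a transducer cannot align more tokens than it has output frames. A hedged sketch of such a filter; the length formula and function names are stand-ins chosen to reproduce the logged 100 -> 23, not the recipe's exact code:

```python
# Illustrative filter, mirroring the WARNING lines above: drop any cut whose
# frame count after subsampling is smaller than its token count, since a
# transducer needs at least one output frame per emitted token.

def num_frames_after_subsampling(num_frames: int) -> int:
    # Stand-in for the encoder-embedding length calculation; with the
    # values logged above, 100 input frames become 23 output frames.
    return (num_frames - 8) // 4

def keep_cut(cut_id: str, num_frames: int, tokens: list) -> bool:
    t = num_frames_after_subsampling(num_frames)
    if t < len(tokens):
        print(f"Exclude cut with ID {cut_id} from training. "
              f"Frames before subsampling: {num_frames}. "
              f"Frames after subsampling: {t}. Tokens: {len(tokens)}")
        return False
    return True

# The logged case: 100 frames -> 23, but 24 tokens, so the cut is dropped.
assert not keep_cut("unbalanced/example.wav", 100, ["tok"] * 24)
```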
], batch size: 54, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:07:20,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3215760.0, ans=0.125 2023-11-27 20:07:42,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3215893.3333333335, ans=0.125 2023-11-27 20:07:51,636 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482400 2023-11-27 20:07:58,320 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1450, loss[loss=0.06789, simple_loss=0.09706, pruned_loss=0.01096, audio_tagging_loss=0.008409, over 15298.00 frames. ], tot_loss[loss=0.06717, simple_loss=0.09166, pruned_loss=0.01272, audio_tagging_loss=0.008622, over 3048761.50 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:08:11,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3216093.3333333335, ans=0.125 2023-11-27 20:08:36,396 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.289e+01 8.746e+01 9.275e+01 1.017e+02 1.401e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 20:08:49,245 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482450 2023-11-27 20:08:56,206 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1500, loss[loss=0.05673, simple_loss=0.07313, pruned_loss=0.01018, audio_tagging_loss=0.009979, over 15726.00 frames. ], tot_loss[loss=0.0672, simple_loss=0.09147, pruned_loss=0.01269, audio_tagging_loss=0.008768, over 3051282.40 frames. ], batch size: 59, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:09:00,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3216360.0, ans=0.125 2023-11-27 20:09:12,133 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=15.0 2023-11-27 20:09:12,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3216426.6666666665, ans=0.125 2023-11-27 20:09:13,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3216426.6666666665, ans=0.5 2023-11-27 20:09:24,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3216493.3333333335, ans=0.0 2023-11-27 20:09:24,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3216493.3333333335, ans=0.0 2023-11-27 20:09:25,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3216493.3333333335, ans=0.0 2023-11-27 20:09:43,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3216626.6666666665, ans=0.0 2023-11-27 20:09:47,263 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482500 2023-11-27 20:09:53,905 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1550, loss[loss=0.0671, simple_loss=0.08397, pruned_loss=0.01183, audio_tagging_loss=0.01329, over 15459.00 frames. ], tot_loss[loss=0.06737, simple_loss=0.09149, pruned_loss=0.01288, audio_tagging_loss=0.008751, over 3049420.92 frames. 
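The four numbers inside every loss[...] record are consistent with a fixed weighted sum: loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. For the batch 1450 record above, 0.5 * 0.09706 + 0.01096 + 0.008409 = 0.06790, matching the logged loss=0.06789 to rounding. The weights here are inferred from the printed numbers rather than read out of the training script:

```python
# Sketch of how the logged loss decomposes; the 0.5 and 1.0 weights are
# inferred from the printed values (0.5*simple + pruned + 1.0*audio_tagging
# reproduces the "loss=" field), not taken from the training code itself.

def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5,
                  audio_tagging_scale: float = 1.0) -> float:
    return (simple_scale * simple_loss + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

# Batch 1450 record above:
print(combined_loss(0.09706, 0.01096, 0.008409))  # ~0.06789
```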
], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:09:56,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3216693.3333333335, ans=0.125 2023-11-27 20:09:57,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3216693.3333333335, ans=0.125 2023-11-27 20:10:13,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3216760.0, ans=0.0 2023-11-27 20:10:23,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3216826.6666666665, ans=0.125 2023-11-27 20:10:32,571 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 8.858e+01 9.389e+01 9.907e+01 1.182e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 20:10:45,407 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482550 2023-11-27 20:10:51,984 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1600, loss[loss=0.07666, simple_loss=0.1031, pruned_loss=0.0157, audio_tagging_loss=0.00942, over 14383.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09107, pruned_loss=0.01273, audio_tagging_loss=0.008753, over 3052023.93 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:10:56,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3217026.6666666665, ans=0.125 2023-11-27 20:10:58,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3217026.6666666665, ans=0.125 2023-11-27 20:11:12,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0 2023-11-27 20:11:25,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3217160.0, ans=0.0 2023-11-27 20:11:42,424 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482600 2023-11-27 20:11:42,893 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.73 vs. limit=15.0 2023-11-27 20:11:49,946 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1650, loss[loss=0.06362, simple_loss=0.08437, pruned_loss=0.01178, audio_tagging_loss=0.009661, over 16285.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.09115, pruned_loss=0.01266, audio_tagging_loss=0.008791, over 3051664.77 frames. ], batch size: 63, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:11:50,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3217360.0, ans=0.125 2023-11-27 20:12:08,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3217426.6666666665, ans=0.0 2023-11-27 20:12:27,442 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.730e+01 9.445e+01 1.002e+02 1.391e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 20:12:40,720 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482650 2023-11-27 20:12:47,280 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1700, loss[loss=0.06568, simple_loss=0.09868, pruned_loss=0.009192, audio_tagging_loss=0.007145, over 15341.00 frames. 
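Many of the scheduled names above (balancer1.prob, balancer2.min_abs, conv_module1.balancer1.min_positive, and so on) belong to activation balancers. Judging only from the parameter names, these bound per-channel statistics such as the fraction of positive activations and the mean absolute value, with prob controlling how often a correction is applied. The sketch below merely computes the statistics being bounded; it does not reproduce the gradient-level correction itself.

```python
# Rough sketch: compute the per-channel statistics that balancer parameters
# like min_positive / min_abs / max_abs appear to constrain. The real module
# nudges these via gradients; here we only measure them.
import torch

def balancer_stats(x: torch.Tensor) -> dict:
    # x: (num_frames, num_channels); stats are per channel
    positive_frac = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    return {
        "min_positive_frac": float(positive_frac.min()),
        "max_positive_frac": float(positive_frac.max()),
        "min_mean_abs": float(mean_abs.min()),
        "max_mean_abs": float(mean_abs.max()),
    }

print(balancer_stats(torch.randn(1000, 256)))
```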
], tot_loss[loss=0.06728, simple_loss=0.09153, pruned_loss=0.01266, audio_tagging_loss=0.008852, over 3057506.00 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:12:51,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3217693.3333333335, ans=0.125 2023-11-27 20:13:00,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3217760.0, ans=0.2 2023-11-27 20:13:04,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3217760.0, ans=0.1 2023-11-27 20:13:21,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3217893.3333333335, ans=0.125 2023-11-27 20:13:28,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3217893.3333333335, ans=0.1 2023-11-27 20:13:31,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3217893.3333333335, ans=0.125 2023-11-27 20:13:35,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3217960.0, ans=0.0 2023-11-27 20:13:38,650 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482700 2023-11-27 20:13:38,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3217960.0, ans=0.125 2023-11-27 20:13:45,082 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1750, loss[loss=0.06769, simple_loss=0.08927, pruned_loss=0.01171, audio_tagging_loss=0.01135, over 14967.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09104, pruned_loss=0.01254, audio_tagging_loss=0.00886, over 3055283.38 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:14:04,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3218093.3333333335, ans=0.0 2023-11-27 20:14:23,551 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.743e+01 9.232e+01 9.959e+01 1.189e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-27 20:14:32,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3218293.3333333335, ans=0.0 2023-11-27 20:14:34,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3218293.3333333335, ans=0.1 2023-11-27 20:14:35,668 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482750 2023-11-27 20:14:42,277 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1800, loss[loss=0.06562, simple_loss=0.09111, pruned_loss=0.01292, audio_tagging_loss=0.007145, over 15599.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09051, pruned_loss=0.01257, audio_tagging_loss=0.008746, over 3056258.65 frames. 
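The *_skip_rate and bypass.* names above (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.scale_min) suggest layer-dropout style regularization: during training, a whole residual branch is occasionally skipped with some scheduled probability. A generic sketch of that pattern, offered as an assumption from the names rather than a transcription of the model code:

```python
# Generic layer-dropout sketch suggested by the *_skip_rate names above:
# during training an entire submodule (attention, conv, feed-forward) is
# skipped with some probability; at eval time it always runs.
import torch
import torch.nn as nn

class StochasticSkip(nn.Module):
    def __init__(self, module: nn.Module, skip_rate: float):
        super().__init__()
        self.module = module
        self.skip_rate = skip_rate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.skip_rate:
            return x               # skip: keep only the identity path
        return x + self.module(x)  # otherwise run the residual branch

layer = StochasticSkip(nn.Linear(256, 256), skip_rate=0.0)
print(layer(torch.randn(4, 256)).shape)
```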
], batch size: 60, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:14:46,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3218360.0, ans=0.125 2023-11-27 20:14:57,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3218426.6666666665, ans=0.0 2023-11-27 20:15:06,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3218493.3333333335, ans=0.125 2023-11-27 20:15:08,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3218493.3333333335, ans=0.2 2023-11-27 20:15:16,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3218560.0, ans=0.0 2023-11-27 20:15:22,916 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.66 vs. limit=15.0 2023-11-27 20:15:28,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3218626.6666666665, ans=0.125 2023-11-27 20:15:32,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3218626.6666666665, ans=0.04949747468305833 2023-11-27 20:15:33,367 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482800 2023-11-27 20:15:35,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3218626.6666666665, ans=0.5 2023-11-27 20:15:40,753 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1850, loss[loss=0.07416, simple_loss=0.09992, pruned_loss=0.01475, audio_tagging_loss=0.009445, over 14982.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09068, pruned_loss=0.01254, audio_tagging_loss=0.00868, over 3053758.97 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:15:50,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3218693.3333333335, ans=0.2 2023-11-27 20:15:53,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3218760.0, ans=0.0 2023-11-27 20:16:01,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3218760.0, ans=0.0 2023-11-27 20:16:06,075 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2023-11-27 20:16:18,491 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.712e+01 9.397e+01 9.825e+01 1.168e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 20:16:23,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3218893.3333333335, ans=0.0 2023-11-27 20:16:28,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3218960.0, ans=0.5 2023-11-27 20:16:31,996 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482850 2023-11-27 20:16:38,531 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1900, loss[loss=0.07047, simple_loss=0.09736, pruned_loss=0.01187, audio_tagging_loss=0.009915, over 15288.00 frames. 
], tot_loss[loss=0.0666, simple_loss=0.09073, pruned_loss=0.01263, audio_tagging_loss=0.008605, over 3053327.81 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:16:59,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3219093.3333333335, ans=0.0 2023-11-27 20:16:59,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3219093.3333333335, ans=0.2 2023-11-27 20:17:17,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3219226.6666666665, ans=0.0 2023-11-27 20:17:20,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3219226.6666666665, ans=0.125 2023-11-27 20:17:29,256 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482900 2023-11-27 20:17:32,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3219293.3333333335, ans=0.1 2023-11-27 20:17:35,816 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 1950, loss[loss=0.05834, simple_loss=0.07739, pruned_loss=0.009466, audio_tagging_loss=0.01018, over 14320.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09095, pruned_loss=0.01255, audio_tagging_loss=0.008599, over 3053487.97 frames. ], batch size: 58, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:17:40,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3219360.0, ans=0.0 2023-11-27 20:17:48,224 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.98 vs. limit=10.0 2023-11-27 20:17:56,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3219426.6666666665, ans=0.125 2023-11-27 20:18:15,538 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.488e+01 8.669e+01 9.288e+01 9.966e+01 1.212e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-27 20:18:27,150 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 482950 2023-11-27 20:18:32,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3219626.6666666665, ans=0.125 2023-11-27 20:18:32,449 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2023-11-27 20:18:34,221 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2000, loss[loss=0.05935, simple_loss=0.07334, pruned_loss=0.01171, audio_tagging_loss=0.01097, over 16760.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08985, pruned_loss=0.01247, audio_tagging_loss=0.008761, over 3047412.81 frames. 
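Alongside each single-batch loss[...], the tot_loss[...] field reports the same metrics accumulated frame-weighted across recent batches, which is why its frame counter hovers around three million here instead of matching a single batch. A minimal frame-weighted tracker in that spirit (the real tracker's windowing and reset policy are not reproduced):

```python
# Sketch of a frame-weighted metric tracker like the one behind the
# "tot_loss[...]" entries: each batch contributes its losses weighted by its
# frame count, and the report divides by total frames.

class MetricsTracker(dict):
    def accumulate(self, metrics: dict, num_frames: float) -> None:
        self["frames"] = self.get("frames", 0.0) + num_frames
        for name, value in metrics.items():
            self[name] = self.get(name, 0.0) + value * num_frames

    def report(self) -> str:
        frames = self["frames"]
        parts = [f"{k}={v / frames:.4g}" for k, v in self.items()
                 if k != "frames"]
        return f"tot_loss[{', '.join(parts)}, over {frames:.2f} frames.]"

tracker = MetricsTracker()
tracker.accumulate({"loss": 0.07197, "simple_loss": 0.09951}, 14524.0)
tracker.accumulate({"loss": 0.06789, "simple_loss": 0.09706}, 15298.0)
print(tracker.report())
```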
], batch size: 62, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:18:34,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3219693.3333333335, ans=0.125 2023-11-27 20:18:34,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3219693.3333333335, ans=0.0 2023-11-27 20:18:56,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.85 vs. limit=15.0 2023-11-27 20:18:57,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=22.5 2023-11-27 20:18:59,588 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.42 vs. limit=8.0 2023-11-27 20:19:01,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3219826.6666666665, ans=0.1 2023-11-27 20:19:03,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3219826.6666666665, ans=0.125 2023-11-27 20:19:25,572 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483000 2023-11-27 20:19:32,543 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2050, loss[loss=0.0664, simple_loss=0.09144, pruned_loss=0.0108, audio_tagging_loss=0.009871, over 15760.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09132, pruned_loss=0.01271, audio_tagging_loss=0.008665, over 3046684.37 frames. ], batch size: 56, lr: 1.66e-03, grad_scale: 32.0 2023-11-27 20:19:58,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3220160.0, ans=0.0 2023-11-27 20:20:02,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3220160.0, ans=0.1 2023-11-27 20:20:12,037 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 8.893e+01 9.583e+01 1.011e+02 1.256e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-27 20:20:12,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3220226.6666666665, ans=0.1 2023-11-27 20:20:13,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3220226.6666666665, ans=0.125 2023-11-27 20:20:13,707 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.57 vs. limit=22.5 2023-11-27 20:20:22,987 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483050 2023-11-27 20:20:27,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3220293.3333333335, ans=0.125 2023-11-27 20:20:29,580 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2100, loss[loss=0.07839, simple_loss=0.1132, pruned_loss=0.01601, audio_tagging_loss=0.005764, over 15362.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09014, pruned_loss=0.01246, audio_tagging_loss=0.008678, over 3045897.83 frames. 
], batch size: 58, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:20:57,539 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.75 vs. limit=15.0 2023-11-27 20:21:09,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3220560.0, ans=0.1 2023-11-27 20:21:18,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3220626.6666666665, ans=0.125 2023-11-27 20:21:20,565 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483100 2023-11-27 20:21:27,427 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2150, loss[loss=0.06549, simple_loss=0.08665, pruned_loss=0.0144, audio_tagging_loss=0.007764, over 14717.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08993, pruned_loss=0.01255, audio_tagging_loss=0.008689, over 3042869.62 frames. ], batch size: 54, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:21:29,299 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2023-11-27 20:21:30,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=15.0 2023-11-27 20:21:40,660 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.05 vs. limit=15.0 2023-11-27 20:21:40,799 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.37 vs. limit=22.5 2023-11-27 20:22:03,877 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:22:07,611 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.704e+01 9.254e+01 9.792e+01 1.378e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-27 20:22:12,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3220960.0, ans=0.0 2023-11-27 20:22:15,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3220960.0, ans=0.125 2023-11-27 20:22:17,573 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483150 2023-11-27 20:22:25,334 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2200, loss[loss=0.08036, simple_loss=0.1158, pruned_loss=0.0128, audio_tagging_loss=0.009651, over 15552.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09044, pruned_loss=0.01272, audio_tagging_loss=0.008684, over 3049768.24 frames. 
], batch size: 56, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:22:33,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3221026.6666666665, ans=0.125 2023-11-27 20:22:41,043 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:22:42,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3221093.3333333335, ans=0.0 2023-11-27 20:22:44,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3221093.3333333335, ans=0.2 2023-11-27 20:22:49,580 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=15.0 2023-11-27 20:22:52,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3221160.0, ans=0.1 2023-11-27 20:22:54,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3221160.0, ans=0.2 2023-11-27 20:22:55,347 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.86 vs. limit=10.0 2023-11-27 20:23:04,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3221226.6666666665, ans=0.125 2023-11-27 20:23:16,025 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483200 2023-11-27 20:23:19,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3221293.3333333335, ans=0.0 2023-11-27 20:23:23,024 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2250, loss[loss=0.06025, simple_loss=0.07898, pruned_loss=0.01127, audio_tagging_loss=0.009486, over 14932.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09056, pruned_loss=0.01269, audio_tagging_loss=0.008667, over 3041756.97 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:23:25,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.64 vs. limit=22.5 2023-11-27 20:23:47,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3221493.3333333335, ans=0.125 2023-11-27 20:23:48,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3221493.3333333335, ans=0.025 2023-11-27 20:24:00,338 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.76 vs. 
limit=15.0 2023-11-27 20:24:00,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3221560.0, ans=0.125 2023-11-27 20:24:03,954 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.981e+01 8.930e+01 9.422e+01 1.015e+02 1.618e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-27 20:24:14,037 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483250 2023-11-27 20:24:15,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3221626.6666666665, ans=0.07 2023-11-27 20:24:21,364 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2300, loss[loss=0.08162, simple_loss=0.1227, pruned_loss=0.01369, audio_tagging_loss=0.006574, over 15520.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09033, pruned_loss=0.01256, audio_tagging_loss=0.008686, over 3044040.27 frames. ], batch size: 55, lr: 1.66e-03, grad_scale: 16.0 2023-11-27 20:24:31,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3221760.0, ans=0.125 2023-11-27 20:24:33,548 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:25:02,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3221893.3333333335, ans=0.1 2023-11-27 20:25:11,956 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483300 2023-11-27 20:25:14,152 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:25:15,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3221960.0, ans=0.5 2023-11-27 20:25:19,107 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2350, loss[loss=0.08719, simple_loss=0.1181, pruned_loss=0.01829, audio_tagging_loss=0.00984, over 16121.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09032, pruned_loss=0.01258, audio_tagging_loss=0.008789, over 3044959.71 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:25:20,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3222026.6666666665, ans=0.0 2023-11-27 20:25:45,744 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.52 vs. 
limit=15.0 2023-11-27 20:25:54,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3222226.6666666665, ans=0.125 2023-11-27 20:25:58,846 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.811e+01 9.279e+01 1.007e+02 1.436e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-27 20:26:09,494 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483350 2023-11-27 20:26:16,872 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2400, loss[loss=0.06539, simple_loss=0.0895, pruned_loss=0.01019, audio_tagging_loss=0.01046, over 15006.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.0908, pruned_loss=0.01262, audio_tagging_loss=0.008873, over 3045331.69 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:26:17,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3222360.0, ans=0.2 2023-11-27 20:26:34,514 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:26:45,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3222493.3333333335, ans=0.1 2023-11-27 20:26:49,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3222493.3333333335, ans=0.125 2023-11-27 20:27:00,633 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2023-11-27 20:27:02,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3222626.6666666665, ans=0.1 2023-11-27 20:27:07,828 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483400 2023-11-27 20:27:15,126 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2450, loss[loss=0.06411, simple_loss=0.08588, pruned_loss=0.01201, audio_tagging_loss=0.009156, over 15164.00 frames. ], tot_loss[loss=0.06695, simple_loss=0.09053, pruned_loss=0.01277, audio_tagging_loss=0.008914, over 3045925.60 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:27:17,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3222693.3333333335, ans=0.125 2023-11-27 20:27:56,804 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.146e+01 8.599e+01 9.201e+01 9.948e+01 1.437e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-27 20:27:57,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3222893.3333333335, ans=0.025 2023-11-27 20:28:00,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3222960.0, ans=0.2 2023-11-27 20:28:06,178 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483450 2023-11-27 20:28:12,695 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2500, loss[loss=0.05592, simple_loss=0.07794, pruned_loss=0.00857, audio_tagging_loss=0.008375, over 14702.00 frames. ], tot_loss[loss=0.06773, simple_loss=0.0917, pruned_loss=0.013, audio_tagging_loss=0.008883, over 3041642.01 frames. 
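The scaling.py:1022 Whitening lines compare a measured statistic of some module's output against a limit (for example metric=3.52 vs. limit=15.0 above). One natural statistic of this kind, shown purely as an illustration and not necessarily the exact quantity logged, is the ratio mean(eig^2) / mean(eig)^2 over the eigenvalues of the feature covariance: it equals 1.0 for perfectly white features and grows as the covariance drifts from a multiple of the identity.

```python
# Loose illustration (not necessarily the exact statistic logged above):
# score how "non-white" a batch of feature vectors is via the eigenvalues
# of their covariance matrix.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 384)   # already ~white
print(whitening_metric(x))   # ~1.0
x[:, 0] *= 10.0              # one dominant channel
print(whitening_metric(x))   # noticeably larger
```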
], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:28:54,343 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.93 vs. limit=15.0 2023-11-27 20:28:55,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3223226.6666666665, ans=0.2 2023-11-27 20:29:04,290 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483500 2023-11-27 20:29:06,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3223293.3333333335, ans=0.2 2023-11-27 20:29:10,718 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2550, loss[loss=0.05511, simple_loss=0.07185, pruned_loss=0.009638, audio_tagging_loss=0.009549, over 16045.00 frames. ], tot_loss[loss=0.06808, simple_loss=0.09234, pruned_loss=0.01309, audio_tagging_loss=0.008824, over 3045411.14 frames. ], batch size: 63, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:29:15,267 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0 2023-11-27 20:29:27,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3223426.6666666665, ans=0.0 2023-11-27 20:29:47,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3223560.0, ans=0.1 2023-11-27 20:29:52,375 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.657e+01 9.326e+01 1.025e+02 1.204e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 20:30:01,945 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483550 2023-11-27 20:30:08,843 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2600, loss[loss=0.09522, simple_loss=0.1259, pruned_loss=0.0269, audio_tagging_loss=0.005366, over 14288.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.09227, pruned_loss=0.0131, audio_tagging_loss=0.008698, over 3042838.53 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:30:15,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3223693.3333333335, ans=0.1 2023-11-27 20:30:18,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3223693.3333333335, ans=0.1 2023-11-27 20:30:25,147 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.88 vs. limit=22.5 2023-11-27 20:30:27,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-11-27 20:30:56,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3223960.0, ans=0.125 2023-11-27 20:30:59,539 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483600 2023-11-27 20:31:06,351 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2650, loss[loss=0.0677, simple_loss=0.09842, pruned_loss=0.01169, audio_tagging_loss=0.006802, over 14398.00 frames. ], tot_loss[loss=0.0674, simple_loss=0.09141, pruned_loss=0.01295, audio_tagging_loss=0.008739, over 3043570.40 frames. 
], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:31:10,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3224026.6666666665, ans=0.125 2023-11-27 20:31:14,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0 2023-11-27 20:31:41,159 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.65 vs. limit=22.5 2023-11-27 20:31:45,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2023-11-27 20:31:48,425 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.676e+01 9.510e+01 9.992e+01 1.225e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 20:31:54,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3224293.3333333335, ans=0.125 2023-11-27 20:31:57,841 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483650 2023-11-27 20:32:04,331 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2700, loss[loss=0.05805, simple_loss=0.07892, pruned_loss=0.01107, audio_tagging_loss=0.007522, over 15489.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08974, pruned_loss=0.01255, audio_tagging_loss=0.008681, over 3039581.90 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:32:04,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3224360.0, ans=0.1 2023-11-27 20:32:04,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3224360.0, ans=0.125 2023-11-27 20:32:19,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.78 vs. limit=10.0 2023-11-27 20:32:19,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3224426.6666666665, ans=0.1 2023-11-27 20:32:24,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3224426.6666666665, ans=0.0 2023-11-27 20:32:37,217 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.82 vs. limit=6.0 2023-11-27 20:32:41,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3224560.0, ans=0.2 2023-11-27 20:32:43,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3224560.0, ans=0.2 2023-11-27 20:32:55,151 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483700 2023-11-27 20:33:02,273 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2750, loss[loss=0.06042, simple_loss=0.08659, pruned_loss=0.007838, audio_tagging_loss=0.009287, over 15643.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08919, pruned_loss=0.01242, audio_tagging_loss=0.008702, over 3037671.87 frames. 
], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:33:06,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3224693.3333333335, ans=0.2 2023-11-27 20:33:30,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3224826.6666666665, ans=0.0 2023-11-27 20:33:43,551 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.983e+01 8.556e+01 9.189e+01 9.890e+01 1.172e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-27 20:33:53,811 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:33:53,844 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483750 2023-11-27 20:34:00,309 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2800, loss[loss=0.05763, simple_loss=0.07542, pruned_loss=0.009381, audio_tagging_loss=0.01054, over 14903.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08953, pruned_loss=0.0125, audio_tagging_loss=0.008686, over 3036628.07 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:34:09,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3225026.6666666665, ans=0.0 2023-11-27 20:34:12,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3225093.3333333335, ans=0.2 2023-11-27 20:34:26,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3225160.0, ans=0.0 2023-11-27 20:34:30,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3225160.0, ans=0.1 2023-11-27 20:34:33,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3225226.6666666665, ans=0.0 2023-11-27 20:34:35,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3225226.6666666665, ans=0.125 2023-11-27 20:34:42,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3225226.6666666665, ans=0.0 2023-11-27 20:34:51,462 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483800 2023-11-27 20:34:52,772 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:34:58,466 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2850, loss[loss=0.06149, simple_loss=0.08601, pruned_loss=0.01109, audio_tagging_loss=0.007388, over 15240.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08923, pruned_loss=0.01234, audio_tagging_loss=0.008649, over 3037604.98 frames. 
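At batch 3000, a few entries further down, training pauses for a validation pass ("Computing validation loss", then "Epoch 41, validation: loss=0.0572 ...") and reports peak CUDA memory ("Maximum memory allocated so far is 26096MB"). A schematic of such a pass; loss_fn and the loop structure are illustrative stand-ins, not the recipe's actual helpers:

```python
# Schematic of a periodic validation pass like the one at batch 3000 below;
# `loss_fn` is an illustrative stand-in that returns a frame-summed loss and
# the number of frames scored for one batch.
import torch

def run_validation(model, valid_loader, loss_fn, device="cuda:2"):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():  # no gradients needed for validation
        for batch in valid_loader:
            loss, num_frames = loss_fn(model, batch, device)
            tot_loss += float(loss)
            tot_frames += num_frames
    model.train()  # resume training mode afterwards
    if torch.cuda.is_available():
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {peak_mb}MB")
    return tot_loss / tot_frames  # frame-averaged validation loss
```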
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:34:58,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3225360.0, ans=0.125 2023-11-27 20:35:03,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3225360.0, ans=0.0 2023-11-27 20:35:07,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3225360.0, ans=0.0 2023-11-27 20:35:09,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3225426.6666666665, ans=0.125 2023-11-27 20:35:28,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3225493.3333333335, ans=0.0 2023-11-27 20:35:41,036 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.142e+01 8.834e+01 9.311e+01 1.027e+02 1.174e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-27 20:35:48,734 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483850 2023-11-27 20:35:49,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3225626.6666666665, ans=0.0 2023-11-27 20:35:53,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3225626.6666666665, ans=0.0 2023-11-27 20:35:53,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. limit=10.0 2023-11-27 20:35:55,316 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2900, loss[loss=0.05553, simple_loss=0.07111, pruned_loss=0.01033, audio_tagging_loss=0.009644, over 14333.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08902, pruned_loss=0.01239, audio_tagging_loss=0.00866, over 3038717.08 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:36:03,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3225693.3333333335, ans=15.0 2023-11-27 20:36:46,531 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483900 2023-11-27 20:36:50,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3225960.0, ans=0.1 2023-11-27 20:36:53,787 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 2950, loss[loss=0.08031, simple_loss=0.1043, pruned_loss=0.01354, audio_tagging_loss=0.01463, over 15919.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09009, pruned_loss=0.01263, audio_tagging_loss=0.008678, over 3034583.63 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:36:54,292 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.03 vs. 
limit=12.0 2023-11-27 20:36:59,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3226026.6666666665, ans=0.0 2023-11-27 20:37:05,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3226093.3333333335, ans=0.1 2023-11-27 20:37:06,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3226093.3333333335, ans=0.1 2023-11-27 20:37:24,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3226160.0, ans=0.0 2023-11-27 20:37:30,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3226226.6666666665, ans=0.0 2023-11-27 20:37:36,761 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 8.664e+01 9.410e+01 9.930e+01 1.488e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 20:37:44,503 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 483950 2023-11-27 20:37:51,783 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3000, loss[loss=0.08918, simple_loss=0.1245, pruned_loss=0.02061, audio_tagging_loss=0.006326, over 16385.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09085, pruned_loss=0.01274, audio_tagging_loss=0.008644, over 3042505.75 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:37:51,784 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 20:38:22,573 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9963, 5.8758, 5.6383, 5.6016], device='cuda:2') 2023-11-27 20:38:26,089 INFO [train_asr.py:1267] (2/4) Epoch 41, validation: loss=0.0572, simple_loss=0.05061, pruned_loss=0.005192, audio_tagging_loss=0.0267, over 4681554.00 frames. 2023-11-27 20:38:26,090 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 20:38:32,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3226360.0, ans=0.1 2023-11-27 20:39:17,122 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484000 2023-11-27 20:39:26,481 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3050, loss[loss=0.09503, simple_loss=0.1356, pruned_loss=0.01891, audio_tagging_loss=0.008301, over 16810.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.09201, pruned_loss=0.0129, audio_tagging_loss=0.008711, over 3041474.75 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:40:01,232 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:40:01,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3226893.3333333335, ans=0.1 2023-11-27 20:40:03,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.15 vs. 
limit=12.0 2023-11-27 20:40:09,336 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.867e+01 9.400e+01 1.012e+02 1.240e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-27 20:40:10,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3226893.3333333335, ans=0.0 2023-11-27 20:40:14,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3226960.0, ans=0.0 2023-11-27 20:40:17,837 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484050 2023-11-27 20:40:19,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3226960.0, ans=0.1 2023-11-27 20:40:24,355 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3100, loss[loss=0.06804, simple_loss=0.09327, pruned_loss=0.0139, audio_tagging_loss=0.007512, over 14772.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09107, pruned_loss=0.01276, audio_tagging_loss=0.008772, over 3037994.55 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:40:28,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3227026.6666666665, ans=0.125 2023-11-27 20:40:54,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3227160.0, ans=0.125 2023-11-27 20:40:57,514 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.11 vs. limit=15.0 2023-11-27 20:41:05,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=3227226.6666666665, ans=0.02 2023-11-27 20:41:14,717 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484100 2023-11-27 20:41:21,280 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3150, loss[loss=0.04802, simple_loss=0.06291, pruned_loss=0.007635, audio_tagging_loss=0.008928, over 14315.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09164, pruned_loss=0.01277, audio_tagging_loss=0.00875, over 3041094.35 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:41:24,586 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.05 vs. limit=22.5 2023-11-27 20:41:32,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3227426.6666666665, ans=0.2 2023-11-27 20:41:34,611 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:41:38,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3227426.6666666665, ans=0.125 2023-11-27 20:41:40,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3227426.6666666665, ans=0.125 2023-11-27 20:41:53,708 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.53 vs. 
limit=10.0 2023-11-27 20:41:54,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3227493.3333333335, ans=0.07 2023-11-27 20:42:03,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3227560.0, ans=0.125 2023-11-27 20:42:04,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.840e+01 9.395e+01 9.954e+01 1.405e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 20:42:09,413 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0 2023-11-27 20:42:12,783 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484150 2023-11-27 20:42:14,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3227626.6666666665, ans=0.125 2023-11-27 20:42:14,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3227626.6666666665, ans=0.0 2023-11-27 20:42:19,254 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3200, loss[loss=0.05742, simple_loss=0.07389, pruned_loss=0.0117, audio_tagging_loss=0.008777, over 14545.00 frames. ], tot_loss[loss=0.06793, simple_loss=0.09241, pruned_loss=0.01293, audio_tagging_loss=0.008798, over 3043516.35 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:42:35,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3227760.0, ans=0.2 2023-11-27 20:42:49,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3227826.6666666665, ans=0.0 2023-11-27 20:42:56,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3227893.3333333335, ans=0.125 2023-11-27 20:43:04,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3227960.0, ans=0.0 2023-11-27 20:43:10,574 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484200 2023-11-27 20:43:18,039 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3250, loss[loss=0.07181, simple_loss=0.1086, pruned_loss=0.007491, audio_tagging_loss=0.01001, over 15986.00 frames. ], tot_loss[loss=0.06751, simple_loss=0.09172, pruned_loss=0.01277, audio_tagging_loss=0.00888, over 3045815.22 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:43:22,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3228026.6666666665, ans=0.0 2023-11-27 20:43:30,939 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. 
limit=15.0 2023-11-27 20:43:31,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3228093.3333333335, ans=0.1 2023-11-27 20:43:44,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3228160.0, ans=0.125 2023-11-27 20:43:47,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3228160.0, ans=0.125 2023-11-27 20:43:47,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3228160.0, ans=0.2 2023-11-27 20:44:00,694 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 8.543e+01 9.307e+01 1.025e+02 1.528e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-27 20:44:08,499 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484250 2023-11-27 20:44:08,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=22.5 2023-11-27 20:44:14,963 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3300, loss[loss=0.0537, simple_loss=0.07237, pruned_loss=0.007875, audio_tagging_loss=0.009639, over 15637.00 frames. ], tot_loss[loss=0.06721, simple_loss=0.0913, pruned_loss=0.01261, audio_tagging_loss=0.008948, over 3050643.52 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:44:33,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3228426.6666666665, ans=0.125 2023-11-27 20:44:34,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3228426.6666666665, ans=0.1 2023-11-27 20:44:54,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3228560.0, ans=0.125 2023-11-27 20:44:56,386 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=12.0 2023-11-27 20:45:06,259 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484300 2023-11-27 20:45:08,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3228626.6666666665, ans=0.1 2023-11-27 20:45:12,790 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3350, loss[loss=0.09745, simple_loss=0.1345, pruned_loss=0.0214, audio_tagging_loss=0.008789, over 16049.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09101, pruned_loss=0.01253, audio_tagging_loss=0.008882, over 3044201.06 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:45:31,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3228760.0, ans=0.015 2023-11-27 20:45:43,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3228826.6666666665, ans=0.1 2023-11-27 20:45:44,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3228826.6666666665, ans=0.125 2023-11-27 20:45:50,716 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. 
limit=15.0 2023-11-27 20:45:52,551 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:45:52,811 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2023-11-27 20:45:55,618 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 8.657e+01 9.246e+01 1.011e+02 1.317e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-27 20:46:02,344 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484350 2023-11-27 20:46:10,126 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3400, loss[loss=0.08933, simple_loss=0.1157, pruned_loss=0.02175, audio_tagging_loss=0.009734, over 15575.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09093, pruned_loss=0.01253, audio_tagging_loss=0.008831, over 3047861.43 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:46:22,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3229093.3333333335, ans=0.0 2023-11-27 20:46:45,822 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:47:00,255 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484400 2023-11-27 20:47:07,146 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3450, loss[loss=0.05578, simple_loss=0.08066, pruned_loss=0.008275, audio_tagging_loss=0.007172, over 14579.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.0905, pruned_loss=0.01241, audio_tagging_loss=0.008793, over 3050559.76 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:47:08,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3229360.0, ans=0.0 2023-11-27 20:47:12,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3229360.0, ans=0.0 2023-11-27 20:47:16,135 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.15 vs. limit=10.0 2023-11-27 20:47:26,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3229426.6666666665, ans=0.1 2023-11-27 20:47:50,694 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.393e+01 9.066e+01 9.893e+01 1.377e+02, threshold=1.813e+02, percent-clipped=0.0 2023-11-27 20:47:57,370 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484450 2023-11-27 20:48:04,357 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3500, loss[loss=0.08195, simple_loss=0.1162, pruned_loss=0.01607, audio_tagging_loss=0.007764, over 15042.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09034, pruned_loss=0.0125, audio_tagging_loss=0.008785, over 3047982.61 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:48:29,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3229826.6666666665, ans=0.95 2023-11-27 20:48:35,200 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:48:44,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3229893.3333333335, ans=0.2 2023-11-27 20:48:50,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3229960.0, ans=0.1 2023-11-27 20:48:54,654 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484500 2023-11-27 20:48:58,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=3229960.0, ans=12.0 2023-11-27 20:49:01,726 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3550, loss[loss=0.06389, simple_loss=0.08982, pruned_loss=0.007611, audio_tagging_loss=0.01137, over 14818.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08937, pruned_loss=0.01235, audio_tagging_loss=0.008769, over 3046762.76 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:49:10,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3230026.6666666665, ans=0.1 2023-11-27 20:49:11,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3230026.6666666665, ans=0.125 2023-11-27 20:49:41,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3230226.6666666665, ans=0.125 2023-11-27 20:49:45,462 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.934e+01 8.597e+01 9.146e+01 1.002e+02 1.167e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-27 20:49:49,464 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.71 vs. limit=12.0 2023-11-27 20:49:52,867 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484550 2023-11-27 20:49:59,414 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3600, loss[loss=0.06223, simple_loss=0.08126, pruned_loss=0.01044, audio_tagging_loss=0.01116, over 15056.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08952, pruned_loss=0.01232, audio_tagging_loss=0.008714, over 3041509.97 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:50:13,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3230426.6666666665, ans=0.1 2023-11-27 20:50:29,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3230493.3333333335, ans=0.125 2023-11-27 20:50:31,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.35 vs. 
limit=15.0 2023-11-27 20:50:45,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3230626.6666666665, ans=0.125 2023-11-27 20:50:49,717 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484600 2023-11-27 20:50:54,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3230626.6666666665, ans=0.125 2023-11-27 20:50:57,222 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3650, loss[loss=0.06991, simple_loss=0.099, pruned_loss=0.01237, audio_tagging_loss=0.008033, over 15716.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08959, pruned_loss=0.01233, audio_tagging_loss=0.008687, over 3052075.84 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:50:57,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3230693.3333333335, ans=0.0 2023-11-27 20:50:57,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3230693.3333333335, ans=0.0 2023-11-27 20:51:04,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3230693.3333333335, ans=0.125 2023-11-27 20:51:12,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3230760.0, ans=0.2 2023-11-27 20:51:18,123 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2023-11-27 20:51:18,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3230826.6666666665, ans=0.0 2023-11-27 20:51:41,949 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.915e+01 9.732e+01 1.035e+02 1.318e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-27 20:51:46,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3230960.0, ans=0.125 2023-11-27 20:51:47,428 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484650 2023-11-27 20:51:53,095 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:51:53,885 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3700, loss[loss=0.07477, simple_loss=0.1035, pruned_loss=0.01229, audio_tagging_loss=0.01074, over 14756.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09004, pruned_loss=0.01247, audio_tagging_loss=0.008662, over 3052911.86 frames. 
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:52:05,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3231093.3333333335, ans=0.1 2023-11-27 20:52:10,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3231093.3333333335, ans=0.125 2023-11-27 20:52:29,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3231226.6666666665, ans=0.0 2023-11-27 20:52:45,158 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484700 2023-11-27 20:52:51,691 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3750, loss[loss=0.05455, simple_loss=0.07588, pruned_loss=0.007604, audio_tagging_loss=0.009003, over 14876.00 frames. ], tot_loss[loss=0.06687, simple_loss=0.09082, pruned_loss=0.01283, audio_tagging_loss=0.00863, over 3057310.63 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:53:14,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3231493.3333333335, ans=0.5 2023-11-27 20:53:26,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3231560.0, ans=0.1 2023-11-27 20:53:27,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3231560.0, ans=0.0 2023-11-27 20:53:33,162 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:53:36,405 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.777e+01 9.406e+01 1.027e+02 1.290e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 20:53:42,600 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484750 2023-11-27 20:53:47,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3231626.6666666665, ans=0.125 2023-11-27 20:53:49,564 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3800, loss[loss=0.07486, simple_loss=0.1055, pruned_loss=0.01458, audio_tagging_loss=0.007523, over 15860.00 frames. ], tot_loss[loss=0.06756, simple_loss=0.09194, pruned_loss=0.01296, audio_tagging_loss=0.008629, over 3052307.27 frames. 
], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:53:49,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3231693.3333333335, ans=0.125 2023-11-27 20:53:54,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3231693.3333333335, ans=0.125 2023-11-27 20:54:16,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3231826.6666666665, ans=0.025 2023-11-27 20:54:30,159 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.45 vs. limit=6.0 2023-11-27 20:54:36,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3231960.0, ans=0.0 2023-11-27 20:54:39,470 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484800 2023-11-27 20:54:46,232 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3850, loss[loss=0.06284, simple_loss=0.08141, pruned_loss=0.01538, audio_tagging_loss=0.006758, over 14747.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09178, pruned_loss=0.01279, audio_tagging_loss=0.00865, over 3050468.30 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:54:55,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3232026.6666666665, ans=0.0 2023-11-27 20:55:10,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3232160.0, ans=0.125 2023-11-27 20:55:17,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3232160.0, ans=0.0 2023-11-27 20:55:31,028 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 8.660e+01 9.334e+01 1.001e+02 1.347e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 20:55:37,210 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484850 2023-11-27 20:55:37,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3232293.3333333335, ans=0.125 2023-11-27 20:55:39,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3232293.3333333335, ans=0.0 2023-11-27 20:55:43,722 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3900, loss[loss=0.06728, simple_loss=0.09217, pruned_loss=0.0143, audio_tagging_loss=0.006897, over 14780.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09171, pruned_loss=0.01273, audio_tagging_loss=0.008708, over 3039926.53 frames. 
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:55:49,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3232360.0, ans=0.1 2023-11-27 20:55:54,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3232426.6666666665, ans=0.0 2023-11-27 20:56:01,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3232426.6666666665, ans=0.125 2023-11-27 20:56:04,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3232426.6666666665, ans=0.125 2023-11-27 20:56:32,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3232626.6666666665, ans=0.0 2023-11-27 20:56:34,422 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484900 2023-11-27 20:56:37,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3232626.6666666665, ans=0.2 2023-11-27 20:56:42,137 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 3950, loss[loss=0.06795, simple_loss=0.09294, pruned_loss=0.01246, audio_tagging_loss=0.009019, over 15252.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.09196, pruned_loss=0.0127, audio_tagging_loss=0.008701, over 3044398.87 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 20:56:51,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3232693.3333333335, ans=0.125 2023-11-27 20:56:57,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3232760.0, ans=0.0 2023-11-27 20:56:59,634 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=12.0 2023-11-27 20:57:03,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3232760.0, ans=0.0 2023-11-27 20:57:09,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3232826.6666666665, ans=0.2 2023-11-27 20:57:17,307 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 20:57:26,949 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.571e+01 9.462e+01 1.016e+02 1.341e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-27 20:57:32,698 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 484950 2023-11-27 20:57:39,308 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4000, loss[loss=0.06514, simple_loss=0.08137, pruned_loss=0.01206, audio_tagging_loss=0.01239, over 14992.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09071, pruned_loss=0.01259, audio_tagging_loss=0.008971, over 3038479.88 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:57:40,841 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.48 vs. 
limit=15.0 2023-11-27 20:58:04,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3233160.0, ans=0.1 2023-11-27 20:58:07,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3233160.0, ans=0.125 2023-11-27 20:58:12,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3233226.6666666665, ans=0.125 2023-11-27 20:58:29,434 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485000 2023-11-27 20:58:36,268 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4050, loss[loss=0.06927, simple_loss=0.09864, pruned_loss=0.01504, audio_tagging_loss=0.004905, over 14934.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09061, pruned_loss=0.01255, audio_tagging_loss=0.00894, over 3047990.78 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:58:40,673 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 20:58:50,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3233426.6666666665, ans=0.125 2023-11-27 20:59:00,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3233493.3333333335, ans=0.04949747468305833 2023-11-27 20:59:15,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3233560.0, ans=0.125 2023-11-27 20:59:20,314 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.753e+01 8.859e+01 9.526e+01 1.036e+02 1.251e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-27 20:59:25,732 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485050 2023-11-27 20:59:32,392 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4100, loss[loss=0.05889, simple_loss=0.08217, pruned_loss=0.009955, audio_tagging_loss=0.007844, over 15790.00 frames. ], tot_loss[loss=0.06741, simple_loss=0.09157, pruned_loss=0.01271, audio_tagging_loss=0.008921, over 3047306.21 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 20:59:34,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3233693.3333333335, ans=0.125 2023-11-27 20:59:57,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3233826.6666666665, ans=0.0 2023-11-27 21:00:11,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3233893.3333333335, ans=0.125 2023-11-27 21:00:14,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.19 vs. 
limit=22.5 2023-11-27 21:00:23,213 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485100 2023-11-27 21:00:30,319 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4150, loss[loss=0.05731, simple_loss=0.07741, pruned_loss=0.009348, audio_tagging_loss=0.009257, over 14870.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09128, pruned_loss=0.01265, audio_tagging_loss=0.008825, over 3050418.90 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 21:00:39,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3234026.6666666665, ans=0.125 2023-11-27 21:00:46,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3234093.3333333335, ans=0.125 2023-11-27 21:00:47,482 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:00:48,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3234093.3333333335, ans=0.125 2023-11-27 21:01:12,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3234226.6666666665, ans=0.125 2023-11-27 21:01:13,444 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 21:01:14,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3234293.3333333335, ans=0.025 2023-11-27 21:01:15,580 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.836e+01 8.501e+01 9.252e+01 1.004e+02 1.216e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 21:01:20,617 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485150 2023-11-27 21:01:27,037 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4200, loss[loss=0.05862, simple_loss=0.0746, pruned_loss=0.01036, audio_tagging_loss=0.01096, over 15730.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09061, pruned_loss=0.01257, audio_tagging_loss=0.008761, over 3048719.91 frames. 
], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:01:27,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3234360.0, ans=0.2 2023-11-27 21:01:30,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3234360.0, ans=0.125 2023-11-27 21:01:42,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3234426.6666666665, ans=0.125 2023-11-27 21:01:43,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3234426.6666666665, ans=0.125 2023-11-27 21:01:51,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3234493.3333333335, ans=0.125 2023-11-27 21:01:52,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3234493.3333333335, ans=0.125 2023-11-27 21:02:06,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3234560.0, ans=0.125 2023-11-27 21:02:14,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3234626.6666666665, ans=0.1 2023-11-27 21:02:17,440 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485200 2023-11-27 21:02:24,401 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4250, loss[loss=0.04467, simple_loss=0.05469, pruned_loss=0.005407, audio_tagging_loss=0.01192, over 16413.00 frames. ], tot_loss[loss=0.06794, simple_loss=0.09256, pruned_loss=0.01311, audio_tagging_loss=0.008551, over 3053683.98 frames. ], batch size: 64, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:02:38,749 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=12.0 2023-11-27 21:02:46,184 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.14 vs. limit=22.5 2023-11-27 21:02:48,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3234826.6666666665, ans=0.2 2023-11-27 21:02:53,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3234826.6666666665, ans=0.0 2023-11-27 21:03:10,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 9.065e+01 9.544e+01 1.011e+02 1.214e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-27 21:03:15,312 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485250 2023-11-27 21:03:19,228 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.31 vs. limit=15.0 2023-11-27 21:03:21,921 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4300, loss[loss=0.06269, simple_loss=0.07958, pruned_loss=0.01263, audio_tagging_loss=0.01027, over 16170.00 frames. ], tot_loss[loss=0.068, simple_loss=0.09257, pruned_loss=0.01317, audio_tagging_loss=0.008542, over 3052442.07 frames. 
], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:03:25,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3235026.6666666665, ans=0.125 2023-11-27 21:04:06,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3235293.3333333335, ans=0.125 2023-11-27 21:04:12,583 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485300 2023-11-27 21:04:16,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3235293.3333333335, ans=0.1 2023-11-27 21:04:19,751 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4350, loss[loss=0.0698, simple_loss=0.1014, pruned_loss=0.0118, audio_tagging_loss=0.007292, over 15226.00 frames. ], tot_loss[loss=0.0681, simple_loss=0.0926, pruned_loss=0.01319, audio_tagging_loss=0.008605, over 3049408.06 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 21:05:04,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3235626.6666666665, ans=0.125 2023-11-27 21:05:06,671 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 9.007e+01 9.649e+01 1.037e+02 1.293e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-27 21:05:07,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.13 vs. limit=15.0 2023-11-27 21:05:09,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3235626.6666666665, ans=0.125 2023-11-27 21:05:10,082 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485350 2023-11-27 21:05:10,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3235626.6666666665, ans=0.0 2023-11-27 21:05:16,685 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4400, loss[loss=0.08702, simple_loss=0.1221, pruned_loss=0.01843, audio_tagging_loss=0.007551, over 15286.00 frames. ], tot_loss[loss=0.06754, simple_loss=0.09155, pruned_loss=0.01306, audio_tagging_loss=0.008709, over 3046387.72 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:05:21,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3235693.3333333335, ans=0.125 2023-11-27 21:05:37,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3235760.0, ans=0.0 2023-11-27 21:05:55,602 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5 2023-11-27 21:05:57,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3235893.3333333335, ans=0.5 2023-11-27 21:06:06,495 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485400 2023-11-27 21:06:13,252 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4450, loss[loss=0.03434, simple_loss=0.04294, pruned_loss=0.002838, audio_tagging_loss=0.01003, over 14717.00 frames. ], tot_loss[loss=0.06743, simple_loss=0.09146, pruned_loss=0.01301, audio_tagging_loss=0.00869, over 3050204.56 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:06:14,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3236026.6666666665, ans=0.125 2023-11-27 21:06:14,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3236026.6666666665, ans=0.09899494936611666 2023-11-27 21:06:14,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3236026.6666666665, ans=0.09899494936611666 2023-11-27 21:06:33,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3236093.3333333335, ans=0.0 2023-11-27 21:06:38,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3236160.0, ans=0.125 2023-11-27 21:06:41,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3236160.0, ans=0.125 2023-11-27 21:07:00,275 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.705e+01 9.463e+01 1.011e+02 1.177e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-27 21:07:03,722 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485450 2023-11-27 21:07:10,141 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:07:11,610 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4500, loss[loss=0.03929, simple_loss=0.05107, pruned_loss=0.005552, audio_tagging_loss=0.00821, over 15126.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09026, pruned_loss=0.01277, audio_tagging_loss=0.008768, over 3044623.44 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:07:28,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3236426.6666666665, ans=0.2 2023-11-27 21:07:48,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3236560.0, ans=0.125 2023-11-27 21:07:49,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3236560.0, ans=0.07 2023-11-27 21:07:59,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3236626.6666666665, ans=0.125 2023-11-27 21:08:01,646 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485500 2023-11-27 21:08:08,299 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4550, loss[loss=0.05248, simple_loss=0.07925, pruned_loss=0.005552, audio_tagging_loss=0.0073, over 13749.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08982, pruned_loss=0.01253, audio_tagging_loss=0.00874, over 3037358.96 frames. 
], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:08:10,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3236693.3333333335, ans=0.125 2023-11-27 21:08:10,692 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:08:10,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3236693.3333333335, ans=0.0 2023-11-27 21:08:12,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3236693.3333333335, ans=0.125 2023-11-27 21:08:12,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3236693.3333333335, ans=0.1 2023-11-27 21:08:19,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.19 vs. limit=10.0 2023-11-27 21:08:29,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.37 vs. limit=10.0 2023-11-27 21:08:33,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3236826.6666666665, ans=0.04949747468305833 2023-11-27 21:08:52,652 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 21:08:54,911 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.586e+01 8.582e+01 9.237e+01 9.730e+01 1.372e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-27 21:08:58,308 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485550 2023-11-27 21:08:58,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3236960.0, ans=0.0 2023-11-27 21:09:05,343 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4600, loss[loss=0.07841, simple_loss=0.103, pruned_loss=0.01861, audio_tagging_loss=0.008318, over 15419.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08945, pruned_loss=0.01251, audio_tagging_loss=0.008746, over 3041972.65 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:09:42,862 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:09:55,233 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485600 2023-11-27 21:09:56,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3237293.3333333335, ans=0.125 2023-11-27 21:10:02,109 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4650, loss[loss=0.06941, simple_loss=0.09143, pruned_loss=0.01533, audio_tagging_loss=0.008361, over 14848.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09025, pruned_loss=0.01255, audio_tagging_loss=0.008769, over 3044719.73 frames. 
], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:10:11,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3237360.0, ans=0.0 2023-11-27 21:10:17,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3237426.6666666665, ans=0.125 2023-11-27 21:10:23,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3237426.6666666665, ans=0.0 2023-11-27 21:10:42,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3237560.0, ans=0.125 2023-11-27 21:10:43,750 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0 2023-11-27 21:10:49,290 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 8.818e+01 9.409e+01 9.994e+01 1.817e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-27 21:10:52,662 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485650 2023-11-27 21:10:59,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2023-11-27 21:10:59,621 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4700, loss[loss=0.05867, simple_loss=0.0774, pruned_loss=0.009782, audio_tagging_loss=0.01019, over 14832.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09053, pruned_loss=0.01243, audio_tagging_loss=0.008795, over 3055531.36 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:11:17,924 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:11:23,519 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:11:26,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3237826.6666666665, ans=0.0 2023-11-27 21:11:37,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3237893.3333333335, ans=0.125 2023-11-27 21:11:39,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3237893.3333333335, ans=0.2 2023-11-27 21:11:49,937 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485700 2023-11-27 21:11:52,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3237960.0, ans=0.125 2023-11-27 21:11:56,975 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4750, loss[loss=0.08251, simple_loss=0.1034, pruned_loss=0.02149, audio_tagging_loss=0.009335, over 13703.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09016, pruned_loss=0.0124, audio_tagging_loss=0.008863, over 3062072.58 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:12:01,829 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.43 vs. 
limit=15.0 2023-11-27 21:12:14,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3238093.3333333335, ans=0.2 2023-11-27 21:12:22,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3238160.0, ans=0.0 2023-11-27 21:12:38,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3238226.6666666665, ans=0.125 2023-11-27 21:12:43,625 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.636e+01 8.911e+01 9.575e+01 1.033e+02 1.448e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 21:12:46,967 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485750 2023-11-27 21:12:53,378 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4800, loss[loss=0.06215, simple_loss=0.08567, pruned_loss=0.009954, audio_tagging_loss=0.009359, over 16275.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09, pruned_loss=0.01242, audio_tagging_loss=0.008963, over 3057360.22 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 21:12:53,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3238360.0, ans=0.0 2023-11-27 21:12:54,016 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.25 vs. limit=22.5 2023-11-27 21:13:08,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3238426.6666666665, ans=0.125 2023-11-27 21:13:11,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3238426.6666666665, ans=0.0 2023-11-27 21:13:21,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3238493.3333333335, ans=0.0 2023-11-27 21:13:23,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3238493.3333333335, ans=0.0 2023-11-27 21:13:23,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3238493.3333333335, ans=0.0 2023-11-27 21:13:44,063 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485800 2023-11-27 21:13:50,916 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4850, loss[loss=0.07888, simple_loss=0.1067, pruned_loss=0.01652, audio_tagging_loss=0.009017, over 14844.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08936, pruned_loss=0.01233, audio_tagging_loss=0.0091, over 3056677.10 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:13:52,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3238693.3333333335, ans=0.125 2023-11-27 21:13:53,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3238693.3333333335, ans=0.125 2023-11-27 21:14:36,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3238960.0, ans=0.125 2023-11-27 21:14:39,407 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.86 vs. 
limit=10.0 2023-11-27 21:14:39,767 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.239e+01 8.891e+01 9.390e+01 1.010e+02 1.385e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-27 21:14:43,041 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485850 2023-11-27 21:14:49,863 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4900, loss[loss=0.07878, simple_loss=0.1031, pruned_loss=0.01619, audio_tagging_loss=0.01104, over 15349.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08985, pruned_loss=0.01238, audio_tagging_loss=0.00896, over 3055726.07 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:15:39,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3239293.3333333335, ans=0.125 2023-11-27 21:15:45,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3239293.3333333335, ans=0.2 2023-11-27 21:15:47,429 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485900 2023-11-27 21:15:47,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3239293.3333333335, ans=0.125 2023-11-27 21:15:49,586 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.39 vs. limit=15.0 2023-11-27 21:15:57,733 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 4950, loss[loss=0.05207, simple_loss=0.07158, pruned_loss=0.005685, audio_tagging_loss=0.01059, over 15921.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08998, pruned_loss=0.0124, audio_tagging_loss=0.008761, over 3057558.56 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:16:13,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3239360.0, ans=0.0 2023-11-27 21:16:25,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3239426.6666666665, ans=0.0 2023-11-27 21:16:33,150 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0 2023-11-27 21:17:02,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3239560.0, ans=0.2 2023-11-27 21:17:30,901 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.683e+01 9.334e+01 9.956e+01 1.191e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 21:17:36,368 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 485950 2023-11-27 21:17:48,889 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5000, loss[loss=0.07857, simple_loss=0.1049, pruned_loss=0.01909, audio_tagging_loss=0.007055, over 15597.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09014, pruned_loss=0.01256, audio_tagging_loss=0.008605, over 3047276.02 frames. 
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:18:11,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3239760.0, ans=0.2 2023-11-27 21:18:18,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3239760.0, ans=0.125 2023-11-27 21:18:30,629 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:18:39,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3239826.6666666665, ans=0.125 2023-11-27 21:18:43,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.31 vs. limit=15.0 2023-11-27 21:18:45,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3239893.3333333335, ans=0.0 2023-11-27 21:18:53,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3239893.3333333335, ans=0.125 2023-11-27 21:19:11,054 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486000 2023-11-27 21:19:22,870 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5050, loss[loss=0.05338, simple_loss=0.06527, pruned_loss=0.009739, audio_tagging_loss=0.01101, over 14936.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09011, pruned_loss=0.01249, audio_tagging_loss=0.008583, over 3044108.58 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:19:45,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3240093.3333333335, ans=0.125 2023-11-27 21:22:00,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3240293.3333333335, ans=0.125 2023-11-27 21:22:11,288 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.444e+01 8.669e+01 9.381e+01 9.908e+01 1.305e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 21:22:22,528 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486050 2023-11-27 21:22:53,548 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5100, loss[loss=0.08286, simple_loss=0.1114, pruned_loss=0.01824, audio_tagging_loss=0.008926, over 14127.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08987, pruned_loss=0.01252, audio_tagging_loss=0.008592, over 3045664.73 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:23:52,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3240426.6666666665, ans=0.0 2023-11-27 21:23:52,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3240426.6666666665, ans=0.1 2023-11-27 21:24:08,829 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.63 vs. 
limit=10.0 2023-11-27 21:24:11,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3240426.6666666665, ans=0.0 2023-11-27 21:25:16,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3240560.0, ans=0.1 2023-11-27 21:26:14,945 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486100 2023-11-27 21:26:47,656 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5150, loss[loss=0.04098, simple_loss=0.05154, pruned_loss=0.005201, audio_tagging_loss=0.01001, over 15781.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08955, pruned_loss=0.01249, audio_tagging_loss=0.008644, over 3046298.30 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:26:58,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3240693.3333333335, ans=0.125 2023-11-27 21:28:00,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3240760.0, ans=0.0 2023-11-27 21:28:42,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3240826.6666666665, ans=0.125 2023-11-27 21:28:49,031 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0 2023-11-27 21:29:07,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3240893.3333333335, ans=0.125 2023-11-27 21:29:37,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3240893.3333333335, ans=0.125 2023-11-27 21:29:50,504 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:29:53,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3240960.0, ans=0.125 2023-11-27 21:29:58,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 8.896e+01 9.394e+01 1.012e+02 1.340e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 21:30:02,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3240960.0, ans=0.125 2023-11-27 21:30:05,434 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486150 2023-11-27 21:30:28,628 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5200, loss[loss=0.06172, simple_loss=0.09178, pruned_loss=0.00915, audio_tagging_loss=0.006677, over 15972.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.0897, pruned_loss=0.01258, audio_tagging_loss=0.008717, over 3046675.90 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 21:30:54,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3241026.6666666665, ans=0.125 2023-11-27 21:31:06,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3241093.3333333335, ans=0.05 2023-11-27 21:31:27,835 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.86 vs. 
limit=12.0 2023-11-27 21:31:39,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3241093.3333333335, ans=0.07 2023-11-27 21:31:42,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3241160.0, ans=0.07 2023-11-27 21:33:19,213 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486200 2023-11-27 21:33:45,356 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5250, loss[loss=0.05019, simple_loss=0.06873, pruned_loss=0.008707, audio_tagging_loss=0.00712, over 15542.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08935, pruned_loss=0.01248, audio_tagging_loss=0.008623, over 3046165.51 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:33:48,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3241360.0, ans=0.0 2023-11-27 21:33:48,555 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:34:06,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3241360.0, ans=0.2 2023-11-27 21:35:01,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3241493.3333333335, ans=0.0 2023-11-27 21:35:41,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3241560.0, ans=0.125 2023-11-27 21:36:01,173 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-11-27 21:36:09,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.968e+01 8.648e+01 9.300e+01 9.886e+01 1.149e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 21:36:11,456 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486250 2023-11-27 21:36:31,511 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5300, loss[loss=0.08228, simple_loss=0.1169, pruned_loss=0.01771, audio_tagging_loss=0.00612, over 15915.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08928, pruned_loss=0.01237, audio_tagging_loss=0.008632, over 3051439.41 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:36:34,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3241693.3333333335, ans=0.125 2023-11-27 21:36:43,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3241693.3333333335, ans=0.0 2023-11-27 21:36:43,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.57 vs. 
limit=15.0 2023-11-27 21:38:10,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3241893.3333333335, ans=0.1 2023-11-27 21:38:13,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3241893.3333333335, ans=0.125 2023-11-27 21:38:17,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3241893.3333333335, ans=0.0 2023-11-27 21:38:40,080 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486300 2023-11-27 21:38:57,800 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5350, loss[loss=0.05773, simple_loss=0.07222, pruned_loss=0.0115, audio_tagging_loss=0.01011, over 15244.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09015, pruned_loss=0.01249, audio_tagging_loss=0.008613, over 3056547.00 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:38:58,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3242026.6666666665, ans=0.0 2023-11-27 21:39:42,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3242093.3333333335, ans=0.125 2023-11-27 21:40:02,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3242160.0, ans=0.125 2023-11-27 21:40:29,177 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:40:57,040 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.785e+01 8.875e+01 9.269e+01 1.018e+02 1.292e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-27 21:40:59,671 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486350 2023-11-27 21:41:05,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3242293.3333333335, ans=0.1 2023-11-27 21:41:05,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2023-11-27 21:41:13,669 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5400, loss[loss=0.05582, simple_loss=0.0721, pruned_loss=0.01052, audio_tagging_loss=0.009247, over 15075.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09057, pruned_loss=0.0125, audio_tagging_loss=0.008609, over 3055844.57 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:42:10,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3242493.3333333335, ans=0.2 2023-11-27 21:42:20,397 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0 2023-11-27 21:43:16,232 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486400 2023-11-27 21:43:16,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3242626.6666666665, ans=0.0 2023-11-27 21:43:34,419 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5450, loss[loss=0.06461, simple_loss=0.07896, pruned_loss=0.01686, audio_tagging_loss=0.008273, over 14461.00 frames. 
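Note: the many scaling.py:213 lines each report a ScheduledFloat, a float-valued hyperparameter (skip rates, dropout_p, bypass scale_min, balancer probabilities, even whitening limits) whose current value `ans` depends on batch_count. Below is a minimal sketch of one plausible implementation, assuming a piecewise-linear schedule clamped at its endpoints; by batch_count of roughly 3.24e6, as here, such schedules have long since reached their final values.

```python
# Minimal sketch of a piecewise-linear float schedule of the kind the
# "ScheduledFloat: name=..., batch_count=..., ans=..." lines report.
# The breakpoints below are illustrative, not taken from the recipe.
import bisect

class PiecewiseLinearFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]
        self.batch_count = 0.0

    def __float__(self):
        x = self.batch_count
        if x <= self.xs[0]:
            return float(self.ys[0])
        if x >= self.xs[-1]:
            return float(self.ys[-1])
        i = bisect.bisect_right(self.xs, x) - 1
        x0, x1 = self.xs[i], self.xs[i + 1]
        y0, y1 = self.ys[i], self.ys[i + 1]
        return float(y0 + (y1 - y0) * (x - x0) / (x1 - x0))

# A skip rate decaying from 0.2 to 0.0 over the first 20k batches:
conv_skip_rate = PiecewiseLinearFloat((0.0, 0.2), (20000.0, 0.0))
conv_skip_rate.batch_count = 3243426.6666666665  # as in the log above
print(f"ans={float(conv_skip_rate)}")  # 0.0, long past the last breakpoint
```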
], tot_loss[loss=0.06667, simple_loss=0.09081, pruned_loss=0.0127, audio_tagging_loss=0.008573, over 3048659.04 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:43:49,861 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.53 vs. limit=15.0 2023-11-27 21:44:21,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3242760.0, ans=0.1 2023-11-27 21:44:23,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3242826.6666666665, ans=0.125 2023-11-27 21:44:35,544 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 21:45:12,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3242893.3333333335, ans=0.0 2023-11-27 21:45:15,041 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2023-11-27 21:45:25,861 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.703e+01 9.302e+01 9.943e+01 1.219e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-27 21:45:27,672 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486450 2023-11-27 21:45:40,803 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5500, loss[loss=0.06712, simple_loss=0.08452, pruned_loss=0.01572, audio_tagging_loss=0.009134, over 15853.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09003, pruned_loss=0.01255, audio_tagging_loss=0.008679, over 3044059.97 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:46:57,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3243160.0, ans=0.2 2023-11-27 21:47:17,068 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.31 vs. limit=22.5 2023-11-27 21:47:36,449 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486500 2023-11-27 21:47:44,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3243293.3333333335, ans=0.035 2023-11-27 21:47:53,074 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5550, loss[loss=0.09161, simple_loss=0.1323, pruned_loss=0.01801, audio_tagging_loss=0.007463, over 15616.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09011, pruned_loss=0.01256, audio_tagging_loss=0.008763, over 3043289.91 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 21:48:15,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3243426.6666666665, ans=15.0 2023-11-27 21:48:18,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3243426.6666666665, ans=0.1 2023-11-27 21:48:19,499 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.57 vs. 
limit=22.5 2023-11-27 21:48:41,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3243493.3333333335, ans=0.125 2023-11-27 21:49:13,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3243560.0, ans=0.0 2023-11-27 21:49:22,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3243560.0, ans=0.125 2023-11-27 21:49:33,588 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.15 vs. limit=15.0 2023-11-27 21:49:35,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3243626.6666666665, ans=0.0 2023-11-27 21:49:40,792 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.844e+01 9.360e+01 9.886e+01 1.640e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-27 21:49:41,082 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486550 2023-11-27 21:49:53,579 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5600, loss[loss=0.0758, simple_loss=0.1058, pruned_loss=0.01431, audio_tagging_loss=0.008575, over 14971.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09054, pruned_loss=0.01255, audio_tagging_loss=0.008872, over 3042507.01 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:50:03,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3243693.3333333335, ans=10.0 2023-11-27 21:51:25,642 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 21:51:43,026 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486600 2023-11-27 21:51:52,102 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2023-11-27 21:51:58,835 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5650, loss[loss=0.04501, simple_loss=0.05658, pruned_loss=0.004689, audio_tagging_loss=0.01203, over 14871.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08972, pruned_loss=0.01238, audio_tagging_loss=0.009014, over 3047214.56 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:52:20,003 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2023-11-27 21:52:21,606 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=12.0 2023-11-27 21:53:05,389 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs. 
limit=15.0 2023-11-27 21:53:32,324 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.720e+01 9.211e+01 9.882e+01 1.405e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-27 21:53:32,578 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486650 2023-11-27 21:53:42,052 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5700, loss[loss=0.06072, simple_loss=0.08084, pruned_loss=0.01158, audio_tagging_loss=0.008716, over 14776.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08911, pruned_loss=0.01223, audio_tagging_loss=0.009046, over 3045653.71 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:54:03,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3244426.6666666665, ans=0.2 2023-11-27 21:54:05,837 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2023-11-27 21:54:08,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3244426.6666666665, ans=0.0 2023-11-27 21:54:21,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3244426.6666666665, ans=0.0 2023-11-27 21:54:31,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3244493.3333333335, ans=0.125 2023-11-27 21:54:38,199 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.88 vs. limit=12.0 2023-11-27 21:55:14,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3244626.6666666665, ans=0.05 2023-11-27 21:55:16,657 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486700 2023-11-27 21:55:28,983 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5750, loss[loss=0.08433, simple_loss=0.1123, pruned_loss=0.02007, audio_tagging_loss=0.008097, over 15206.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08839, pruned_loss=0.01209, audio_tagging_loss=0.00895, over 3046750.81 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:55:29,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.42 vs. limit=10.0 2023-11-27 21:55:31,862 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=12.0 2023-11-27 21:55:39,485 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.95 vs. 
limit=12.0 2023-11-27 21:55:43,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3244693.3333333335, ans=0.125 2023-11-27 21:56:34,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3244893.3333333335, ans=0.0 2023-11-27 21:56:52,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3244960.0, ans=0.125 2023-11-27 21:56:55,246 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.899e+01 8.667e+01 9.281e+01 1.002e+02 1.374e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-27 21:56:55,410 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486750 2023-11-27 21:56:59,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3244960.0, ans=0.125 2023-11-27 21:57:08,455 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5800, loss[loss=0.06103, simple_loss=0.09129, pruned_loss=0.01068, audio_tagging_loss=0.004705, over 15604.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08832, pruned_loss=0.01212, audio_tagging_loss=0.008865, over 3040683.70 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:57:45,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3245160.0, ans=0.125 2023-11-27 21:58:27,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3245293.3333333335, ans=0.0 2023-11-27 21:58:30,887 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486800 2023-11-27 21:58:39,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3245293.3333333335, ans=0.1 2023-11-27 21:58:42,499 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5850, loss[loss=0.0687, simple_loss=0.08194, pruned_loss=0.01655, audio_tagging_loss=0.01119, over 15101.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.0877, pruned_loss=0.01212, audio_tagging_loss=0.0089, over 3037984.35 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 21:58:49,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.83 vs. limit=10.0 2023-11-27 21:58:49,765 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.80 vs. 
limit=15.0 2023-11-27 21:59:02,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3245426.6666666665, ans=0.2 2023-11-27 21:59:05,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3245426.6666666665, ans=0.125 2023-11-27 21:59:24,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3245493.3333333335, ans=0.09899494936611666 2023-11-27 22:00:03,985 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.902e+01 9.558e+01 1.050e+02 1.471e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-27 22:00:04,110 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486850 2023-11-27 22:00:13,963 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5900, loss[loss=0.0846, simple_loss=0.1196, pruned_loss=0.01521, audio_tagging_loss=0.00959, over 15216.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08768, pruned_loss=0.01221, audio_tagging_loss=0.008918, over 3035847.82 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:00:14,573 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2023-11-27 22:00:30,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3245760.0, ans=0.125 2023-11-27 22:00:56,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3245826.6666666665, ans=0.125 2023-11-27 22:01:14,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3245893.3333333335, ans=0.125 2023-11-27 22:01:24,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3245960.0, ans=10.0 2023-11-27 22:01:27,177 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486900 2023-11-27 22:01:36,037 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 5950, loss[loss=0.06349, simple_loss=0.09501, pruned_loss=0.009895, audio_tagging_loss=0.00609, over 14936.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.0884, pruned_loss=0.01224, audio_tagging_loss=0.008852, over 3037776.18 frames. ], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:02:23,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3246226.6666666665, ans=0.2 2023-11-27 22:02:24,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0 2023-11-27 22:02:29,241 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.06 vs. limit=15.0 2023-11-27 22:02:43,390 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 8.680e+01 9.187e+01 9.808e+01 1.354e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-27 22:02:43,499 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 486950 2023-11-27 22:02:53,404 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6000, loss[loss=0.05529, simple_loss=0.07087, pruned_loss=0.01114, audio_tagging_loss=0.008723, over 15331.00 frames. 
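Note: each optim.py:476 record lists five quantiles (min, 25%, median, 75%, max) of recent gradient norms together with the active clipping threshold. Throughout this section the threshold equals Clipping_scale = 2.0 times the logged median (e.g. 2.0 x 9.558e+01 = 1.912e+02 just above), so percent-clipped=0.0 simply means no recent batch exceeded twice the median norm. The sketch below implements that observed rule with illustrative bookkeeping; it is not the actual optimizer code.

```python
# Hedged sketch of grad-norm clipping with a median-based threshold, matching
# the "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." records.
from collections import deque
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 512):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)  # recent global grad norms
        self.num_seen = 0
        self.num_clipped = 0

    def __call__(self, params: list) -> float:
        grads = [p.grad.flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        self.norms.append(norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        self.num_seen += 1
        if norm > threshold:  # rescale gradients onto the threshold
            self.num_clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return threshold
```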
], tot_loss[loss=0.06586, simple_loss=0.08934, pruned_loss=0.01245, audio_tagging_loss=0.008746, over 3037933.57 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:02:53,405 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-27 22:03:35,229 INFO [train_asr.py:1267] (2/4) Epoch 41, validation: loss=0.05724, simple_loss=0.05055, pruned_loss=0.005142, audio_tagging_loss=0.02682, over 4681554.00 frames. 2023-11-27 22:03:35,230 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-27 22:03:42,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3246360.0, ans=0.125 2023-11-27 22:04:05,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3246493.3333333335, ans=0.0 2023-11-27 22:04:30,187 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 22:04:38,866 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487000 2023-11-27 22:04:45,285 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=12.0 2023-11-27 22:04:47,099 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6050, loss[loss=0.06494, simple_loss=0.09078, pruned_loss=0.009202, audio_tagging_loss=0.01034, over 14469.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08883, pruned_loss=0.01238, audio_tagging_loss=0.008705, over 3045891.13 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:04:47,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3246693.3333333335, ans=0.125 2023-11-27 22:04:55,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3246693.3333333335, ans=0.125 2023-11-27 22:04:57,086 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.61 vs. 
limit=22.5 2023-11-27 22:05:01,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3246760.0, ans=0.0 2023-11-27 22:05:11,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3246760.0, ans=0.125 2023-11-27 22:05:21,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3246826.6666666665, ans=0.0 2023-11-27 22:05:27,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3246893.3333333335, ans=0.1 2023-11-27 22:05:34,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3246893.3333333335, ans=0.0 2023-11-27 22:05:44,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3246960.0, ans=0.1 2023-11-27 22:05:47,365 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487050 2023-11-27 22:05:48,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.706e+01 9.274e+01 9.905e+01 1.388e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 22:05:53,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3246960.0, ans=0.2 2023-11-27 22:05:56,259 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6100, loss[loss=0.06584, simple_loss=0.08626, pruned_loss=0.0153, audio_tagging_loss=0.00741, over 15183.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08917, pruned_loss=0.01243, audio_tagging_loss=0.008728, over 3048032.98 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:05:58,324 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=15.0 2023-11-27 22:06:07,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3247026.6666666665, ans=0.125 2023-11-27 22:06:11,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3247093.3333333335, ans=0.2 2023-11-27 22:06:20,527 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2023-11-27 22:06:40,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3247226.6666666665, ans=0.1 2023-11-27 22:06:44,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3247226.6666666665, ans=0.125 2023-11-27 22:06:55,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3247293.3333333335, ans=0.125 2023-11-27 22:06:56,231 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487100 2023-11-27 22:07:04,068 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6150, loss[loss=0.07674, simple_loss=0.1151, pruned_loss=0.01263, audio_tagging_loss=0.006562, over 15050.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08969, pruned_loss=0.01258, audio_tagging_loss=0.008687, over 3053386.67 frames. 
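Note: the validation block at batch 6000 above (train_asr.py:1258/1267/1268) shows the pattern used for periodic validation: training pauses, the full dev set (4681554 frames here) is scored with gradients disabled, and peak CUDA memory is reported. A minimal sketch of that loop follows; `model`, `dev_loader` and `compute_loss` are illustrative placeholders, and `compute_loss` is assumed to return a per-frame loss plus the frame count.

```python
# Minimal sketch of the periodic validation pass logged at batch 6000 above.
import logging
import torch

def validate(model, dev_loader, compute_loss) -> float:
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            loss, num_frames = compute_loss(model, batch)  # per-frame loss
            tot_loss += loss.item() * num_frames           # frame-weighted
            tot_frames += num_frames
    model.train()
    logging.info(f"validation: loss={tot_loss / tot_frames:.4g}, "
                 f"over {tot_frames:.2f} frames.")
    logging.info("Maximum memory allocated so far is "
                 f"{torch.cuda.max_memory_allocated() // 2**20}MB")
    return tot_loss / tot_frames
```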
], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:07:22,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3247426.6666666665, ans=0.1 2023-11-27 22:07:26,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3247426.6666666665, ans=0.125 2023-11-27 22:07:27,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3247426.6666666665, ans=0.0 2023-11-27 22:07:44,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3247560.0, ans=0.0 2023-11-27 22:07:47,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3247560.0, ans=0.125 2023-11-27 22:07:49,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3247560.0, ans=0.2 2023-11-27 22:07:57,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3247626.6666666665, ans=0.2 2023-11-27 22:08:04,370 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487150 2023-11-27 22:08:05,497 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.962e+01 9.637e+01 1.023e+02 1.658e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-27 22:08:08,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3247626.6666666665, ans=0.125 2023-11-27 22:08:11,794 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6200, loss[loss=0.05143, simple_loss=0.05825, pruned_loss=0.01086, audio_tagging_loss=0.01144, over 17270.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.0904, pruned_loss=0.01278, audio_tagging_loss=0.008743, over 3057270.27 frames. ], batch size: 67, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:08:16,142 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0 2023-11-27 22:08:50,080 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:08:52,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3247893.3333333335, ans=0.1 2023-11-27 22:08:57,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3247893.3333333335, ans=0.125 2023-11-27 22:08:59,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3247893.3333333335, ans=0.07 2023-11-27 22:09:09,914 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487200 2023-11-27 22:09:17,720 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6250, loss[loss=0.09487, simple_loss=0.1331, pruned_loss=0.02069, audio_tagging_loss=0.007652, over 16392.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09103, pruned_loss=0.01276, audio_tagging_loss=0.008771, over 3057015.79 frames. 
], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:09:24,900 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:09:36,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3248093.3333333335, ans=0.2 2023-11-27 22:09:42,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3248093.3333333335, ans=0.125 2023-11-27 22:09:51,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3248160.0, ans=0.0 2023-11-27 22:09:54,091 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.22 vs. limit=22.5 2023-11-27 22:10:15,222 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487250 2023-11-27 22:10:17,264 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.680e+01 9.045e+01 9.912e+01 1.334e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-27 22:10:23,265 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6300, loss[loss=0.06346, simple_loss=0.08956, pruned_loss=0.007816, audio_tagging_loss=0.01086, over 14725.00 frames. ], tot_loss[loss=0.06723, simple_loss=0.09116, pruned_loss=0.01276, audio_tagging_loss=0.008884, over 3051435.71 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:10:26,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3248360.0, ans=0.1 2023-11-27 22:10:57,571 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:11:11,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3248560.0, ans=0.125 2023-11-27 22:11:20,372 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487300 2023-11-27 22:11:27,398 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6350, loss[loss=0.06749, simple_loss=0.09448, pruned_loss=0.01301, audio_tagging_loss=0.007236, over 14497.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08974, pruned_loss=0.01245, audio_tagging_loss=0.009017, over 3048009.25 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:11:38,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3248760.0, ans=0.125 2023-11-27 22:11:51,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3248826.6666666665, ans=0.0 2023-11-27 22:11:52,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3248826.6666666665, ans=0.2 2023-11-27 22:11:52,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3248826.6666666665, ans=0.125 2023-11-27 22:12:09,639 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.70 vs. 
limit=15.0 2023-11-27 22:12:22,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3248960.0, ans=0.125 2023-11-27 22:12:23,509 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487350 2023-11-27 22:12:24,696 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.655e+01 9.162e+01 9.797e+01 1.327e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-27 22:12:31,047 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6400, loss[loss=0.06781, simple_loss=0.09269, pruned_loss=0.01384, audio_tagging_loss=0.007624, over 15602.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08977, pruned_loss=0.01252, audio_tagging_loss=0.009135, over 3045936.82 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:12:33,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3249026.6666666665, ans=0.125 2023-11-27 22:13:12,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3249226.6666666665, ans=0.0 2023-11-27 22:13:12,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3249226.6666666665, ans=0.125 2023-11-27 22:13:23,778 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:13:28,908 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487400 2023-11-27 22:13:29,319 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2023-11-27 22:13:36,647 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6450, loss[loss=0.07563, simple_loss=0.1038, pruned_loss=0.01586, audio_tagging_loss=0.007894, over 14926.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08886, pruned_loss=0.01225, audio_tagging_loss=0.009251, over 3045660.45 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:13:46,955 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2023-11-27 22:13:56,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3249426.6666666665, ans=0.125 2023-11-27 22:14:00,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3249426.6666666665, ans=0.125 2023-11-27 22:14:34,796 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487450 2023-11-27 22:14:37,159 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.688e+01 9.330e+01 9.887e+01 1.158e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-27 22:14:41,419 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.01 vs. limit=15.0 2023-11-27 22:14:42,166 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6500, loss[loss=0.07455, simple_loss=0.1035, pruned_loss=0.01502, audio_tagging_loss=0.007759, over 15892.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08876, pruned_loss=0.01215, audio_tagging_loss=0.009157, over 3044505.21 frames. 
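Note: the scaling.py:1022 lines compare a measured whitening metric of some intermediate activation (attention keys, feed-forward outputs, conv-module outputs) against its current limit, which is itself scheduled, as the whitening_limit entries above show; the module only pushes gradients toward a whiter feature distribution when the metric exceeds the limit. One reasonable definition of such a metric is sketched below: it is smallest for an isotropic covariance and grows with the eigenvalue spread. This is an assumed formula for illustration; the exact definition in scaling.py may differ.

```python
# Hedged sketch of a whiteness metric: E[lambda^2] / E[lambda]^2 over the
# eigenvalues of each group's covariance, computed via traces. It is near 1
# for well-whitened features (sampling noise pushes it slightly higher).
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels); channels split into equal groups
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    metrics = []
    for g in range(num_groups):
        c = (x[:, g, :].T @ x[:, g, :]) / num_frames  # (d, d) covariance
        d = c.shape[0]
        metrics.append(d * (c @ c).trace() / c.trace() ** 2)
    return float(torch.stack(metrics).mean())

feats = torch.randn(1000, 384)
print(f"metric={whitening_metric(feats, num_groups=1):.2f} vs. limit=22.5")
```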
], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:15:22,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.68 vs. limit=12.0 2023-11-27 22:15:38,399 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487500 2023-11-27 22:15:45,953 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6550, loss[loss=0.07393, simple_loss=0.09798, pruned_loss=0.01745, audio_tagging_loss=0.007489, over 15100.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08977, pruned_loss=0.01234, audio_tagging_loss=0.00887, over 3038577.24 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:16:05,920 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=22.5 2023-11-27 22:16:42,950 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487550 2023-11-27 22:16:45,719 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.718e+01 9.335e+01 9.836e+01 1.577e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-27 22:16:49,204 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5 2023-11-27 22:16:51,216 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6600, loss[loss=0.06421, simple_loss=0.09438, pruned_loss=0.009547, audio_tagging_loss=0.007474, over 14473.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09004, pruned_loss=0.01243, audio_tagging_loss=0.00876, over 3042115.73 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:17:07,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0 2023-11-27 22:17:18,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3250493.3333333335, ans=0.09899494936611666 2023-11-27 22:17:38,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3250560.0, ans=0.125 2023-11-27 22:17:47,837 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487600 2023-11-27 22:17:49,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3250626.6666666665, ans=0.1 2023-11-27 22:17:52,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3250626.6666666665, ans=0.2 2023-11-27 22:17:56,530 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6650, loss[loss=0.06719, simple_loss=0.09366, pruned_loss=0.01343, audio_tagging_loss=0.006928, over 15333.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09048, pruned_loss=0.01251, audio_tagging_loss=0.008788, over 3046158.84 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:17:58,526 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2023-11-27 22:18:10,274 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. 
limit=6.0 2023-11-27 22:18:13,824 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=12.0 2023-11-27 22:18:31,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3250826.6666666665, ans=0.125 2023-11-27 22:18:35,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3250893.3333333335, ans=0.2 2023-11-27 22:18:50,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3250960.0, ans=0.125 2023-11-27 22:18:52,914 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487650 2023-11-27 22:18:55,275 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.676e+01 9.213e+01 9.807e+01 1.195e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-27 22:18:59,629 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.21 vs. limit=10.0 2023-11-27 22:19:00,103 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6700, loss[loss=0.05175, simple_loss=0.07464, pruned_loss=0.007368, audio_tagging_loss=0.007061, over 15887.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09084, pruned_loss=0.01263, audio_tagging_loss=0.008616, over 3047865.27 frames. ], batch size: 62, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:19:23,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3251093.3333333335, ans=0.5 2023-11-27 22:19:33,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3251160.0, ans=0.125 2023-11-27 22:19:40,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3251226.6666666665, ans=0.0 2023-11-27 22:19:56,243 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487700 2023-11-27 22:19:58,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3251293.3333333335, ans=0.2 2023-11-27 22:20:04,221 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6750, loss[loss=0.06469, simple_loss=0.08565, pruned_loss=0.01412, audio_tagging_loss=0.007737, over 14124.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.0897, pruned_loss=0.01237, audio_tagging_loss=0.008573, over 3038375.36 frames. ], batch size: 53, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:20:34,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3251493.3333333335, ans=0.125 2023-11-27 22:20:59,914 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487750 2023-11-27 22:21:02,154 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.663e+01 9.320e+01 9.783e+01 1.430e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-27 22:21:06,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3251693.3333333335, ans=0.2 2023-11-27 22:21:07,723 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6800, loss[loss=0.07025, simple_loss=0.0843, pruned_loss=0.0195, audio_tagging_loss=0.008596, over 14553.00 frames. 
], tot_loss[loss=0.06587, simple_loss=0.08999, pruned_loss=0.01244, audio_tagging_loss=0.008441, over 3037816.07 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:21:36,337 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.41 vs. limit=22.5 2023-11-27 22:21:46,488 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.93 vs. limit=10.0 2023-11-27 22:21:50,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3251893.3333333335, ans=0.125 2023-11-27 22:21:57,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3251960.0, ans=0.125 2023-11-27 22:22:03,486 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487800 2023-11-27 22:22:11,460 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6850, loss[loss=0.06231, simple_loss=0.08513, pruned_loss=0.01196, audio_tagging_loss=0.007789, over 14573.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.0899, pruned_loss=0.01236, audio_tagging_loss=0.00847, over 3042615.58 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 22:22:33,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=3252093.3333333335, ans=15.0 2023-11-27 22:22:38,240 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=22.5 2023-11-27 22:22:51,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3252226.6666666665, ans=0.0 2023-11-27 22:22:59,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3252226.6666666665, ans=0.0 2023-11-27 22:23:06,663 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487850 2023-11-27 22:23:10,106 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.119e+01 8.804e+01 9.259e+01 1.010e+02 1.279e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-27 22:23:14,140 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6900, loss[loss=0.06643, simple_loss=0.08358, pruned_loss=0.01133, audio_tagging_loss=0.01331, over 15704.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09086, pruned_loss=0.01246, audio_tagging_loss=0.00849, over 3038153.23 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:23:15,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3252360.0, ans=0.09899494936611666 2023-11-27 22:23:22,303 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.63 vs. 
limit=15.0 2023-11-27 22:23:30,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3252426.6666666665, ans=22.5 2023-11-27 22:23:35,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3252426.6666666665, ans=0.2 2023-11-27 22:23:36,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3252426.6666666665, ans=0.2 2023-11-27 22:23:45,345 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=15.0 2023-11-27 22:23:48,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3252493.3333333335, ans=0.125 2023-11-27 22:23:51,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3252560.0, ans=0.2 2023-11-27 22:23:54,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3252560.0, ans=0.2 2023-11-27 22:23:58,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3252560.0, ans=0.125 2023-11-27 22:24:03,540 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 22:24:07,486 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487900 2023-11-27 22:24:14,288 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 6950, loss[loss=0.07519, simple_loss=0.1082, pruned_loss=0.01322, audio_tagging_loss=0.007864, over 14841.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08992, pruned_loss=0.01239, audio_tagging_loss=0.008573, over 3039248.54 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:24:14,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3252693.3333333335, ans=0.125 2023-11-27 22:24:22,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3252693.3333333335, ans=0.125 2023-11-27 22:24:31,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3252760.0, ans=0.0 2023-11-27 22:24:49,309 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.99 vs. 
limit=22.5 2023-11-27 22:25:07,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3252960.0, ans=0.125 2023-11-27 22:25:11,126 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 487950 2023-11-27 22:25:17,137 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.695e+01 9.327e+01 1.020e+02 1.737e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-27 22:25:22,555 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7000, loss[loss=0.07522, simple_loss=0.1083, pruned_loss=0.01378, audio_tagging_loss=0.007307, over 15516.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09002, pruned_loss=0.01233, audio_tagging_loss=0.008633, over 3037902.77 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:25:25,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.77 vs. limit=6.0 2023-11-27 22:25:33,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3253026.6666666665, ans=6.0 2023-11-27 22:25:48,800 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.20 vs. limit=10.0 2023-11-27 22:26:55,852 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2023-11-27 22:27:26,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3253160.0, ans=0.2 2023-11-27 22:27:38,244 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=12.0 2023-11-27 22:28:41,359 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488000 2023-11-27 22:29:17,429 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7050, loss[loss=0.07305, simple_loss=0.09275, pruned_loss=0.01319, audio_tagging_loss=0.01348, over 14329.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09009, pruned_loss=0.01224, audio_tagging_loss=0.008687, over 3040099.36 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 22:29:17,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3253360.0, ans=0.0 2023-11-27 22:29:54,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3253360.0, ans=0.125 2023-11-27 22:30:04,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3253426.6666666665, ans=0.2 2023-11-27 22:30:47,358 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.05 vs. 
limit=22.5 2023-11-27 22:32:03,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3253560.0, ans=0.2 2023-11-27 22:32:41,738 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488050 2023-11-27 22:33:03,775 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.458e+01 9.244e+01 1.037e+02 2.754e+02, threshold=1.849e+02, percent-clipped=1.0 2023-11-27 22:33:11,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3253693.3333333335, ans=0.0 2023-11-27 22:33:16,727 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7100, loss[loss=0.06269, simple_loss=0.08683, pruned_loss=0.01021, audio_tagging_loss=0.009059, over 14612.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09076, pruned_loss=0.01245, audio_tagging_loss=0.008776, over 3037043.42 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 8.0 2023-11-27 22:33:39,926 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.00 vs. limit=22.5 2023-11-27 22:34:06,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3253760.0, ans=0.2 2023-11-27 22:34:44,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3253760.0, ans=0.2 2023-11-27 22:36:47,362 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488100 2023-11-27 22:36:47,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3253960.0, ans=0.1 2023-11-27 22:37:13,746 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7150, loss[loss=0.07585, simple_loss=0.09932, pruned_loss=0.01723, audio_tagging_loss=0.008963, over 16682.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09052, pruned_loss=0.01251, audio_tagging_loss=0.008834, over 3043286.83 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 8.0
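The loss fields in these records combine in a fixed way: throughout this section, tot_loss equals 0.5 * simple_loss + pruned_loss + audio_tagging_loss (for the batch 7150 averages just above, 0.5 * 0.09052 + 0.01251 + 0.008834 = 0.0666). A minimal sketch of that recombination; the 0.5 weight is inferred from the logged numbers, not quoted from the recipe's configuration:

    # Hedged sketch: recombine the logged loss components. The 0.5 weight on
    # simple_loss is inferred from the records in this log.
    def total_loss(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5) -> float:
        return simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss

    # Batch 7150 running averages from the record above:
    assert abs(total_loss(0.09052, 0.01251, 0.008834) - 0.0666) < 1e-4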
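Likewise, the optim.py Clipping_scale records print five grad-norm quantiles (apparently min, 25%, median, 75%, max) followed by the active threshold, and the threshold consistently equals Clipping_scale times the logged median: in the 22:33:03 record above, 2.0 * 9.244e+01 = 1.849e+02, and the 2.754e+02 maximum exceeding that threshold is consistent with the nonzero percent-clipped. A toy reconstruction under that assumption, with hypothetical names:

    import torch

    # Illustrative only: derive a clipping threshold from recent gradient
    # norms the way the logged numbers suggest (threshold = scale * median).
    def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        # Five quantiles as printed: min, 25%, median, 75%, max.
        q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # e.g. 2.0 * 9.244e+01 = 1.849e+02
        percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
        return q, threshold, percent_clipped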
2023-11-27 22:38:11,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3254093.3333333335, ans=0.125 2023-11-27 22:39:17,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3254160.0, ans=0.0 2023-11-27 22:39:23,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3254226.6666666665, ans=0.125 2023-11-27 22:39:38,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3254226.6666666665, ans=0.0 2023-11-27 22:39:50,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3254226.6666666665, ans=0.125 2023-11-27 22:39:55,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3254226.6666666665, ans=0.125 2023-11-27 22:40:28,474 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488150 2023-11-27 22:40:32,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3254293.3333333335, ans=0.0 2023-11-27 22:40:45,317 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.853e+01 8.864e+01 9.452e+01 1.007e+02 1.551e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 22:41:01,828 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7200, loss[loss=0.0752, simple_loss=0.09923, pruned_loss=0.01689, audio_tagging_loss=0.008694, over 16207.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09023, pruned_loss=0.01251, audio_tagging_loss=0.008966, over 3048893.39 frames. ], batch size: 61, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:41:25,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3254360.0, ans=0.0 2023-11-27 22:41:31,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3254426.6666666665, ans=0.125 2023-11-27 22:42:29,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3254493.3333333335, ans=0.125 2023-11-27 22:43:11,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3254560.0, ans=0.2 2023-11-27 22:43:46,482 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488200 2023-11-27 22:43:46,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3254626.6666666665, ans=0.125 2023-11-27 22:43:50,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3254626.6666666665, ans=0.0 2023-11-27 22:44:10,250 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7250, loss[loss=0.07899, simple_loss=0.1065, pruned_loss=0.01489, audio_tagging_loss=0.01085, over 14464.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09066, pruned_loss=0.01268, audio_tagging_loss=0.008992, over 3047416.60 frames.
], batch size: 54, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:44:22,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3254693.3333333335, ans=0.0 2023-11-27 22:44:43,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3254760.0, ans=0.1 2023-11-27 22:44:57,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3254760.0, ans=0.035 2023-11-27 22:45:48,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3254893.3333333335, ans=10.0 2023-11-27 22:46:29,628 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488250 2023-11-27 22:46:41,972 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.745e+01 9.249e+01 1.003e+02 1.162e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-27 22:46:46,967 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7300, loss[loss=0.06969, simple_loss=0.08688, pruned_loss=0.01379, audio_tagging_loss=0.01247, over 14910.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09092, pruned_loss=0.01264, audio_tagging_loss=0.008943, over 3050284.28 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:48:32,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3255226.6666666665, ans=0.1 2023-11-27 22:48:44,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3255226.6666666665, ans=0.2 2023-11-27 22:49:05,078 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488300 2023-11-27 22:49:16,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3255293.3333333335, ans=0.125 2023-11-27 22:49:24,619 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7350, loss[loss=0.05814, simple_loss=0.07696, pruned_loss=0.01251, audio_tagging_loss=0.007151, over 15062.00 frames. ], tot_loss[loss=0.06745, simple_loss=0.09168, pruned_loss=0.01284, audio_tagging_loss=0.008769, over 3045556.18 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:52:03,383 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488350 2023-11-27 22:52:06,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3255626.6666666665, ans=0.0 2023-11-27 22:52:15,254 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 8.717e+01 9.286e+01 1.027e+02 1.219e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-27 22:52:20,660 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7400, loss[loss=0.06363, simple_loss=0.09589, pruned_loss=0.008308, audio_tagging_loss=0.007379, over 15168.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09127, pruned_loss=0.01258, audio_tagging_loss=0.008662, over 3046274.76 frames. 
], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:53:06,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3255760.0, ans=0.0 2023-11-27 22:53:33,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3255826.6666666665, ans=0.125 2023-11-27 22:54:59,941 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488400 2023-11-27 22:55:14,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3255960.0, ans=0.125 2023-11-27 22:55:16,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3255960.0, ans=0.2 2023-11-27 22:55:16,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3255960.0, ans=0.125 2023-11-27 22:55:21,589 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7450, loss[loss=0.08215, simple_loss=0.1171, pruned_loss=0.01841, audio_tagging_loss=0.005186, over 15820.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.0909, pruned_loss=0.01266, audio_tagging_loss=0.008563, over 3051752.63 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:55:39,249 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-27 22:57:45,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3256293.3333333335, ans=0.2 2023-11-27 22:57:48,306 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488450 2023-11-27 22:57:57,541 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.50 vs. limit=22.5 2023-11-27 22:57:59,827 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.653e+01 9.263e+01 9.964e+01 1.295e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-27 22:58:07,600 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7500, loss[loss=0.0602, simple_loss=0.08139, pruned_loss=0.01294, audio_tagging_loss=0.006568, over 15593.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09091, pruned_loss=0.01278, audio_tagging_loss=0.008488, over 3051469.67 frames. ], batch size: 60, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 22:58:18,332 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.61 vs. 
limit=10.0 2023-11-27 22:58:34,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3256360.0, ans=0.2 2023-11-27 22:59:27,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3256493.3333333335, ans=0.125 2023-11-27 23:00:01,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3256560.0, ans=0.1 2023-11-27 23:00:42,499 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488500 2023-11-27 23:00:46,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3256626.6666666665, ans=0.05 2023-11-27 23:01:02,550 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7550, loss[loss=0.08159, simple_loss=0.1139, pruned_loss=0.01624, audio_tagging_loss=0.008413, over 15032.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08949, pruned_loss=0.01259, audio_tagging_loss=0.008564, over 3038006.18 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:02:17,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3256826.6666666665, ans=0.2 2023-11-27 23:02:43,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3256893.3333333335, ans=0.2 2023-11-27 23:03:08,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3256893.3333333335, ans=0.125 2023-11-27 23:03:30,429 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488550 2023-11-27 23:03:43,846 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.710e+01 9.273e+01 1.023e+02 1.229e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-27 23:03:49,352 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7600, loss[loss=0.07035, simple_loss=0.1016, pruned_loss=0.01188, audio_tagging_loss=0.007662, over 13403.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08943, pruned_loss=0.01247, audio_tagging_loss=0.008646, over 3035271.22 frames. ], batch size: 52, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:04:41,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3257093.3333333335, ans=0.125 2023-11-27 23:04:48,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3257160.0, ans=0.025 2023-11-27 23:04:55,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3257160.0, ans=0.0 2023-11-27 23:04:59,790 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.78 vs. 
limit=15.0 2023-11-27 23:05:47,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3257226.6666666665, ans=0.125 2023-11-27 23:05:56,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3257293.3333333335, ans=0.125 2023-11-27 23:05:59,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3257293.3333333335, ans=0.0 2023-11-27 23:06:06,793 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488600 2023-11-27 23:06:26,493 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7650, loss[loss=0.03626, simple_loss=0.03948, pruned_loss=0.003953, audio_tagging_loss=0.01257, over 16521.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08896, pruned_loss=0.01236, audio_tagging_loss=0.008575, over 3034128.12 frames. ], batch size: 65, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:06:36,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3257360.0, ans=0.125 2023-11-27 23:07:02,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3257426.6666666665, ans=0.0 2023-11-27 23:07:02,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3257426.6666666665, ans=0.125 2023-11-27 23:07:16,239 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5 2023-11-27 23:07:32,375 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=22.5 2023-11-27 23:07:40,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3257493.3333333335, ans=0.0 2023-11-27 23:07:43,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3257493.3333333335, ans=0.2 2023-11-27 23:08:21,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3257560.0, ans=0.1 2023-11-27 23:08:36,609 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488650 2023-11-27 23:08:50,818 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.807e+01 9.447e+01 1.017e+02 1.729e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-27 23:08:53,434 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7700, loss[loss=0.06421, simple_loss=0.08732, pruned_loss=0.01117, audio_tagging_loss=0.009377, over 14412.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08964, pruned_loss=0.01267, audio_tagging_loss=0.008616, over 3038363.61 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:09:23,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3257760.0, ans=0.1 2023-11-27 23:09:53,363 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=15.0 2023-11-27 23:10:53,301 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.64 vs. 
limit=15.0 2023-11-27 23:10:55,021 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488700 2023-11-27 23:10:58,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3257960.0, ans=0.0 2023-11-27 23:10:58,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3257960.0, ans=0.09899494936611666 2023-11-27 23:11:09,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=12.0 2023-11-27 23:11:16,854 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7750, loss[loss=0.06941, simple_loss=0.08897, pruned_loss=0.01508, audio_tagging_loss=0.009833, over 14667.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08955, pruned_loss=0.01261, audio_tagging_loss=0.008708, over 3034677.84 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:12:26,552 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2023-11-27 23:12:39,552 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.92 vs. limit=15.0 2023-11-27 23:12:53,392 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2023-11-27 23:13:42,189 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488750 2023-11-27 23:13:42,683 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0 2023-11-27 23:13:55,226 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 8.828e+01 9.509e+01 1.004e+02 1.323e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-27 23:13:57,930 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7800, loss[loss=0.0577, simple_loss=0.07558, pruned_loss=0.01198, audio_tagging_loss=0.007931, over 15459.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09001, pruned_loss=0.01258, audio_tagging_loss=0.008746, over 3030670.84 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:14:19,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3258360.0, ans=0.0 2023-11-27 23:14:21,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3258426.6666666665, ans=0.125 2023-11-27 23:14:28,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3258426.6666666665, ans=0.1 2023-11-27 23:15:50,374 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488800 2023-11-27 23:15:50,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3258626.6666666665, ans=0.0 2023-11-27 23:15:50,961 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.60 vs. 
limit=22.5 2023-11-27 23:16:02,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3258626.6666666665, ans=0.0 2023-11-27 23:16:08,329 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7850, loss[loss=0.06548, simple_loss=0.08442, pruned_loss=0.01382, audio_tagging_loss=0.009451, over 15580.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09074, pruned_loss=0.01272, audio_tagging_loss=0.008814, over 3036945.44 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:16:19,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3258693.3333333335, ans=0.2 2023-11-27 23:16:23,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3258693.3333333335, ans=0.125 2023-11-27 23:16:29,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3258760.0, ans=0.125 2023-11-27 23:16:52,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3258760.0, ans=0.125 2023-11-27 23:16:53,217 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.48 vs. limit=5.0 2023-11-27 23:17:32,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3258893.3333333335, ans=0.05 2023-11-27 23:17:38,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3258893.3333333335, ans=0.2 2023-11-27 23:17:39,336 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=15.0 2023-11-27 23:17:43,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3258960.0, ans=0.1 2023-11-27 23:17:51,941 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488850 2023-11-27 23:18:02,652 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.704e+01 9.225e+01 1.001e+02 1.986e+02, threshold=1.845e+02, percent-clipped=1.0 2023-11-27 23:18:06,090 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7900, loss[loss=0.06213, simple_loss=0.08213, pruned_loss=0.01117, audio_tagging_loss=0.009892, over 15051.00 frames. ], tot_loss[loss=0.06706, simple_loss=0.09112, pruned_loss=0.01279, audio_tagging_loss=0.008711, over 3050054.09 frames. ], batch size: 55, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:18:34,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3259093.3333333335, ans=0.125 2023-11-27 23:18:47,661 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.57 vs. limit=12.0 2023-11-27 23:18:55,548 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.74 vs. 
limit=15.0 2023-11-27 23:19:03,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3259160.0, ans=0.2 2023-11-27 23:19:25,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3259226.6666666665, ans=0.015 2023-11-27 23:19:48,328 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488900 2023-11-27 23:20:01,448 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 7950, loss[loss=0.0769, simple_loss=0.1043, pruned_loss=0.01206, audio_tagging_loss=0.01268, over 15930.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09092, pruned_loss=0.01277, audio_tagging_loss=0.008817, over 3050998.27 frames. ], batch size: 57, lr: 1.65e-03, grad_scale: 16.0 2023-11-27 23:20:30,365 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 23:20:31,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3259426.6666666665, ans=0.125 2023-11-27 23:21:02,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3259560.0, ans=0.0 2023-11-27 23:21:28,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.12 vs. limit=15.0 2023-11-27 23:21:28,853 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 488950 2023-11-27 23:21:32,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3259626.6666666665, ans=0.2 2023-11-27 23:21:36,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2023-11-27 23:21:37,646 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.632e+01 9.434e+01 1.008e+02 1.251e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-27 23:21:39,783 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8000, loss[loss=0.05958, simple_loss=0.07154, pruned_loss=0.01182, audio_tagging_loss=0.012, over 15601.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09082, pruned_loss=0.01281, audio_tagging_loss=0.008916, over 3046559.24 frames. 
], batch size: 60, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:22:35,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3259826.6666666665, ans=0.0 2023-11-27 23:22:38,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3259893.3333333335, ans=0.125 2023-11-27 23:22:42,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3259893.3333333335, ans=0.0 2023-11-27 23:22:58,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3259960.0, ans=0.125 2023-11-27 23:23:06,048 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489000 2023-11-27 23:23:08,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3259960.0, ans=0.1 2023-11-27 23:23:10,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3259960.0, ans=0.125 2023-11-27 23:23:11,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3259960.0, ans=0.2 2023-11-27 23:23:16,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3260026.6666666665, ans=0.125 2023-11-27 23:23:17,653 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8050, loss[loss=0.0778, simple_loss=0.1103, pruned_loss=0.01597, audio_tagging_loss=0.006673, over 14788.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09, pruned_loss=0.01262, audio_tagging_loss=0.009057, over 3046853.11 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:23:36,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3260093.3333333335, ans=0.125 2023-11-27 23:24:20,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3260226.6666666665, ans=0.0 2023-11-27 23:24:29,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3260293.3333333335, ans=0.2 2023-11-27 23:24:37,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3260293.3333333335, ans=0.125 2023-11-27 23:24:40,365 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489050 2023-11-27 23:24:46,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3260293.3333333335, ans=0.125 2023-11-27 23:24:49,213 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.685e+01 9.405e+01 9.974e+01 1.162e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-27 23:24:50,870 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8100, loss[loss=0.05433, simple_loss=0.07264, pruned_loss=0.01077, audio_tagging_loss=0.007238, over 15410.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.08996, pruned_loss=0.01258, audio_tagging_loss=0.008963, over 3045885.73 frames. ], batch size: 56, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:25:26,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.59 vs. 
limit=22.5 2023-11-27 23:25:40,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3260493.3333333335, ans=0.125 2023-11-27 23:25:42,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3260493.3333333335, ans=0.125 2023-11-27 23:25:57,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3260560.0, ans=0.1 2023-11-27 23:25:59,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3260560.0, ans=0.1 2023-11-27 23:26:02,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0 2023-11-27 23:26:13,597 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489100 2023-11-27 23:26:15,801 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.58 vs. limit=10.0 2023-11-27 23:26:24,166 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8150, loss[loss=0.08382, simple_loss=0.109, pruned_loss=0.01914, audio_tagging_loss=0.01017, over 15508.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09015, pruned_loss=0.01256, audio_tagging_loss=0.008817, over 3053173.95 frames. ], batch size: 58, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:26:29,547 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-27 23:26:44,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3260760.0, ans=0.0 2023-11-27 23:27:00,308 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2023-11-27 23:27:01,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3260826.6666666665, ans=0.035 2023-11-27 23:27:13,532 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.65 vs. limit=15.0 2023-11-27 23:27:42,814 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489150 2023-11-27 23:27:50,588 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.874e+01 9.379e+01 1.019e+02 1.298e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 23:27:52,101 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8200, loss[loss=0.06231, simple_loss=0.08974, pruned_loss=0.01028, audio_tagging_loss=0.00716, over 15435.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08944, pruned_loss=0.01242, audio_tagging_loss=0.008733, over 3049596.47 frames. ], batch size: 59, lr: 1.65e-03, grad_scale: 32.0 2023-11-27 23:27:56,754 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
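The WARNING just above shows the length filter at work: the cut's 100 input frames shrink to 23 after the encoder's convolutional subsampling (consistent with ((100 - 7) // 2 + 1) // 2 = 23), which is fewer than its 24 BPE tokens, so the transducer loss cannot align it and the cut is dropped. A hedged sketch of that check; the helper name is hypothetical:

    # Sketch of the exclusion rule implied by these WARNINGs: drop a cut when
    # it has fewer subsampled frames than tokens (the loss needs T >= S).
    def should_exclude(num_frames_before_subsampling: int, num_tokens: int) -> bool:
        # Frames after convolutional subsampling; reproduces 100 -> 23 as logged.
        t = ((num_frames_before_subsampling - 7) // 2 + 1) // 2
        return t < num_tokens

    assert should_exclude(100, 24)  # the dummy AudioSet cut above: 23 < 24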
2023-11-27 23:27:57,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.97 vs. limit=12.0 2023-11-27 23:28:26,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3261160.0, ans=0.09899494936611666 2023-11-27 23:29:04,127 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489200 2023-11-27 23:29:07,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3261293.3333333335, ans=0.07 2023-11-27 23:29:12,423 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.49 vs. limit=15.0 2023-11-27 23:29:13,184 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8250, loss[loss=0.06727, simple_loss=0.09486, pruned_loss=0.01147, audio_tagging_loss=0.008366, over 16071.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09024, pruned_loss=0.01247, audio_tagging_loss=0.008634, over 3038775.65 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:29:22,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3261360.0, ans=0.125 2023-11-27 23:29:42,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3261493.3333333335, ans=0.0 2023-11-27 23:29:45,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.12 vs. limit=15.0 2023-11-27 23:30:18,157 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489250 2023-11-27 23:30:23,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3261626.6666666665, ans=0.2 2023-11-27 23:30:27,237 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.905e+01 9.510e+01 1.029e+02 2.089e+02, threshold=1.902e+02, percent-clipped=1.0 2023-11-27 23:30:27,284 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8300, loss[loss=0.06302, simple_loss=0.09021, pruned_loss=0.009601, audio_tagging_loss=0.00831, over 15989.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09069, pruned_loss=0.0126, audio_tagging_loss=0.00863, over 3046069.88 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:30:28,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3261693.3333333335, ans=0.0 2023-11-27 23:30:29,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.34 vs. limit=15.0 2023-11-27 23:30:40,432 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-27 23:31:04,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3261826.6666666665, ans=0.125 2023-11-27 23:31:05,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3261826.6666666665, ans=0.125 2023-11-27 23:31:05,875 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.00 vs.
limit=15.0 2023-11-27 23:31:27,092 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489300 2023-11-27 23:31:35,068 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8350, loss[loss=0.08565, simple_loss=0.1199, pruned_loss=0.01887, audio_tagging_loss=0.00681, over 14773.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08969, pruned_loss=0.01251, audio_tagging_loss=0.008604, over 3048256.89 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:32:33,272 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489350 2023-11-27 23:32:46,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.332e+01 8.783e+01 9.379e+01 1.006e+02 1.235e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-27 23:32:46,785 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8400, loss[loss=0.06697, simple_loss=0.09796, pruned_loss=0.00932, audio_tagging_loss=0.008672, over 15160.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08907, pruned_loss=0.0124, audio_tagging_loss=0.008619, over 3045604.79 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:32:54,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3262360.0, ans=0.0 2023-11-27 23:33:08,841 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2023-11-27 23:33:21,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3262426.6666666665, ans=0.2 2023-11-27 23:33:34,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3262426.6666666665, ans=0.125 2023-11-27 23:35:21,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.55 vs. limit=5.0 2023-11-27 23:35:46,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3262560.0, ans=0.1 2023-11-27 23:36:14,926 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2023-11-27 23:36:18,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3262626.6666666665, ans=0.0 2023-11-27 23:36:28,116 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489400 2023-11-27 23:36:30,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3262626.6666666665, ans=0.015 2023-11-27 23:36:58,975 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8450, loss[loss=0.09221, simple_loss=0.1383, pruned_loss=0.01876, audio_tagging_loss=0.004314, over 16467.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08931, pruned_loss=0.01245, audio_tagging_loss=0.008619, over 3044575.65 frames. 
], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:37:19,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3262693.3333333335, ans=0.1 2023-11-27 23:37:19,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3262693.3333333335, ans=0.0 2023-11-27 23:37:45,288 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.92 vs. limit=22.5 2023-11-27 23:37:52,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3262760.0, ans=0.125 2023-11-27 23:38:04,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3262760.0, ans=0.0 2023-11-27 23:38:30,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3262826.6666666665, ans=0.125 2023-11-27 23:39:04,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3262826.6666666665, ans=0.125 2023-11-27 23:39:14,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3262893.3333333335, ans=0.0 2023-11-27 23:39:14,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3262893.3333333335, ans=0.125 2023-11-27 23:40:19,366 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489450 2023-11-27 23:40:53,202 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 8.777e+01 9.452e+01 1.015e+02 1.471e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-27 23:40:53,252 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8500, loss[loss=0.07245, simple_loss=0.09623, pruned_loss=0.01585, audio_tagging_loss=0.008486, over 15878.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08906, pruned_loss=0.01235, audio_tagging_loss=0.008682, over 3046764.91 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:42:33,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3263160.0, ans=0.0 2023-11-27 23:44:18,223 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489500 2023-11-27 23:44:31,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3263293.3333333335, ans=0.0 2023-11-27 23:44:42,871 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8550, loss[loss=0.08469, simple_loss=0.1087, pruned_loss=0.01918, audio_tagging_loss=0.01116, over 16117.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08962, pruned_loss=0.01258, audio_tagging_loss=0.008741, over 3044679.72 frames. ], batch size: 63, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:44:53,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3263360.0, ans=0.1 2023-11-27 23:45:21,010 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.78 vs. 
limit=22.5 2023-11-27 23:45:36,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3263493.3333333335, ans=0.125 2023-11-27 23:46:01,363 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.46 vs. limit=6.0 2023-11-27 23:46:08,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3263560.0, ans=0.1 2023-11-27 23:46:10,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3263560.0, ans=0.2 2023-11-27 23:46:35,802 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489550 2023-11-27 23:46:36,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3263626.6666666665, ans=0.0 2023-11-27 23:46:43,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3263626.6666666665, ans=0.125 2023-11-27 23:46:50,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 8.830e+01 9.577e+01 1.042e+02 1.217e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-27 23:46:50,784 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8600, loss[loss=0.05984, simple_loss=0.08209, pruned_loss=0.009144, audio_tagging_loss=0.009651, over 14395.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08968, pruned_loss=0.01244, audio_tagging_loss=0.008747, over 3047550.57 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:47:08,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2023-11-27 23:47:54,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3263826.6666666665, ans=0.125 2023-11-27 23:47:59,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0 2023-11-27 23:48:39,166 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489600 2023-11-27 23:48:39,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3263960.0, ans=0.125 2023-11-27 23:48:45,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3263960.0, ans=0.125 2023-11-27 23:48:54,431 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8650, loss[loss=0.05435, simple_loss=0.06666, pruned_loss=0.01074, audio_tagging_loss=0.01028, over 15975.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08961, pruned_loss=0.01244, audio_tagging_loss=0.008834, over 3045260.49 frames. ], batch size: 64, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:49:14,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.32 vs. 
limit=22.5 2023-11-27 23:50:10,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3264226.6666666665, ans=0.0 2023-11-27 23:50:12,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3264226.6666666665, ans=0.0 2023-11-27 23:50:44,254 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489650 2023-11-27 23:50:58,390 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.714e+01 8.922e+01 9.759e+01 1.039e+02 1.261e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-27 23:50:58,448 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8700, loss[loss=0.06541, simple_loss=0.09318, pruned_loss=0.01188, audio_tagging_loss=0.006937, over 14352.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09096, pruned_loss=0.01263, audio_tagging_loss=0.008873, over 3046458.67 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:51:10,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3264360.0, ans=0.125 2023-11-27 23:51:17,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3264360.0, ans=0.1 2023-11-27 23:51:45,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3264493.3333333335, ans=0.125 2023-11-27 23:51:47,871 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2023-11-27 23:51:54,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3264493.3333333335, ans=0.125 2023-11-27 23:52:17,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3264560.0, ans=0.125 2023-11-27 23:52:22,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3264560.0, ans=0.0 2023-11-27 23:52:47,611 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489700 2023-11-27 23:53:01,251 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8750, loss[loss=0.04753, simple_loss=0.06819, pruned_loss=0.005448, audio_tagging_loss=0.007985, over 14558.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09029, pruned_loss=0.0125, audio_tagging_loss=0.008939, over 3043637.37 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:54:45,047 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.53 vs. 
limit=22.5 2023-11-27 23:54:51,348 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489750 2023-11-27 23:54:56,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3264960.0, ans=0.0 2023-11-27 23:54:56,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3264960.0, ans=0.2 2023-11-27 23:55:06,019 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.769e+01 9.393e+01 1.008e+02 1.168e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-27 23:55:06,063 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8800, loss[loss=0.04956, simple_loss=0.05989, pruned_loss=0.006547, audio_tagging_loss=0.01306, over 14379.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09107, pruned_loss=0.0126, audio_tagging_loss=0.009026, over 3048357.97 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-27 23:56:40,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3265293.3333333335, ans=0.125 2023-11-27 23:56:52,326 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489800 2023-11-27 23:57:07,601 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8850, loss[loss=0.0847, simple_loss=0.1129, pruned_loss=0.0185, audio_tagging_loss=0.009727, over 15964.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09023, pruned_loss=0.01246, audio_tagging_loss=0.009101, over 3047719.81 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:57:12,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3265360.0, ans=0.125 2023-11-27 23:57:26,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3265360.0, ans=0.125 2023-11-27 23:57:29,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3265426.6666666665, ans=0.0 2023-11-27 23:57:35,170 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-27 23:57:43,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3265426.6666666665, ans=0.125 2023-11-27 23:58:50,961 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489850 2023-11-27 23:59:03,392 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8900, loss[loss=0.05813, simple_loss=0.08448, pruned_loss=0.01088, audio_tagging_loss=0.005008, over 14748.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09047, pruned_loss=0.01247, audio_tagging_loss=0.008904, over 3044158.39 frames. 
], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-27 23:59:03,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3265693.3333333335, ans=10.0 2023-11-27 23:59:05,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 8.603e+01 9.158e+01 9.792e+01 1.158e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-28 00:00:32,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3265893.3333333335, ans=0.2 2023-11-28 00:00:42,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3265893.3333333335, ans=0.125 2023-11-28 00:00:42,346 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:00:44,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3265960.0, ans=0.0 2023-11-28 00:00:56,460 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489900 2023-11-28 00:01:10,657 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 8950, loss[loss=0.08065, simple_loss=0.1143, pruned_loss=0.01563, audio_tagging_loss=0.007892, over 15421.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09009, pruned_loss=0.01233, audio_tagging_loss=0.008811, over 3047429.13 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:01:34,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3266093.3333333335, ans=0.1 2023-11-28 00:01:59,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3266160.0, ans=0.1 2023-11-28 00:02:01,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.11 vs. limit=10.0 2023-11-28 00:02:57,041 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 489950 2023-11-28 00:02:57,597 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2023-11-28 00:02:59,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3266293.3333333335, ans=0.1 2023-11-28 00:03:10,801 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9000, loss[loss=0.06581, simple_loss=0.08815, pruned_loss=0.01088, audio_tagging_loss=0.01086, over 14833.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09026, pruned_loss=0.01247, audio_tagging_loss=0.008702, over 3046501.53 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:03:10,803 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 00:03:48,785 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4493, 3.7611, 3.0557, 3.7934], device='cuda:2') 2023-11-28 00:03:55,675 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3126, 4.2549, 4.4986, 4.4676], device='cuda:2') 2023-11-28 00:04:14,808 INFO [train_asr.py:1267] (2/4) Epoch 41, validation: loss=0.05835, simple_loss=0.05061, pruned_loss=0.005195, audio_tagging_loss=0.02785, over 4681554.00 frames. 
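The attn_weights_entropy tensors printed during this validation pass report one value per attention head. For reference, the entropy of an attention distribution is -(p * log p) summed over the attended positions; a head attending uniformly over 86 keys would score log(86) ≈ 4.45, about the largest value in the first tensor above. A sketch under the assumption (not confirmed by the log) that the printed value averages this entropy over query positions:

    import torch

    # Illustrative only: per-head entropy of attention weights with shape
    # (num_heads, num_queries, num_keys), where each row sums to 1.
    def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
        entropy = -(attn * (attn + eps).log()).sum(dim=-1)
        return entropy.mean(dim=-1)  # one value per head, averaged over queries

    # Uniform attention over 86 keys -> log(86) ~= 4.45 for every head.
    print(attn_weights_entropy(torch.full((4, 10, 86), 1.0 / 86)))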
2023-11-28 00:04:14,810 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 00:04:16,746 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.840e+01 9.454e+01 9.905e+01 1.337e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 00:04:23,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3266360.0, ans=0.125 2023-11-28 00:04:44,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3266426.6666666665, ans=0.05 2023-11-28 00:05:03,541 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.04 vs. limit=12.0 2023-11-28 00:05:48,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3266560.0, ans=0.2 2023-11-28 00:06:02,419 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490000 2023-11-28 00:06:09,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3266626.6666666665, ans=0.125 2023-11-28 00:06:17,825 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9050, loss[loss=0.05701, simple_loss=0.07768, pruned_loss=0.009424, audio_tagging_loss=0.008748, over 15154.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09139, pruned_loss=0.01267, audio_tagging_loss=0.008476, over 3050290.41 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:06:18,717 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=15.0 2023-11-28 00:06:30,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3266693.3333333335, ans=0.1 2023-11-28 00:06:38,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3266693.3333333335, ans=0.0 2023-11-28 00:07:15,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3266826.6666666665, ans=0.125 2023-11-28 00:07:40,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3266893.3333333335, ans=0.0 2023-11-28 00:07:59,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3266960.0, ans=0.125 2023-11-28 00:08:02,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3266960.0, ans=0.1 2023-11-28 00:08:04,762 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490050 2023-11-28 00:08:19,705 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9100, loss[loss=0.07258, simple_loss=0.08937, pruned_loss=0.02082, audio_tagging_loss=0.007079, over 14412.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09112, pruned_loss=0.0126, audio_tagging_loss=0.008471, over 3051353.08 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:08:22,027 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 8.819e+01 9.395e+01 1.013e+02 1.222e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 00:08:28,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3267026.6666666665, ans=0.125 2023-11-28 00:08:31,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3267026.6666666665, ans=0.1 2023-11-28 00:09:50,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3267226.6666666665, ans=0.125 2023-11-28 00:10:04,154 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490100 2023-11-28 00:10:17,682 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9150, loss[loss=0.07197, simple_loss=0.1055, pruned_loss=0.01151, audio_tagging_loss=0.007696, over 15288.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09027, pruned_loss=0.01245, audio_tagging_loss=0.008627, over 3050032.64 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:10:40,036 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.57 vs. limit=12.0 2023-11-28 00:11:01,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3267493.3333333335, ans=0.0 2023-11-28 00:11:10,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3267493.3333333335, ans=0.125 2023-11-28 00:11:10,267 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0 2023-11-28 00:11:29,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3267560.0, ans=0.0 2023-11-28 00:11:57,869 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490150 2023-11-28 00:12:08,769 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9200, loss[loss=0.06806, simple_loss=0.09549, pruned_loss=0.01237, audio_tagging_loss=0.007941, over 15419.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09013, pruned_loss=0.0124, audio_tagging_loss=0.008726, over 3048261.15 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:12:11,695 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 8.944e+01 9.391e+01 1.026e+02 1.333e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 00:12:32,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3267760.0, ans=0.125 2023-11-28 00:12:54,011 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:12:59,656 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.32 vs. 
limit=15.0 2023-11-28 00:13:56,153 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490200 2023-11-28 00:14:08,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3267960.0, ans=0.125 2023-11-28 00:14:13,558 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9250, loss[loss=0.07467, simple_loss=0.1026, pruned_loss=0.0176, audio_tagging_loss=0.005795, over 15551.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08933, pruned_loss=0.0123, audio_tagging_loss=0.008739, over 3052568.34 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:15:15,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3268160.0, ans=0.0 2023-11-28 00:15:20,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3268160.0, ans=0.125 2023-11-28 00:15:34,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3268226.6666666665, ans=0.95 2023-11-28 00:16:09,117 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490250 2023-11-28 00:16:23,569 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9300, loss[loss=0.05444, simple_loss=0.06876, pruned_loss=0.007127, audio_tagging_loss=0.01293, over 14727.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08907, pruned_loss=0.0122, audio_tagging_loss=0.008708, over 3054378.11 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:16:27,357 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.477e+01 9.136e+01 9.623e+01 1.227e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-28 00:16:55,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-28 00:17:20,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3268493.3333333335, ans=0.1 2023-11-28 00:18:10,803 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490300 2023-11-28 00:18:23,581 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9350, loss[loss=0.05466, simple_loss=0.07392, pruned_loss=0.01054, audio_tagging_loss=0.007156, over 15848.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.0892, pruned_loss=0.01211, audio_tagging_loss=0.008747, over 3056260.99 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:18:46,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3268760.0, ans=0.125 2023-11-28 00:19:15,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3268826.6666666665, ans=0.125 2023-11-28 00:19:30,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3268893.3333333335, ans=0.2 2023-11-28 00:20:01,424 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490350 2023-11-28 00:20:14,278 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9400, loss[loss=0.08933, simple_loss=0.1332, pruned_loss=0.01633, audio_tagging_loss=0.006401, over 15556.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09, pruned_loss=0.0122, audio_tagging_loss=0.008723, over 3065044.94 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:20:18,708 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.645e+01 9.230e+01 9.959e+01 1.190e+02, threshold=1.846e+02, percent-clipped=0.0 2023-11-28 00:20:25,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3269026.6666666665, ans=0.125 2023-11-28 00:20:27,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3269026.6666666665, ans=0.125 2023-11-28 00:21:53,624 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490400 2023-11-28 00:21:59,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3269293.3333333335, ans=0.1 2023-11-28 00:22:05,749 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9450, loss[loss=0.07974, simple_loss=0.1094, pruned_loss=0.01515, audio_tagging_loss=0.009902, over 15210.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09069, pruned_loss=0.01243, audio_tagging_loss=0.008828, over 3068498.85 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:22:05,852 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:22:06,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3269360.0, ans=0.1 2023-11-28 00:22:50,770 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.07 vs. limit=22.5 2023-11-28 00:23:35,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2023-11-28 00:23:45,990 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490450 2023-11-28 00:23:54,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3269626.6666666665, ans=0.125 2023-11-28 00:23:58,252 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9500, loss[loss=0.066, simple_loss=0.08968, pruned_loss=0.01307, audio_tagging_loss=0.008084, over 15771.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09076, pruned_loss=0.01253, audio_tagging_loss=0.008883, over 3065153.73 frames. 
], batch size: 57, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:24:04,016 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.586e+01 9.559e+01 1.044e+02 1.238e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 00:24:11,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3269693.3333333335, ans=0.125 2023-11-28 00:24:31,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3269760.0, ans=0.05 2023-11-28 00:24:47,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3269826.6666666665, ans=0.1 2023-11-28 00:24:49,026 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=15.0 2023-11-28 00:25:14,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3269960.0, ans=0.125 2023-11-28 00:25:23,704 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2023-11-28 00:25:25,034 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490500 2023-11-28 00:25:35,573 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9550, loss[loss=0.04647, simple_loss=0.05273, pruned_loss=0.007724, audio_tagging_loss=0.01238, over 15732.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.0896, pruned_loss=0.01242, audio_tagging_loss=0.008981, over 3058203.34 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:25:49,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3270026.6666666665, ans=0.2 2023-11-28 00:25:50,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3270026.6666666665, ans=0.125 2023-11-28 00:26:26,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3270226.6666666665, ans=0.125 2023-11-28 00:26:26,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3270226.6666666665, ans=0.0 2023-11-28 00:26:49,656 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490550 2023-11-28 00:26:58,236 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9600, loss[loss=0.05949, simple_loss=0.07719, pruned_loss=0.01251, audio_tagging_loss=0.008381, over 14132.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08893, pruned_loss=0.01234, audio_tagging_loss=0.009025, over 3051158.84 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:27:02,584 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.793e+01 9.266e+01 1.006e+02 1.228e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-28 00:27:28,433 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.45 vs. 
limit=10.0 2023-11-28 00:27:32,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3270493.3333333335, ans=0.125 2023-11-28 00:27:32,721 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2023-11-28 00:27:52,372 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.68 vs. limit=15.0 2023-11-28 00:27:59,702 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490600 2023-11-28 00:28:08,095 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9650, loss[loss=0.06029, simple_loss=0.08476, pruned_loss=0.01165, audio_tagging_loss=0.006259, over 15908.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08851, pruned_loss=0.0123, audio_tagging_loss=0.008919, over 3044890.68 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:28:21,100 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.97 vs. limit=15.0 2023-11-28 00:28:43,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3270826.6666666665, ans=0.035 2023-11-28 00:29:03,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3270960.0, ans=0.125 2023-11-28 00:29:05,838 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490650 2023-11-28 00:29:14,532 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9700, loss[loss=0.05084, simple_loss=0.06966, pruned_loss=0.007967, audio_tagging_loss=0.008046, over 15464.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08844, pruned_loss=0.01219, audio_tagging_loss=0.008883, over 3039036.67 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:29:18,296 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.733e+01 9.513e+01 1.030e+02 1.343e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 00:29:34,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3271093.3333333335, ans=0.05 2023-11-28 00:29:35,661 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.01 vs. limit=22.5 2023-11-28 00:29:36,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3271093.3333333335, ans=0.125 2023-11-28 00:29:47,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3271160.0, ans=0.125 2023-11-28 00:30:10,943 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490700 2023-11-28 00:30:18,804 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9750, loss[loss=0.0693, simple_loss=0.09336, pruned_loss=0.01378, audio_tagging_loss=0.008839, over 14568.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.0896, pruned_loss=0.01231, audio_tagging_loss=0.008759, over 3047376.69 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:30:18,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3271360.0, ans=0.125 2023-11-28 00:30:20,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3271360.0, ans=0.0 2023-11-28 00:30:39,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3271426.6666666665, ans=0.125 2023-11-28 00:30:57,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3271560.0, ans=0.0 2023-11-28 00:31:13,619 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490750 2023-11-28 00:31:16,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3271626.6666666665, ans=0.125 2023-11-28 00:31:20,498 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9800, loss[loss=0.05108, simple_loss=0.07201, pruned_loss=0.008562, audio_tagging_loss=0.006509, over 14855.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08999, pruned_loss=0.01233, audio_tagging_loss=0.008604, over 3046833.78 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:31:23,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.662e+01 9.364e+01 1.024e+02 1.595e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 00:31:26,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3271693.3333333335, ans=0.125 2023-11-28 00:31:59,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.89 vs. limit=22.5 2023-11-28 00:32:09,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3271960.0, ans=10.0 2023-11-28 00:32:12,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3271960.0, ans=0.125 2023-11-28 00:32:13,080 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490800 2023-11-28 00:32:15,793 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:32:17,612 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.66 vs. limit=15.0 2023-11-28 00:32:20,847 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9850, loss[loss=0.07506, simple_loss=0.09729, pruned_loss=0.01651, audio_tagging_loss=0.009911, over 15375.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09069, pruned_loss=0.01246, audio_tagging_loss=0.008521, over 3045572.64 frames. 
], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:32:40,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3272093.3333333335, ans=0.0 2023-11-28 00:32:46,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3272160.0, ans=0.125 2023-11-28 00:32:49,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3272160.0, ans=0.1 2023-11-28 00:33:12,909 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490850 2023-11-28 00:33:20,804 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9900, loss[loss=0.05899, simple_loss=0.07996, pruned_loss=0.008527, audio_tagging_loss=0.01049, over 15241.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09121, pruned_loss=0.01253, audio_tagging_loss=0.00853, over 3048210.96 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:33:24,130 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.601e+01 9.033e+01 9.485e+01 1.050e+02 1.243e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 00:33:54,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3272560.0, ans=0.125 2023-11-28 00:33:59,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3272560.0, ans=0.1 2023-11-28 00:34:05,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3272560.0, ans=0.0 2023-11-28 00:34:09,983 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=12.0 2023-11-28 00:34:10,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3272626.6666666665, ans=0.0 2023-11-28 00:34:11,798 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490900 2023-11-28 00:34:18,416 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 9950, loss[loss=0.06799, simple_loss=0.08865, pruned_loss=0.01244, audio_tagging_loss=0.01123, over 15807.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09078, pruned_loss=0.0125, audio_tagging_loss=0.008511, over 3044538.68 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:34:37,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3272760.0, ans=0.0 2023-11-28 00:34:37,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3272760.0, ans=0.0 2023-11-28 00:34:39,288 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:34:43,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3272826.6666666665, ans=0.125 2023-11-28 00:35:09,067 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 490950 2023-11-28 00:35:16,006 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10000, loss[loss=0.09116, simple_loss=0.1248, pruned_loss=0.02114, audio_tagging_loss=0.007639, over 13730.00 frames. 
], tot_loss[loss=0.06664, simple_loss=0.09102, pruned_loss=0.01262, audio_tagging_loss=0.008519, over 3046899.90 frames. ], batch size: 52, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:35:19,717 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.605e+01 9.101e+01 9.831e+01 1.246e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-28 00:35:28,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3273093.3333333335, ans=0.125 2023-11-28 00:35:31,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3273093.3333333335, ans=0.125 2023-11-28 00:35:40,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3273160.0, ans=0.125 2023-11-28 00:36:06,563 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491000 2023-11-28 00:36:13,244 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10050, loss[loss=0.05906, simple_loss=0.07914, pruned_loss=0.01091, audio_tagging_loss=0.008583, over 15639.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09058, pruned_loss=0.01259, audio_tagging_loss=0.008444, over 3043438.16 frames. ], batch size: 60, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:36:15,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3273360.0, ans=0.0 2023-11-28 00:36:16,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3273360.0, ans=0.125 2023-11-28 00:36:22,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3273360.0, ans=0.0 2023-11-28 00:36:25,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3273426.6666666665, ans=0.0 2023-11-28 00:36:34,050 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.96 vs. limit=10.0 2023-11-28 00:36:34,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3273426.6666666665, ans=0.2 2023-11-28 00:36:47,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3273560.0, ans=0.125 2023-11-28 00:36:54,742 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.29 vs. limit=22.5 2023-11-28 00:36:57,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3273560.0, ans=0.0 2023-11-28 00:37:05,419 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491050 2023-11-28 00:37:08,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3273626.6666666665, ans=0.0 2023-11-28 00:37:11,861 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10100, loss[loss=0.07168, simple_loss=0.0915, pruned_loss=0.01723, audio_tagging_loss=0.0087, over 15947.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09028, pruned_loss=0.01251, audio_tagging_loss=0.008555, over 3050094.48 frames. 
], batch size: 61, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:37:11,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3273693.3333333335, ans=0.125 2023-11-28 00:37:12,524 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.90 vs. limit=22.5 2023-11-28 00:37:14,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3273693.3333333335, ans=0.125 2023-11-28 00:37:17,289 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 8.687e+01 9.300e+01 1.008e+02 1.276e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 00:37:40,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3273826.6666666665, ans=0.125 2023-11-28 00:37:41,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3273826.6666666665, ans=0.125 2023-11-28 00:37:58,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3273960.0, ans=0.04949747468305833 2023-11-28 00:38:01,027 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:38:02,177 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491100 2023-11-28 00:38:02,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3273960.0, ans=0.0 2023-11-28 00:38:03,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3273960.0, ans=0.125 2023-11-28 00:38:09,087 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10150, loss[loss=0.06544, simple_loss=0.09526, pruned_loss=0.01166, audio_tagging_loss=0.006148, over 15062.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09032, pruned_loss=0.01266, audio_tagging_loss=0.00865, over 3053350.43 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:38:11,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3274026.6666666665, ans=0.09899494936611666 2023-11-28 00:38:16,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3274026.6666666665, ans=0.125 2023-11-28 00:38:20,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3274093.3333333335, ans=0.0 2023-11-28 00:38:31,061 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:38:39,090 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:38:39,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3274160.0, ans=0.125 2023-11-28 00:38:47,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3274226.6666666665, ans=0.125 2023-11-28 00:38:48,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3274226.6666666665, ans=0.2 2023-11-28 00:38:59,903 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491150 2023-11-28 00:39:04,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3274293.3333333335, ans=0.125 2023-11-28 00:39:06,394 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10200, loss[loss=0.0777, simple_loss=0.1072, pruned_loss=0.01424, audio_tagging_loss=0.009867, over 15330.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08995, pruned_loss=0.01251, audio_tagging_loss=0.008781, over 3063016.04 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:39:12,515 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 8.857e+01 9.633e+01 1.053e+02 1.293e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 00:39:16,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3274360.0, ans=0.125 2023-11-28 00:39:19,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3274426.6666666665, ans=0.125 2023-11-28 00:39:21,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3274426.6666666665, ans=0.125 2023-11-28 00:39:24,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3274426.6666666665, ans=0.0 2023-11-28 00:39:31,024 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:39:36,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3274493.3333333335, ans=0.04949747468305833 2023-11-28 00:39:57,839 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491200 2023-11-28 00:39:59,089 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:40:02,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3274626.6666666665, ans=0.125 2023-11-28 00:40:05,363 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10250, loss[loss=0.08754, simple_loss=0.1259, pruned_loss=0.01692, audio_tagging_loss=0.007649, over 15339.00 frames. 
], tot_loss[loss=0.06707, simple_loss=0.0912, pruned_loss=0.01264, audio_tagging_loss=0.008832, over 3059224.09 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:40:10,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3274693.3333333335, ans=0.125 2023-11-28 00:40:18,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3274760.0, ans=0.125 2023-11-28 00:40:27,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3274826.6666666665, ans=0.125 2023-11-28 00:40:50,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3274960.0, ans=0.0 2023-11-28 00:40:55,955 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491250 2023-11-28 00:41:02,343 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10300, loss[loss=0.05397, simple_loss=0.07427, pruned_loss=0.009892, audio_tagging_loss=0.006945, over 14439.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09047, pruned_loss=0.01257, audio_tagging_loss=0.008843, over 3060104.67 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:41:08,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.570e+01 8.818e+01 9.627e+01 1.031e+02 1.268e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 00:41:15,543 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:41:21,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3275093.3333333335, ans=0.5 2023-11-28 00:41:51,147 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0 2023-11-28 00:41:53,388 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491300 2023-11-28 00:41:59,903 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10350, loss[loss=0.07953, simple_loss=0.1117, pruned_loss=0.01688, audio_tagging_loss=0.006805, over 14918.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09113, pruned_loss=0.01269, audio_tagging_loss=0.008873, over 3056172.26 frames. 
], batch size: 54, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 00:42:10,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3275426.6666666665, ans=0.1 2023-11-28 00:42:11,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3275426.6666666665, ans=0.0 2023-11-28 00:42:13,795 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:42:22,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3275493.3333333335, ans=0.125 2023-11-28 00:42:28,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3275493.3333333335, ans=0.125 2023-11-28 00:42:49,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3275626.6666666665, ans=0.1 2023-11-28 00:42:50,250 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491350 2023-11-28 00:42:56,800 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10400, loss[loss=0.06907, simple_loss=0.08727, pruned_loss=0.01469, audio_tagging_loss=0.01075, over 15177.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09052, pruned_loss=0.01255, audio_tagging_loss=0.008992, over 3051605.87 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:43:02,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.297e+01 8.640e+01 9.257e+01 1.001e+02 1.271e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 00:43:06,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3275693.3333333335, ans=0.125 2023-11-28 00:43:08,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3275760.0, ans=0.1 2023-11-28 00:43:17,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3275760.0, ans=0.1 2023-11-28 00:43:25,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3275826.6666666665, ans=0.2 2023-11-28 00:43:26,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3275826.6666666665, ans=0.2 2023-11-28 00:43:27,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3275826.6666666665, ans=0.04949747468305833 2023-11-28 00:43:31,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3275893.3333333335, ans=0.1 2023-11-28 00:43:39,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3275893.3333333335, ans=0.0 2023-11-28 00:43:46,898 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491400 2023-11-28 00:43:54,165 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10450, loss[loss=0.04613, simple_loss=0.0582, pruned_loss=0.01011, audio_tagging_loss=0.006915, over 14582.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.08993, pruned_loss=0.01269, audio_tagging_loss=0.008953, over 3047771.54 frames. 
], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:44:09,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3276093.3333333335, ans=0.1 2023-11-28 00:44:16,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3276160.0, ans=0.2 2023-11-28 00:44:24,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3276160.0, ans=0.125 2023-11-28 00:44:31,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3276226.6666666665, ans=0.0 2023-11-28 00:44:32,266 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. limit=6.0 2023-11-28 00:44:44,526 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491450 2023-11-28 00:44:51,493 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10500, loss[loss=0.06405, simple_loss=0.08851, pruned_loss=0.0106, audio_tagging_loss=0.009198, over 15174.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08901, pruned_loss=0.01244, audio_tagging_loss=0.008849, over 3045772.66 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:44:52,193 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2023-11-28 00:44:53,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3276360.0, ans=0.125 2023-11-28 00:44:56,996 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 8.695e+01 9.363e+01 1.021e+02 1.243e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 00:45:03,321 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=22.5 2023-11-28 00:45:12,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3276493.3333333335, ans=0.0 2023-11-28 00:45:13,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3276493.3333333335, ans=0.0 2023-11-28 00:45:21,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3276493.3333333335, ans=0.125 2023-11-28 00:45:41,401 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491500 2023-11-28 00:45:43,319 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:45:44,811 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.95 vs. limit=10.0 2023-11-28 00:45:48,518 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10550, loss[loss=0.04383, simple_loss=0.05936, pruned_loss=0.005497, audio_tagging_loss=0.008656, over 15963.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.08972, pruned_loss=0.01254, audio_tagging_loss=0.008729, over 3047844.31 frames. 
], batch size: 63, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:46:03,964 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.42 vs. limit=10.0 2023-11-28 00:46:05,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.69 vs. limit=15.0 2023-11-28 00:46:09,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3276760.0, ans=0.125 2023-11-28 00:46:17,010 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.47 vs. limit=15.0 2023-11-28 00:46:17,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3276826.6666666665, ans=0.1 2023-11-28 00:46:24,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3276893.3333333335, ans=0.125 2023-11-28 00:46:27,882 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.27 vs. limit=12.0 2023-11-28 00:46:30,807 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.38 vs. limit=15.0 2023-11-28 00:46:32,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3276893.3333333335, ans=0.035 2023-11-28 00:46:39,191 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491550 2023-11-28 00:46:45,633 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10600, loss[loss=0.08172, simple_loss=0.1211, pruned_loss=0.01602, audio_tagging_loss=0.005173, over 15501.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09015, pruned_loss=0.01262, audio_tagging_loss=0.008679, over 3042508.51 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:46:51,903 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.296e+01 8.682e+01 9.138e+01 9.881e+01 1.216e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-28 00:46:54,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3277026.6666666665, ans=0.0 2023-11-28 00:47:22,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3277226.6666666665, ans=0.1 2023-11-28 00:47:31,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3277293.3333333335, ans=0.0 2023-11-28 00:47:36,666 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491600 2023-11-28 00:47:37,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3277293.3333333335, ans=0.125 2023-11-28 00:47:44,081 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10650, loss[loss=0.07811, simple_loss=0.1166, pruned_loss=0.01171, audio_tagging_loss=0.008117, over 15048.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09061, pruned_loss=0.01262, audio_tagging_loss=0.008532, over 3047869.19 frames. 
], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:47:48,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3277360.0, ans=0.0 2023-11-28 00:47:51,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3277360.0, ans=0.125 2023-11-28 00:48:02,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3277426.6666666665, ans=0.1 2023-11-28 00:48:06,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3277493.3333333335, ans=0.125 2023-11-28 00:48:13,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3277493.3333333335, ans=0.125 2023-11-28 00:48:17,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3277560.0, ans=0.125 2023-11-28 00:48:30,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3277626.6666666665, ans=0.1 2023-11-28 00:48:34,203 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491650 2023-11-28 00:48:35,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3277626.6666666665, ans=0.125 2023-11-28 00:48:41,245 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10700, loss[loss=0.06441, simple_loss=0.08248, pruned_loss=0.01184, audio_tagging_loss=0.01133, over 14517.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.0908, pruned_loss=0.01286, audio_tagging_loss=0.008542, over 3044531.56 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:48:41,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3277693.3333333335, ans=0.125 2023-11-28 00:48:43,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3277693.3333333335, ans=0.125 2023-11-28 00:48:46,553 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.912e+01 8.856e+01 9.300e+01 9.841e+01 1.574e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 00:48:53,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3277760.0, ans=0.07 2023-11-28 00:48:54,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3277760.0, ans=0.125 2023-11-28 00:48:55,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3277760.0, ans=0.125 2023-11-28 00:48:57,813 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.39 vs. 
limit=15.0 2023-11-28 00:49:09,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3277826.6666666665, ans=0.0 2023-11-28 00:49:30,804 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491700 2023-11-28 00:49:34,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3277960.0, ans=0.05 2023-11-28 00:49:37,222 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10750, loss[loss=0.06921, simple_loss=0.1001, pruned_loss=0.01186, audio_tagging_loss=0.007285, over 14741.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09048, pruned_loss=0.01266, audio_tagging_loss=0.008436, over 3047760.60 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:49:48,997 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.66 vs. limit=22.5 2023-11-28 00:49:53,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3278093.3333333335, ans=0.125 2023-11-28 00:49:58,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3278093.3333333335, ans=0.05 2023-11-28 00:49:59,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3278093.3333333335, ans=0.0 2023-11-28 00:50:17,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3278226.6666666665, ans=0.2 2023-11-28 00:50:25,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3278293.3333333335, ans=0.07 2023-11-28 00:50:27,869 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491750 2023-11-28 00:50:35,821 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10800, loss[loss=0.06255, simple_loss=0.09114, pruned_loss=0.008535, audio_tagging_loss=0.008445, over 14924.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08975, pruned_loss=0.01249, audio_tagging_loss=0.008443, over 3049205.60 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:50:41,278 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.816e+01 8.647e+01 9.300e+01 1.005e+02 1.391e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 00:51:26,200 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491800 2023-11-28 00:51:26,805 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.63 vs. limit=10.0 2023-11-28 00:51:31,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3278626.6666666665, ans=0.125 2023-11-28 00:51:33,572 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10850, loss[loss=0.07535, simple_loss=0.09836, pruned_loss=0.01438, audio_tagging_loss=0.0118, over 14638.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08996, pruned_loss=0.0125, audio_tagging_loss=0.008533, over 3051763.87 frames. 
], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:51:33,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3278693.3333333335, ans=0.125 2023-11-28 00:51:43,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3278760.0, ans=0.125 2023-11-28 00:51:54,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3278826.6666666665, ans=0.125 2023-11-28 00:52:06,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3278893.3333333335, ans=0.125 2023-11-28 00:52:10,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3278893.3333333335, ans=0.0 2023-11-28 00:52:22,954 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491850 2023-11-28 00:52:27,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3278960.0, ans=0.125 2023-11-28 00:52:28,250 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:52:29,389 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10900, loss[loss=0.04547, simple_loss=0.06056, pruned_loss=0.005139, audio_tagging_loss=0.01005, over 15067.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08905, pruned_loss=0.01234, audio_tagging_loss=0.008666, over 3050786.21 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:52:34,710 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.617e+01 8.904e+01 9.696e+01 1.053e+02 1.235e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 00:52:46,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3279093.3333333335, ans=0.2 2023-11-28 00:52:51,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3279160.0, ans=0.125 2023-11-28 00:53:12,306 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.22 vs. limit=15.0 2023-11-28 00:53:19,348 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491900 2023-11-28 00:53:25,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3279360.0, ans=0.0 2023-11-28 00:53:25,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3279360.0, ans=0.0 2023-11-28 00:53:26,250 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 10950, loss[loss=0.06859, simple_loss=0.08713, pruned_loss=0.01588, audio_tagging_loss=0.009144, over 15755.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08958, pruned_loss=0.01231, audio_tagging_loss=0.008769, over 3054179.57 frames. 
], batch size: 58, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:53:31,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.14 vs. limit=10.0 2023-11-28 00:53:37,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3279426.6666666665, ans=0.0 2023-11-28 00:53:39,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3279426.6666666665, ans=0.125 2023-11-28 00:53:42,296 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:53:54,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3279493.3333333335, ans=0.05 2023-11-28 00:54:03,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3279560.0, ans=0.125 2023-11-28 00:54:17,702 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 491950 2023-11-28 00:54:22,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3279626.6666666665, ans=0.125 2023-11-28 00:54:24,147 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11000, loss[loss=0.05181, simple_loss=0.05381, pruned_loss=0.007918, audio_tagging_loss=0.01699, over 14130.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09004, pruned_loss=0.01254, audio_tagging_loss=0.008822, over 3050873.82 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:54:30,054 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.813e+01 8.785e+01 9.323e+01 1.002e+02 1.243e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 00:54:35,484 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 00:54:43,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3279760.0, ans=0.1 2023-11-28 00:54:45,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3279826.6666666665, ans=0.07 2023-11-28 00:54:51,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3279826.6666666665, ans=0.2 2023-11-28 00:55:14,314 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492000 2023-11-28 00:55:22,874 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11050, loss[loss=0.07743, simple_loss=0.1134, pruned_loss=0.01341, audio_tagging_loss=0.007327, over 14753.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09016, pruned_loss=0.01257, audio_tagging_loss=0.008902, over 3048378.78 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:55:29,887 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.82 vs. 
limit=15.0 2023-11-28 00:55:30,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3280026.6666666665, ans=0.125 2023-11-28 00:55:31,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3280026.6666666665, ans=0.05 2023-11-28 00:55:50,939 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2023-11-28 00:56:06,652 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0 2023-11-28 00:56:12,653 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492050 2023-11-28 00:56:17,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3280293.3333333335, ans=0.125 2023-11-28 00:56:19,110 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11100, loss[loss=0.08324, simple_loss=0.1136, pruned_loss=0.01739, audio_tagging_loss=0.009035, over 14796.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09012, pruned_loss=0.01254, audio_tagging_loss=0.008992, over 3052770.09 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:56:26,426 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.422e+01 8.769e+01 9.313e+01 9.922e+01 1.261e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-28 00:56:28,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3280360.0, ans=0.125 2023-11-28 00:56:36,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3280426.6666666665, ans=0.125 2023-11-28 00:56:42,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3280493.3333333335, ans=0.125 2023-11-28 00:56:43,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3280493.3333333335, ans=0.125 2023-11-28 00:56:58,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3280560.0, ans=0.1 2023-11-28 00:56:59,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3280560.0, ans=0.125 2023-11-28 00:57:09,873 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492100 2023-11-28 00:57:13,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3280626.6666666665, ans=0.125 2023-11-28 00:57:16,938 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11150, loss[loss=0.06842, simple_loss=0.09582, pruned_loss=0.01065, audio_tagging_loss=0.009856, over 16084.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09039, pruned_loss=0.01252, audio_tagging_loss=0.009005, over 3054138.46 frames. 
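], batch size: 58, lr: 1.64e-03, grad_scale: 16.0

The scaling.py:213 entries above track ScheduledFloat values: per-module hyperparameters (skip rates, balancer probabilities, dropout rates) that follow a piecewise-linear schedule over batch_count instead of staying fixed. A minimal sketch of that behavior follows; the class shape and the breakpoints are illustrative, not icefall's actual implementation.

```python
# Sketch of a piecewise-linearly scheduled hyperparameter, assuming
# (batch_count, value) breakpoints; illustrative only, not icefall's
# actual ScheduledFloat class.
class ScheduledFloat:
    def __init__(self, *points):
        # e.g. ScheduledFloat((0.0, 0.2), (4000.0, 0.0))
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                # linear interpolation inside the segment
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# By batch_count ~3.28e6 every schedule is far past its last breakpoint,
# which is why the logged `ans` values are stable constants (0.0, 0.125, 0.2, ...).
print(ScheduledFloat((0.0, 0.2), (4000.0, 0.0)).value(3280693.33))  # -> 0.0
```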
2023-11-28 00:57:17,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3280693.3333333335, ans=0.125 2023-11-28 00:57:38,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3280826.6666666665, ans=0.1 2023-11-28 00:57:51,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3280893.3333333335, ans=0.0 2023-11-28 00:58:07,330 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492150 2023-11-28 00:58:07,658 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=15.0 2023-11-28 00:58:10,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3280960.0, ans=10.0 2023-11-28 00:58:13,839 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11200, loss[loss=0.07233, simple_loss=0.09252, pruned_loss=0.01651, audio_tagging_loss=0.009557, over 14709.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09026, pruned_loss=0.01244, audio_tagging_loss=0.009071, over 3050170.66 frames. ], batch size: 54, lr: 1.64e-03, grad_scale: 32.0 2023-11-28 00:58:14,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3281026.6666666665, ans=0.0 2023-11-28 00:58:19,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3281026.6666666665, ans=0.125 2023-11-28 00:58:20,459 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 8.951e+01 9.684e+01 1.065e+02 1.269e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-28 00:58:21,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3281026.6666666665, ans=0.125 2023-11-28 00:58:26,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3281093.3333333335, ans=0.1 2023-11-28 00:58:29,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3281093.3333333335, ans=0.0 2023-11-28 00:58:34,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3281093.3333333335, ans=0.125 2023-11-28 00:58:43,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3281160.0, ans=0.125 2023-11-28 00:58:52,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3281226.6666666665, ans=0.1 2023-11-28 00:58:59,062 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 00:59:02,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3281293.3333333335, ans=0.1 2023-11-28 00:59:04,323 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492200 2023-11-28 00:59:08,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit,
batch_count=3281293.3333333335, ans=15.0 2023-11-28 00:59:11,108 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11250, loss[loss=0.0745, simple_loss=0.0976, pruned_loss=0.01684, audio_tagging_loss=0.008864, over 15840.00 frames. ], tot_loss[loss=0.06676, simple_loss=0.09037, pruned_loss=0.01249, audio_tagging_loss=0.009077, over 3052988.13 frames. ], batch size: 58, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 00:59:13,483 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.03 vs. limit=15.0 2023-11-28 00:59:25,559 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.29 vs. limit=15.0 2023-11-28 00:59:48,390 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2023-11-28 00:59:50,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3281560.0, ans=0.0 2023-11-28 01:00:01,793 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492250 2023-11-28 01:00:08,267 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11300, loss[loss=0.07861, simple_loss=0.1077, pruned_loss=0.0147, audio_tagging_loss=0.01004, over 15063.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09041, pruned_loss=0.0125, audio_tagging_loss=0.008836, over 3044891.25 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:00:17,046 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 8.951e+01 9.399e+01 1.008e+02 1.489e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 01:00:20,656 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:00:23,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3281760.0, ans=0.1 2023-11-28 01:00:27,412 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=12.0 2023-11-28 01:00:38,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3281826.6666666665, ans=0.1 2023-11-28 01:00:58,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3281960.0, ans=0.125 2023-11-28 01:00:59,887 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492300 2023-11-28 01:01:06,436 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11350, loss[loss=0.08492, simple_loss=0.1263, pruned_loss=0.0172, audio_tagging_loss=0.004556, over 15896.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.0903, pruned_loss=0.01242, audio_tagging_loss=0.008665, over 3053365.77 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:01:06,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3282026.6666666665, ans=0.125 2023-11-28 01:01:21,098 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.89 vs. 
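limit=22.5

The scaling.py:1022 Whitening entries compare a per-module anisotropy metric against that module's whitening_limit (here 19.89 vs. 22.5); in this stretch of the log the metrics sit below their limits, so the entries appear to be purely diagnostic. Below is a rough reconstruction of such a metric, normalized so that perfectly "white" features (equal covariance eigenvalues) score 1; it mirrors the shape of the logged quantity but is not icefall's exact code.

```python
# Hedged sketch of a whitening metric: d * sum(eig^2) / (sum(eig))^2 of the
# per-group feature covariance, which is 1 for isotropic features and grows
# as a few directions dominate. A reconstruction, not icefall's exact code.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x.transpose(0, 1)                     # (num_groups, frames, chans)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x)  # per-group covariance
    d = cov.shape[-1]
    tr = cov.diagonal(dim1=-2, dim2=-1).sum(-1)
    sum_eig_sq = (cov * cov).sum(dim=(-2, -1))  # == trace(cov @ cov)
    return (d * sum_eig_sq / tr.clamp(min=1e-20) ** 2).mean()

# White noise scores ~1; a training hook would add a penalty gradient only
# when the metric exceeds the module's whitening_limit.
print(whitening_metric(torch.randn(10000, 64), num_groups=1))  # ~1.0
```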
2023-11-28 01:01:56,570 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492350 2023-11-28 01:02:02,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3282360.0, ans=0.0 2023-11-28 01:02:02,939 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11400, loss[loss=0.0714, simple_loss=0.1003, pruned_loss=0.01135, audio_tagging_loss=0.009887, over 15079.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09038, pruned_loss=0.01255, audio_tagging_loss=0.008481, over 3050796.22 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:02:11,607 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.540e+01 8.755e+01 9.421e+01 1.020e+02 1.331e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 01:02:11,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3282360.0, ans=0.125 2023-11-28 01:02:13,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3282426.6666666665, ans=0.09899494936611666 2023-11-28 01:02:22,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3282426.6666666665, ans=0.125 2023-11-28 01:02:40,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3282560.0, ans=0.125 2023-11-28 01:02:42,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3282560.0, ans=0.2 2023-11-28 01:02:53,659 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492400 2023-11-28 01:02:55,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3282626.6666666665, ans=0.125 2023-11-28 01:02:56,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3282626.6666666665, ans=0.1 2023-11-28 01:03:00,925 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11450, loss[loss=0.04736, simple_loss=0.06399, pruned_loss=0.005435, audio_tagging_loss=0.009934, over 15527.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08973, pruned_loss=0.01249, audio_tagging_loss=0.008524, over 3041330.07 frames. ], batch size: 59, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:03:17,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.39 vs.
limit=10.0 2023-11-28 01:03:37,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3282893.3333333335, ans=0.125 2023-11-28 01:03:42,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3282893.3333333335, ans=0.0 2023-11-28 01:03:50,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3282960.0, ans=0.125 2023-11-28 01:03:52,004 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492450 2023-11-28 01:03:57,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3283026.6666666665, ans=0.125 2023-11-28 01:03:58,519 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11500, loss[loss=0.06674, simple_loss=0.09115, pruned_loss=0.01297, audio_tagging_loss=0.008193, over 15233.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08874, pruned_loss=0.0124, audio_tagging_loss=0.008525, over 3042683.16 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:04:06,710 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.678e+01 9.475e+01 1.027e+02 1.615e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 01:04:11,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3283093.3333333335, ans=0.125 2023-11-28 01:04:21,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3283160.0, ans=0.0 2023-11-28 01:04:31,112 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.70 vs. limit=22.5 2023-11-28 01:04:46,616 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.19 vs. limit=10.0 2023-11-28 01:04:49,221 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492500 2023-11-28 01:04:52,631 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:04:55,706 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11550, loss[loss=0.07741, simple_loss=0.103, pruned_loss=0.01619, audio_tagging_loss=0.009729, over 14940.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08927, pruned_loss=0.01241, audio_tagging_loss=0.008537, over 3050881.42 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:05:02,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3283360.0, ans=0.125 2023-11-28 01:05:02,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3283360.0, ans=0.125 2023-11-28 01:05:32,973 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
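Number of tokens: 24

These WARNING entries document a feasibility filter for the pruned transducer: a 1-second AudioSet clip yields 100 feature frames, only 23 after the roughly 4x convolutional subsampling, yet its placeholder transcript tokenizes to 24 BPE tokens, and a transducer cannot emit more symbols than it has encoder frames, so the cut is excluded. A hedged sketch of such a check follows; the helper name and the exact frame arithmetic are illustrative, chosen to reproduce the logged 100 -> 23.

```python
# Sketch of the cut-feasibility check implied by the WARNING lines: keep a
# cut only if the encoder will emit at least as many frames as there are
# target tokens. Illustrative helper, not the actual train_asr.py code.
def keep_cut(num_frames: int, num_tokens: int,
             subsampling_factor: int = 4) -> bool:
    # Approximate frames left after the convolutional front-end; the exact
    # formula depends on encoder_embed (100 -> 23 here, matching the log).
    frames_after = (num_frames - 7) // subsampling_factor
    return frames_after >= num_tokens

print(keep_cut(num_frames=100, num_tokens=24))  # False -> cut is excluded
```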
2023-11-28 01:05:42,150 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2023-11-28 01:05:46,485 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492550 2023-11-28 01:05:53,380 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11600, loss[loss=0.05926, simple_loss=0.08548, pruned_loss=0.008863, audio_tagging_loss=0.007655, over 15112.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08908, pruned_loss=0.01232, audio_tagging_loss=0.008747, over 3046462.93 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:05:54,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3283693.3333333335, ans=0.0 2023-11-28 01:06:03,692 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 9.077e+01 9.650e+01 1.023e+02 1.320e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 01:06:08,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3283760.0, ans=0.0 2023-11-28 01:06:25,083 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2023-11-28 01:06:39,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3283960.0, ans=0.1 2023-11-28 01:06:43,870 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492600 2023-11-28 01:06:50,992 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11650, loss[loss=0.07199, simple_loss=0.09765, pruned_loss=0.01548, audio_tagging_loss=0.00768, over 15562.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08917, pruned_loss=0.01235, audio_tagging_loss=0.008744, over 3045904.05 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:06:53,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3284026.6666666665, ans=0.125 2023-11-28 01:06:59,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3284026.6666666665, ans=0.1 2023-11-28 01:07:11,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3284093.3333333335, ans=0.125 2023-11-28 01:07:30,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3284226.6666666665, ans=0.125 2023-11-28 01:07:32,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3284226.6666666665, ans=0.125 2023-11-28 01:07:41,350 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492650 2023-11-28 01:07:48,463 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11700, loss[loss=0.06458, simple_loss=0.09578, pruned_loss=0.01049, audio_tagging_loss=0.006194, over 16736.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08922, pruned_loss=0.01237, audio_tagging_loss=0.008814, over 3048197.93 frames.
], batch size: 59, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:07:49,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3284360.0, ans=0.09899494936611666 2023-11-28 01:07:58,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 8.938e+01 9.502e+01 1.017e+02 1.872e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 01:08:02,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3284426.6666666665, ans=0.125 2023-11-28 01:08:19,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3284493.3333333335, ans=0.0 2023-11-28 01:08:26,277 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:08:39,182 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492700 2023-11-28 01:08:45,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3284693.3333333335, ans=0.125 2023-11-28 01:08:46,008 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11750, loss[loss=0.05403, simple_loss=0.06811, pruned_loss=0.008741, audio_tagging_loss=0.01123, over 15839.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08899, pruned_loss=0.01236, audio_tagging_loss=0.008821, over 3045021.53 frames. ], batch size: 61, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:08:48,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3284693.3333333335, ans=0.125 2023-11-28 01:08:51,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3284693.3333333335, ans=0.125 2023-11-28 01:09:04,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3284760.0, ans=0.0 2023-11-28 01:09:36,236 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492750 2023-11-28 01:09:43,363 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11800, loss[loss=0.08559, simple_loss=0.1092, pruned_loss=0.02214, audio_tagging_loss=0.00883, over 15416.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08923, pruned_loss=0.01246, audio_tagging_loss=0.008888, over 3043078.76 frames. ], batch size: 57, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:09:53,164 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 8.795e+01 9.542e+01 1.022e+02 1.386e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 01:09:56,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3285093.3333333335, ans=0.0 2023-11-28 01:09:56,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3285093.3333333335, ans=0.0 2023-11-28 01:10:07,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3285160.0, ans=0.2 2023-11-28 01:10:16,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.59 vs. 
limit=22.5 2023-11-28 01:10:33,695 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492800 2023-11-28 01:10:34,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3285293.3333333335, ans=0.125 2023-11-28 01:10:40,539 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11850, loss[loss=0.07352, simple_loss=0.1005, pruned_loss=0.01545, audio_tagging_loss=0.007808, over 15139.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08849, pruned_loss=0.01231, audio_tagging_loss=0.008989, over 3034698.35 frames. ], batch size: 55, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:11:02,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3285493.3333333335, ans=0.0 2023-11-28 01:11:18,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3285560.0, ans=0.1 2023-11-28 01:11:18,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3285560.0, ans=0.0 2023-11-28 01:11:19,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3285560.0, ans=0.125 2023-11-28 01:11:31,338 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492850 2023-11-28 01:11:36,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3285626.6666666665, ans=0.0 2023-11-28 01:11:38,308 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11900, loss[loss=0.1025, simple_loss=0.1419, pruned_loss=0.02402, audio_tagging_loss=0.007488, over 14823.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08942, pruned_loss=0.01239, audio_tagging_loss=0.009105, over 3043726.87 frames. ], batch size: 53, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:11:39,015 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2023-11-28 01:11:44,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3285693.3333333335, ans=0.04949747468305833 2023-11-28 01:11:48,641 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.534e+01 8.587e+01 9.286e+01 9.981e+01 1.301e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 01:12:03,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3285826.6666666665, ans=0.125 2023-11-28 01:12:09,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3285826.6666666665, ans=0.0 2023-11-28 01:12:29,074 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492900 2023-11-28 01:12:36,125 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 11950, loss[loss=0.05236, simple_loss=0.06862, pruned_loss=0.008439, audio_tagging_loss=0.009605, over 17305.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08933, pruned_loss=0.01225, audio_tagging_loss=0.009075, over 3049765.83 frames. 
], batch size: 66, lr: 1.64e-03, grad_scale: 8.0 2023-11-28 01:12:40,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3286026.6666666665, ans=0.125 2023-11-28 01:12:47,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3286093.3333333335, ans=0.125 2023-11-28 01:13:23,100 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0 2023-11-28 01:13:24,820 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 492950 2023-11-28 01:13:30,982 INFO [train_asr.py:1235] (2/4) Epoch 41, batch 12000, loss[loss=0.07895, simple_loss=0.112, pruned_loss=0.01556, audio_tagging_loss=0.007396, over 15101.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09034, pruned_loss=0.0125, audio_tagging_loss=0.009129, over 3055335.35 frames. ], batch size: 56, lr: 1.64e-03, grad_scale: 16.0 2023-11-28 01:13:30,982 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 01:13:46,485 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.1130, 5.8100, 5.5104, 5.5630], device='cuda:2') 2023-11-28 01:13:47,646 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9626, 3.5031, 5.2382, 3.2579], device='cuda:2') 2023-11-28 01:14:05,675 INFO [train_asr.py:1267] (2/4) Epoch 41, validation: loss=0.05796, simple_loss=0.05063, pruned_loss=0.005209, audio_tagging_loss=0.02743, over 4681554.00 frames. 2023-11-28 01:14:05,676 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 01:14:14,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2023-11-28 01:14:15,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.802e+01 9.296e+01 1.010e+02 1.466e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 01:14:48,424 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 0, loss[loss=0.07069, simple_loss=0.07086, pruned_loss=0.01163, audio_tagging_loss=0.02363, over 16824.00 frames. ], tot_loss[loss=0.07069, simple_loss=0.07086, pruned_loss=0.01163, audio_tagging_loss=0.02363, over 16824.00 frames. ], batch size: 63, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:14:48,425 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 01:15:03,323 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.7715, 3.2801, 4.5354, 3.5453], device='cuda:2') 2023-11-28 01:15:22,242 INFO [train_asr.py:1267] (2/4) Epoch 42, validation: loss=0.05771, simple_loss=0.05063, pruned_loss=0.005208, audio_tagging_loss=0.02719, over 4681554.00 frames. 
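Two things happen at this seam in the log: the scheduled validation pass at batch 12000, and the rollover from epoch 41 to epoch 42, where training restarts at batch 0 and the learning rate steps from 1.64e-03 to 1.62e-03. Note that at batch 0 the windowed tot_loss equals the single-batch loss, so the early epoch-42 averages are dominated by whatever cuts land in the first batches. The logged learning rates are consistent with an Eden-style schedule that decays in both the batch and the epoch dimension; the sketch below uses this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, and the formula is a reconstruction of icefall's Eden scheduler rather than a verbatim copy.

```python
# Hedged sketch of an Eden-style LR schedule reproducing the logged values.
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Around batch idx ~492,000 with ~40-41 finished epochs this matches the
# logged 1.64e-03 -> 1.62e-03 step at the epoch-42 boundary:
print(round(eden_lr(0.045, 492000, 40), 5))  # ~0.00164
print(round(eden_lr(0.045, 492000, 41), 5))  # ~0.00162
```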
2023-11-28 01:15:22,242 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 01:15:24,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3286513.3333333335, ans=10.0 2023-11-28 01:15:24,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3286513.3333333335, ans=0.95 2023-11-28 01:15:31,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3286513.3333333335, ans=0.125 2023-11-28 01:15:31,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3286513.3333333335, ans=0.5 2023-11-28 01:15:39,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3286580.0, ans=0.0 2023-11-28 01:15:45,133 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0 2023-11-28 01:15:45,691 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493000 2023-11-28 01:15:48,820 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-28 01:16:03,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3286713.3333333335, ans=0.035 2023-11-28 01:16:08,443 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=12.0 2023-11-28 01:16:14,523 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.07 vs. limit=15.0 2023-11-28 01:16:17,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3286780.0, ans=0.125 2023-11-28 01:16:19,448 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 50, loss[loss=0.07391, simple_loss=0.09151, pruned_loss=0.01538, audio_tagging_loss=0.01277, over 14412.00 frames. ], tot_loss[loss=0.07294, simple_loss=0.08813, pruned_loss=0.01206, audio_tagging_loss=0.01681, over 684411.99 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:16:24,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3286846.6666666665, ans=0.0 2023-11-28 01:16:43,987 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493050 2023-11-28 01:16:56,413 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-28 01:17:01,282 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.634e+01 1.031e+02 1.127e+02 1.457e+02, threshold=2.062e+02, percent-clipped=0.0 2023-11-28 01:17:16,853 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 100, loss[loss=0.09154, simple_loss=0.1358, pruned_loss=0.01641, audio_tagging_loss=0.007252, over 15551.00 frames. ], tot_loss[loss=0.07322, simple_loss=0.08939, pruned_loss=0.01254, audio_tagging_loss=0.01599, over 1202746.72 frames. 
], batch size: 53, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:17:34,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3287246.6666666665, ans=0.0 2023-11-28 01:17:42,044 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493100 2023-11-28 01:17:43,457 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.47 vs. limit=10.0 2023-11-28 01:17:47,102 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.42 vs. limit=12.0 2023-11-28 01:18:15,153 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 150, loss[loss=0.07987, simple_loss=0.1042, pruned_loss=0.01521, audio_tagging_loss=0.01257, over 16135.00 frames. ], tot_loss[loss=0.07181, simple_loss=0.08942, pruned_loss=0.01259, audio_tagging_loss=0.01451, over 1613899.78 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:18:16,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3287513.3333333335, ans=0.2 2023-11-28 01:18:30,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3287580.0, ans=0.125 2023-11-28 01:18:30,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2023-11-28 01:18:38,952 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493150 2023-11-28 01:18:39,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3287646.6666666665, ans=0.1 2023-11-28 01:18:57,021 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 8.959e+01 9.616e+01 1.058e+02 1.322e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-28 01:18:59,693 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2023-11-28 01:19:12,871 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 200, loss[loss=0.07982, simple_loss=0.1033, pruned_loss=0.01763, audio_tagging_loss=0.01054, over 15955.00 frames. ], tot_loss[loss=0.07042, simple_loss=0.09047, pruned_loss=0.01249, audio_tagging_loss=0.0127, over 1934502.78 frames. 
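], batch size: 59, lr: 1.62e-03, grad_scale: 32.0

The optim.py:476 entries summarize recently observed gradient norms as five quantiles (min, 25%, median, 75%, max) and derive the clipping threshold from the median: in the entry above, Clipping_scale=2.0 times the median 9.616e+01 gives exactly the logged threshold=1.923e+02, and percent-clipped=0.0 means no batch in the window exceeded it. A sketch of that median-based clipping follows; it is a reconstruction of the idea, not the actual ScaledAdam code.

```python
# Hedged sketch of median-based gradient clipping: threshold is
# clipping_scale x the median of a sliding window of global grad norms.
import torch

def clip_by_median(parameters, recent_norms: list,
                   clipping_scale: float = 2.0):
    grads = [p.grad for p in parameters if p.grad is not None]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    recent_norms.append(norm.item())
    recent_norms[:] = recent_norms[-200:]  # keep a sliding window
    q = torch.quantile(torch.tensor(recent_norms),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()  # 2.0 x median
    if norm > threshold:                      # rescale gradients in place
        for g in grads:
            g.mul_(threshold / norm)
    return q, threshold

# Tiny demo: with a single recorded norm the threshold is 2x that norm,
# so nothing is clipped (percent-clipped would be 0.0).
p = torch.nn.Parameter(torch.randn(10))
p.grad = torch.randn(10)
print(clip_by_median([p], []))
```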
2023-11-28 01:19:25,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3287913.3333333335, ans=0.125 2023-11-28 01:19:37,403 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493200 2023-11-28 01:19:40,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3287980.0, ans=0.125 2023-11-28 01:19:45,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3287980.0, ans=0.125 2023-11-28 01:19:59,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=3288113.3333333335, ans=22.5 2023-11-28 01:20:08,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3288113.3333333335, ans=0.0 2023-11-28 01:20:09,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3288113.3333333335, ans=0.125 2023-11-28 01:20:11,194 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 250, loss[loss=0.06901, simple_loss=0.09117, pruned_loss=0.01135, audio_tagging_loss=0.01207, over 13935.00 frames. ], tot_loss[loss=0.06952, simple_loss=0.09127, pruned_loss=0.01251, audio_tagging_loss=0.01137, over 2182383.92 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:20:23,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3288246.6666666665, ans=0.2 2023-11-28 01:20:27,980 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=12.0 2023-11-28 01:20:36,481 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493250 2023-11-28 01:20:37,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3288313.3333333335, ans=0.0 2023-11-28 01:20:42,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0 2023-11-28 01:20:49,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3288380.0, ans=0.125 2023-11-28 01:20:52,861 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 9.203e+01 9.691e+01 1.057e+02 1.267e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-28 01:20:53,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3288380.0, ans=0.2 2023-11-28 01:21:09,018 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 300, loss[loss=0.06721, simple_loss=0.09175, pruned_loss=0.01265, audio_tagging_loss=0.008687, over 14540.00 frames. ], tot_loss[loss=0.06869, simple_loss=0.09108, pruned_loss=0.01257, audio_tagging_loss=0.01058, over 2368121.52 frames.
], batch size: 55, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:21:09,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3288513.3333333335, ans=0.025 2023-11-28 01:21:33,285 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493300 2023-11-28 01:21:37,362 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.89 vs. limit=15.0 2023-11-28 01:21:40,287 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.70 vs. limit=10.0 2023-11-28 01:21:44,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3288713.3333333335, ans=0.0 2023-11-28 01:21:55,705 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.60 vs. limit=15.0 2023-11-28 01:22:00,124 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:22:04,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3288780.0, ans=0.09899494936611666 2023-11-28 01:22:06,950 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 350, loss[loss=0.06197, simple_loss=0.08253, pruned_loss=0.01175, audio_tagging_loss=0.008953, over 14541.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09005, pruned_loss=0.01226, audio_tagging_loss=0.01003, over 2519326.68 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:22:11,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3288846.6666666665, ans=0.0 2023-11-28 01:22:31,255 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493350 2023-11-28 01:22:41,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3289046.6666666665, ans=0.1 2023-11-28 01:22:49,682 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.671e+01 9.361e+01 1.014e+02 1.227e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 01:22:53,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3289113.3333333335, ans=0.125 2023-11-28 01:22:55,819 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2023-11-28 01:23:03,883 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 400, loss[loss=0.06903, simple_loss=0.1006, pruned_loss=0.01033, audio_tagging_loss=0.008423, over 14410.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.08902, pruned_loss=0.01208, audio_tagging_loss=0.009814, over 2634503.32 frames. 
], batch size: 54, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:23:27,948 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493400 2023-11-28 01:23:28,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289313.3333333335, ans=0.1 2023-11-28 01:23:33,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3289313.3333333335, ans=0.125 2023-11-28 01:23:35,051 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.69 vs. limit=15.0 2023-11-28 01:23:36,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3289313.3333333335, ans=0.125 2023-11-28 01:23:43,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3289380.0, ans=0.125 2023-11-28 01:23:48,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3289380.0, ans=0.125 2023-11-28 01:23:50,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3289446.6666666665, ans=0.0 2023-11-28 01:23:51,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3289446.6666666665, ans=0.125 2023-11-28 01:23:54,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3289446.6666666665, ans=0.0 2023-11-28 01:23:56,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3289446.6666666665, ans=0.0 2023-11-28 01:23:58,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3289446.6666666665, ans=0.125 2023-11-28 01:24:01,998 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 450, loss[loss=0.05435, simple_loss=0.07613, pruned_loss=0.005115, audio_tagging_loss=0.01118, over 14316.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.08967, pruned_loss=0.01236, audio_tagging_loss=0.009547, over 2727606.02 frames. 
], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:24:06,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3289513.3333333335, ans=0.125 2023-11-28 01:24:17,718 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:24:18,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3289580.0, ans=0.2 2023-11-28 01:24:26,407 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493450 2023-11-28 01:24:36,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289713.3333333335, ans=0.1 2023-11-28 01:24:45,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.317e+01 8.828e+01 9.254e+01 1.009e+02 1.850e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 01:24:50,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3289780.0, ans=0.1 2023-11-28 01:24:52,909 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2023-11-28 01:24:59,848 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 500, loss[loss=0.05816, simple_loss=0.07823, pruned_loss=0.00881, audio_tagging_loss=0.01024, over 16695.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08924, pruned_loss=0.0122, audio_tagging_loss=0.009442, over 2800037.97 frames. ], batch size: 62, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:25:12,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3289913.3333333335, ans=0.125 2023-11-28 01:25:13,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3289913.3333333335, ans=0.125 2023-11-28 01:25:15,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3289913.3333333335, ans=0.1 2023-11-28 01:25:17,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3289913.3333333335, ans=0.125 2023-11-28 01:25:19,849 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.25 vs. limit=15.0 2023-11-28 01:25:23,582 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493500 2023-11-28 01:25:28,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.68 vs. limit=12.0 2023-11-28 01:25:31,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3289980.0, ans=0.125 2023-11-28 01:25:57,457 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 550, loss[loss=0.07334, simple_loss=0.1008, pruned_loss=0.01489, audio_tagging_loss=0.008039, over 14160.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08949, pruned_loss=0.0123, audio_tagging_loss=0.009304, over 2859601.09 frames. 
], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:26:15,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3290246.6666666665, ans=0.125 2023-11-28 01:26:21,487 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493550 2023-11-28 01:26:31,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3290380.0, ans=0.125 2023-11-28 01:26:41,288 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.718e+01 9.476e+01 1.036e+02 1.288e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 01:26:55,500 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 600, loss[loss=0.06841, simple_loss=0.09397, pruned_loss=0.01234, audio_tagging_loss=0.009091, over 14787.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09039, pruned_loss=0.01231, audio_tagging_loss=0.009053, over 2903881.61 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:27:02,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3290513.3333333335, ans=0.125 2023-11-28 01:27:02,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3290513.3333333335, ans=0.2 2023-11-28 01:27:03,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3290513.3333333335, ans=0.125 2023-11-28 01:27:17,935 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2023-11-28 01:27:19,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3290646.6666666665, ans=0.2 2023-11-28 01:27:20,224 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493600 2023-11-28 01:27:21,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3290646.6666666665, ans=0.125 2023-11-28 01:27:35,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3290713.3333333335, ans=0.125 2023-11-28 01:27:39,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0 2023-11-28 01:27:42,031 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.80 vs. limit=15.0 2023-11-28 01:27:53,954 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 650, loss[loss=0.05652, simple_loss=0.06877, pruned_loss=0.007914, audio_tagging_loss=0.01422, over 14911.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.0905, pruned_loss=0.01245, audio_tagging_loss=0.008997, over 2938885.37 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:27:55,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3290846.6666666665, ans=0.1 2023-11-28 01:28:03,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.37 vs. 
limit=15.0 2023-11-28 01:28:13,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3290913.3333333335, ans=0.5 2023-11-28 01:28:17,843 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493650 2023-11-28 01:28:23,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3290980.0, ans=0.0 2023-11-28 01:28:24,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3290980.0, ans=0.125 2023-11-28 01:28:38,891 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.095e+01 8.722e+01 9.347e+01 9.970e+01 1.370e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 01:28:51,644 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 700, loss[loss=0.0528, simple_loss=0.06928, pruned_loss=0.008766, audio_tagging_loss=0.009388, over 15216.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08992, pruned_loss=0.01222, audio_tagging_loss=0.008896, over 2964513.97 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:29:06,588 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-28 01:29:15,847 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493700 2023-11-28 01:29:19,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3291313.3333333335, ans=0.125 2023-11-28 01:29:26,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3291380.0, ans=0.125 2023-11-28 01:29:30,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3291380.0, ans=6.0 2023-11-28 01:29:39,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3291446.6666666665, ans=0.0 2023-11-28 01:29:41,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3291446.6666666665, ans=0.0 2023-11-28 01:29:47,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3291446.6666666665, ans=0.125 2023-11-28 01:29:49,680 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 750, loss[loss=0.0627, simple_loss=0.0875, pruned_loss=0.009731, audio_tagging_loss=0.009218, over 14867.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09005, pruned_loss=0.01231, audio_tagging_loss=0.008939, over 2982065.65 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 4.0 2023-11-28 01:29:54,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3291513.3333333335, ans=0.5 2023-11-28 01:30:13,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.30 vs. 
limit=22.5 2023-11-28 01:30:13,879 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493750 2023-11-28 01:30:22,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3291646.6666666665, ans=0.1 2023-11-28 01:30:29,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3291713.3333333335, ans=0.0 2023-11-28 01:30:30,300 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.74 vs. limit=22.5 2023-11-28 01:30:36,020 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.598e+01 9.375e+01 9.953e+01 1.444e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-28 01:30:45,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3291780.0, ans=0.1 2023-11-28 01:30:47,078 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 800, loss[loss=0.05279, simple_loss=0.06779, pruned_loss=0.007159, audio_tagging_loss=0.01173, over 14858.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09144, pruned_loss=0.0127, audio_tagging_loss=0.008932, over 3009298.88 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:30:55,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3291846.6666666665, ans=0.035 2023-11-28 01:31:06,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3291913.3333333335, ans=0.0 2023-11-28 01:31:07,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3291913.3333333335, ans=0.0 2023-11-28 01:31:11,480 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493800 2023-11-28 01:31:16,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3291980.0, ans=0.0 2023-11-28 01:31:19,876 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0 2023-11-28 01:31:21,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.27 vs. limit=22.5 2023-11-28 01:31:28,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3292046.6666666665, ans=0.05 2023-11-28 01:31:32,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3292113.3333333335, ans=0.125 2023-11-28 01:31:33,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3292113.3333333335, ans=0.0 2023-11-28 01:31:45,316 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 850, loss[loss=0.07291, simple_loss=0.0977, pruned_loss=0.01549, audio_tagging_loss=0.008564, over 15009.00 frames. ], tot_loss[loss=0.06712, simple_loss=0.09081, pruned_loss=0.01267, audio_tagging_loss=0.009039, over 3018400.60 frames. 
], batch size: 56, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:31:57,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3292246.6666666665, ans=0.0 2023-11-28 01:32:04,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3292246.6666666665, ans=0.125 2023-11-28 01:32:04,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3292246.6666666665, ans=0.2 2023-11-28 01:32:09,993 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493850 2023-11-28 01:32:13,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3292313.3333333335, ans=0.07 2023-11-28 01:32:17,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3292313.3333333335, ans=0.125 2023-11-28 01:32:31,569 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.796e+01 9.707e+01 1.029e+02 1.774e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-28 01:32:43,117 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 900, loss[loss=0.06949, simple_loss=0.08966, pruned_loss=0.01152, audio_tagging_loss=0.01314, over 14813.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.08995, pruned_loss=0.0124, audio_tagging_loss=0.009157, over 3021591.46 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:32:47,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3292513.3333333335, ans=0.1 2023-11-28 01:33:01,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3292580.0, ans=0.07 2023-11-28 01:33:07,765 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493900 2023-11-28 01:33:14,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3292646.6666666665, ans=0.125 2023-11-28 01:33:20,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3292713.3333333335, ans=0.04949747468305833 2023-11-28 01:33:25,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3292713.3333333335, ans=0.0 2023-11-28 01:33:26,440 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.52 vs. limit=15.0 2023-11-28 01:33:41,068 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 950, loss[loss=0.06609, simple_loss=0.08499, pruned_loss=0.01509, audio_tagging_loss=0.008504, over 15815.00 frames. ], tot_loss[loss=0.06702, simple_loss=0.09115, pruned_loss=0.01252, audio_tagging_loss=0.008927, over 3028308.54 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:33:51,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3292913.3333333335, ans=0.125 2023-11-28 01:33:58,516 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.13 vs. 
limit=6.0 2023-11-28 01:34:05,455 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 493950 2023-11-28 01:34:05,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3292980.0, ans=0.125 2023-11-28 01:34:05,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3292980.0, ans=0.0 2023-11-28 01:34:05,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3292980.0, ans=0.2 2023-11-28 01:34:18,777 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.85 vs. limit=22.5 2023-11-28 01:34:27,166 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.099e+01 8.712e+01 9.268e+01 9.903e+01 1.259e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-28 01:34:38,900 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1000, loss[loss=0.07334, simple_loss=0.09997, pruned_loss=0.0146, audio_tagging_loss=0.00875, over 16537.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.0905, pruned_loss=0.01239, audio_tagging_loss=0.008818, over 3022415.25 frames. ], batch size: 60, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:34:40,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3293180.0, ans=0.5 2023-11-28 01:34:42,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3293180.0, ans=0.1 2023-11-28 01:35:02,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3293313.3333333335, ans=0.0 2023-11-28 01:35:03,350 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494000 2023-11-28 01:35:05,785 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:35:07,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3293313.3333333335, ans=0.2 2023-11-28 01:35:10,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3293313.3333333335, ans=0.2 2023-11-28 01:35:20,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3293380.0, ans=0.0 2023-11-28 01:35:35,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3293513.3333333335, ans=0.125 2023-11-28 01:35:37,062 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1050, loss[loss=0.06357, simple_loss=0.08483, pruned_loss=0.01153, audio_tagging_loss=0.009622, over 16910.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08975, pruned_loss=0.01237, audio_tagging_loss=0.008719, over 3025394.36 frames. 
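
The ScheduledFloat entries emitted from scaling.py:213 throughout this log report module hyperparameters (balancer probabilities, skip rates, bypass scale_min values and the like) that are annealed as a piecewise-linear function of the global batch count; the trailing "ans=" is the value in effect at that batch_count. Below is a minimal Python sketch of such a schedule. The class name matches the log, but the breakpoint-based constructor and the example breakpoints are assumptions of this sketch, not this run's actual configuration.

import bisect

class ScheduledFloat:
    """Piecewise-linear schedule over batch count (illustrative re-implementation)."""
    def __init__(self, *points, default=0.0):
        self.points = sorted(points)   # (batch_count, value) breakpoints
        self.default = default
        self.batch_count = None        # updated by the training loop

    def __float__(self):
        if self.batch_count is None or not self.points:
            return float(self.default)
        xs = [x for x, _ in self.points]
        i = bisect.bisect_right(xs, self.batch_count)
        if i == 0:
            return float(self.points[0][1])
        if i == len(self.points):
            return float(self.points[-1][1])
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return float(y0 + (y1 - y0) * (self.batch_count - x0) / (x1 - x0))

# A prob decaying from 0.3 to 0.125 over the first 20k batches (made-up breakpoints):
prob = ScheduledFloat((0.0, 0.3), (20000.0, 0.125))
prob.batch_count = 3290380.0
print(float(prob))  # -> 0.125, consistent with the many "prob, ... ans=0.125" entries

By batch_count around 3.29e6 every such schedule is long past its final breakpoint, which is why the logged values (0.125, 0.2, 0.1, ...) are constant across this section.
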
], batch size: 65, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:35:37,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3293513.3333333335, ans=0.0 2023-11-28 01:35:43,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3293513.3333333335, ans=0.0 2023-11-28 01:35:56,351 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.71 vs. limit=10.0 2023-11-28 01:35:59,904 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.90 vs. limit=12.0 2023-11-28 01:36:00,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3293646.6666666665, ans=0.1 2023-11-28 01:36:01,749 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494050 2023-11-28 01:36:15,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3293713.3333333335, ans=0.1 2023-11-28 01:36:18,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3293713.3333333335, ans=0.0 2023-11-28 01:36:22,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3293780.0, ans=0.125 2023-11-28 01:36:23,363 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.763e+01 8.510e+01 9.155e+01 1.003e+02 1.223e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-28 01:36:35,318 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1100, loss[loss=0.06981, simple_loss=0.1059, pruned_loss=0.01052, audio_tagging_loss=0.006358, over 14617.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08986, pruned_loss=0.01248, audio_tagging_loss=0.008622, over 3029977.13 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:36:39,871 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:36:41,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3293846.6666666665, ans=0.125 2023-11-28 01:36:42,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3293846.6666666665, ans=0.1 2023-11-28 01:36:57,440 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.68 vs. 
limit=15.0 2023-11-28 01:36:58,989 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494100 2023-11-28 01:37:18,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3294046.6666666665, ans=0.125 2023-11-28 01:37:32,933 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1150, loss[loss=0.05277, simple_loss=0.06464, pruned_loss=0.008211, audio_tagging_loss=0.01224, over 15993.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08869, pruned_loss=0.01222, audio_tagging_loss=0.008677, over 3036327.83 frames. ], batch size: 62, lr: 1.62e-03, grad_scale: 8.0 2023-11-28 01:37:38,970 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=22.5 2023-11-28 01:37:42,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3294246.6666666665, ans=0.125 2023-11-28 01:37:57,723 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494150 2023-11-28 01:38:18,792 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.809e+01 8.708e+01 9.340e+01 1.012e+02 1.442e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 01:38:28,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3294446.6666666665, ans=0.0 2023-11-28 01:38:29,972 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1200, loss[loss=0.07223, simple_loss=0.09308, pruned_loss=0.01503, audio_tagging_loss=0.01065, over 14503.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.0888, pruned_loss=0.01225, audio_tagging_loss=0.008654, over 3035252.89 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:38:37,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3294513.3333333335, ans=0.125 2023-11-28 01:38:51,637 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2023-11-28 01:38:54,949 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494200 2023-11-28 01:39:06,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3294713.3333333335, ans=0.0 2023-11-28 01:39:06,834 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.72 vs. limit=15.0 2023-11-28 01:39:10,099 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.81 vs. limit=15.0 2023-11-28 01:39:29,052 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1250, loss[loss=0.06104, simple_loss=0.08798, pruned_loss=0.009004, audio_tagging_loss=0.008047, over 14405.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.0883, pruned_loss=0.01207, audio_tagging_loss=0.008615, over 3045984.50 frames. 
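
The Whitening entries from scaling.py:1022 compare a measured statistic of a module's output covariance against a per-module limit ("metric=12.67 vs. limit=15.0"); the whitening penalty only engages once the metric exceeds the limit, so each such line here indicates the constraint is currently inactive. One common formulation, assumed for the sketch below, measures how uneven the covariance eigenvalues are: E[lambda^2] / (E[lambda])^2, which equals 1.0 for perfectly white features and grows as energy concentrates in a few directions. This is an illustrative sketch under that assumption, not the actual implementation.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (N, num_channels). Returns the mean over groups of
    # E[lambda^2] / (E[lambda])^2 for each group's channel covariance,
    # computed via traces (trace(C^2) = sum lambda_i^2, trace(C) = sum lambda_i)
    # so no eigendecomposition is needed.
    num_channels = x.shape[-1]
    assert num_channels % num_groups == 0
    x = x.reshape(-1, num_groups, num_channels // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]   # (groups, c, c)
    c = cov.shape[-1]
    mean_sq_eig = (cov * cov).sum(dim=(1, 2)) / c
    sq_mean_eig = (torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) / c) ** 2
    return (mean_sq_eig / sq_mean_eig).mean()

x = torch.randn(4000, 512)                                  # white features
print(whitening_metric(x))                                  # close to 1.0
print(whitening_metric(x * torch.linspace(0.1, 3.0, 512)))  # noticeably larger
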
], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:39:38,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3294846.6666666665, ans=0.0 2023-11-28 01:39:51,062 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-28 01:39:52,902 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494250 2023-11-28 01:39:54,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3294980.0, ans=0.1 2023-11-28 01:39:55,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3294980.0, ans=0.0 2023-11-28 01:40:10,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3295046.6666666665, ans=0.125 2023-11-28 01:40:15,302 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 8.552e+01 9.215e+01 9.963e+01 1.305e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-28 01:40:16,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3295113.3333333335, ans=0.125 2023-11-28 01:40:26,868 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1300, loss[loss=0.0641, simple_loss=0.0922, pruned_loss=0.01004, audio_tagging_loss=0.007961, over 15565.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08795, pruned_loss=0.01194, audio_tagging_loss=0.008566, over 3034012.36 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:40:31,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3295180.0, ans=0.125 2023-11-28 01:40:35,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3295180.0, ans=0.125 2023-11-28 01:40:51,069 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494300 2023-11-28 01:40:52,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3295313.3333333335, ans=0.125 2023-11-28 01:41:01,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3295380.0, ans=0.125 2023-11-28 01:41:14,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3295446.6666666665, ans=0.1 2023-11-28 01:41:17,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3295446.6666666665, ans=0.125 2023-11-28 01:41:23,998 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1350, loss[loss=0.08257, simple_loss=0.109, pruned_loss=0.021, audio_tagging_loss=0.007057, over 15238.00 frames. ], tot_loss[loss=0.06403, simple_loss=0.08719, pruned_loss=0.0118, audio_tagging_loss=0.008633, over 3031802.18 frames. 
], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:41:26,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3295513.3333333335, ans=0.05 2023-11-28 01:41:32,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3295513.3333333335, ans=0.1 2023-11-28 01:41:48,824 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494350 2023-11-28 01:41:50,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3295646.6666666665, ans=0.1 2023-11-28 01:41:50,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3295646.6666666665, ans=0.125 2023-11-28 01:41:51,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3295646.6666666665, ans=0.125 2023-11-28 01:41:53,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.40 vs. limit=10.0 2023-11-28 01:42:01,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3295713.3333333335, ans=0.0 2023-11-28 01:42:05,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3295713.3333333335, ans=0.02 2023-11-28 01:42:08,153 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:42:10,305 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.515e+01 9.138e+01 9.769e+01 1.555e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-28 01:42:14,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3295780.0, ans=0.125 2023-11-28 01:42:16,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3295780.0, ans=0.125 2023-11-28 01:42:22,272 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1400, loss[loss=0.05975, simple_loss=0.08536, pruned_loss=0.009952, audio_tagging_loss=0.007115, over 16554.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08747, pruned_loss=0.01188, audio_tagging_loss=0.008745, over 3040789.46 frames. ], batch size: 63, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:42:37,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3295913.3333333335, ans=6.0 2023-11-28 01:42:41,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3295913.3333333335, ans=0.0 2023-11-28 01:42:46,776 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494400 2023-11-28 01:43:02,595 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.73 vs. 
limit=22.5 2023-11-28 01:43:20,763 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1450, loss[loss=0.06269, simple_loss=0.08428, pruned_loss=0.01056, audio_tagging_loss=0.009991, over 14849.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08838, pruned_loss=0.01211, audio_tagging_loss=0.008748, over 3039585.71 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:43:38,938 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.58 vs. limit=10.0 2023-11-28 01:43:39,786 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0 2023-11-28 01:43:44,163 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494450 2023-11-28 01:44:06,549 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.683e+01 8.990e+01 9.437e+01 1.012e+02 1.630e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 01:44:15,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3296446.6666666665, ans=0.0 2023-11-28 01:44:17,586 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1500, loss[loss=0.07273, simple_loss=0.09383, pruned_loss=0.01696, audio_tagging_loss=0.008854, over 16430.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08908, pruned_loss=0.01227, audio_tagging_loss=0.008816, over 3042981.13 frames. ], batch size: 62, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:44:21,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3296513.3333333335, ans=0.2 2023-11-28 01:44:42,236 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494500 2023-11-28 01:44:46,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3296646.6666666665, ans=0.2 2023-11-28 01:45:01,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3296713.3333333335, ans=0.0 2023-11-28 01:45:03,242 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0 2023-11-28 01:45:15,436 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.40 vs. limit=15.0 2023-11-28 01:45:15,935 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1550, loss[loss=0.0758, simple_loss=0.1018, pruned_loss=0.01566, audio_tagging_loss=0.009224, over 15353.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08965, pruned_loss=0.0124, audio_tagging_loss=0.008793, over 3049182.82 frames. 
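
The optim.py:476 lines report the distribution of recent gradient norms as five quantiles (min, 25%, median, 75%, max) together with the clipping threshold and the fraction of recent batches that were clipped. The numbers are internally consistent with threshold = Clipping_scale x median: in the entry just above, 2.0 x 9.437e+01 = 1.887e+02, exactly the logged threshold, and since the maximum recent norm (1.630e+02) sits below it, percent-clipped stays at 0.0. A sketch of that scheme follows; the history length, the class name, and the flat parameter handling are assumptions of the sketch.

from collections import deque
import torch

class MedianGradClipper:
    """Clip the global grad norm at clipping_scale x the median of recent norms (sketch)."""
    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)

    def __call__(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.detach().norm() for p in params])).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:            # rescale the gradients rather than zeroing them
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold
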
], batch size: 59, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:45:23,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3296846.6666666665, ans=0.0 2023-11-28 01:45:30,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3296913.3333333335, ans=0.125 2023-11-28 01:45:37,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3296913.3333333335, ans=0.125 2023-11-28 01:45:40,189 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494550 2023-11-28 01:45:42,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3296980.0, ans=0.125 2023-11-28 01:46:01,874 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.675e+01 9.125e+01 9.756e+01 1.252e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-28 01:46:13,992 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1600, loss[loss=0.05381, simple_loss=0.0752, pruned_loss=0.0093, audio_tagging_loss=0.006908, over 14851.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.08994, pruned_loss=0.01247, audio_tagging_loss=0.008828, over 3045804.35 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:46:21,942 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:46:37,090 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494600 2023-11-28 01:46:37,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2023-11-28 01:46:42,867 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0 2023-11-28 01:47:04,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3297446.6666666665, ans=0.2 2023-11-28 01:47:10,407 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1650, loss[loss=0.05449, simple_loss=0.07301, pruned_loss=0.007061, audio_tagging_loss=0.01092, over 15558.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08928, pruned_loss=0.0124, audio_tagging_loss=0.008956, over 3053836.23 frames. 
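
Across this log the four reported loss terms satisfy loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss: the "simple" (linear-combiner) transducer loss is halved while the pruned RNN-T loss and the audio-tagging distillation loss enter at full weight. The 0.5 weight is inferred from the arithmetic of the entries themselves; e.g. for batch 1600 just above:

def total_loss(simple_loss, pruned_loss, audio_tagging_loss, simple_scale=0.5):
    # Weighted sum matching the logged decomposition; simple_scale inferred from the log.
    return simple_scale * simple_loss + pruned_loss + audio_tagging_loss

# Batch 1600: loss=0.05381, simple_loss=0.0752, pruned_loss=0.0093, audio_tagging_loss=0.006908
print(total_loss(0.0752, 0.0093, 0.006908))  # -> 0.053808, matching loss=0.05381 to rounding

The same identity holds for the running tot_loss fields, so any drift between loss and tot_loss reflects averaging over the epoch, not a change of weights.
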
], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:47:16,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3297513.3333333335, ans=0.0 2023-11-28 01:47:24,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3297580.0, ans=0.2 2023-11-28 01:47:28,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3297580.0, ans=0.0 2023-11-28 01:47:31,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3297580.0, ans=6.0 2023-11-28 01:47:34,419 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494650 2023-11-28 01:47:52,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3297713.3333333335, ans=0.125 2023-11-28 01:47:57,554 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.801e+01 9.426e+01 1.009e+02 1.226e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 01:47:59,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.48 vs. limit=22.5 2023-11-28 01:48:01,667 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.53 vs. limit=10.0 2023-11-28 01:48:08,364 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1700, loss[loss=0.05236, simple_loss=0.06584, pruned_loss=0.008774, audio_tagging_loss=0.01066, over 14038.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08959, pruned_loss=0.01251, audio_tagging_loss=0.008912, over 3062089.16 frames. ], batch size: 53, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:48:19,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3297913.3333333335, ans=0.125 2023-11-28 01:48:31,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3297980.0, ans=0.125 2023-11-28 01:48:32,320 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494700 2023-11-28 01:48:40,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3297980.0, ans=0.05 2023-11-28 01:49:04,879 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1750, loss[loss=0.04426, simple_loss=0.05542, pruned_loss=0.008573, audio_tagging_loss=0.007975, over 14545.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08935, pruned_loss=0.01248, audio_tagging_loss=0.008855, over 3053429.47 frames. 
], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:49:11,707 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:49:13,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3298180.0, ans=0.0 2023-11-28 01:49:18,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3298246.6666666665, ans=0.125 2023-11-28 01:49:29,048 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494750 2023-11-28 01:49:29,488 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.02 vs. limit=6.0 2023-11-28 01:49:31,960 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.56 vs. limit=15.0 2023-11-28 01:49:35,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3298313.3333333335, ans=0.125 2023-11-28 01:49:46,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3298380.0, ans=0.1 2023-11-28 01:49:52,442 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.816e+01 9.529e+01 1.029e+02 1.383e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 01:49:55,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3298446.6666666665, ans=0.2 2023-11-28 01:49:57,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3298446.6666666665, ans=0.125 2023-11-28 01:50:01,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3298513.3333333335, ans=0.1 2023-11-28 01:50:02,959 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1800, loss[loss=0.05234, simple_loss=0.07259, pruned_loss=0.008419, audio_tagging_loss=0.007625, over 14962.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09045, pruned_loss=0.01254, audio_tagging_loss=0.008718, over 3057866.16 frames. ], batch size: 57, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:50:12,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3298580.0, ans=0.125 2023-11-28 01:50:14,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3298580.0, ans=0.0 2023-11-28 01:50:24,231 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.53 vs. limit=22.5 2023-11-28 01:50:26,963 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494800 2023-11-28 01:50:28,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3298646.6666666665, ans=0.125 2023-11-28 01:50:59,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3298846.6666666665, ans=0.0 2023-11-28 01:51:00,590 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1850, loss[loss=0.05284, simple_loss=0.0688, pruned_loss=0.007016, audio_tagging_loss=0.01142, over 14300.00 frames. 
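
Many of the scheduled values above belong to balancers: modules that watch per-channel activation statistics (fraction of positive values, mean absolute value) and, with the scheduled probability ("prob, ... ans=0.125"), nudge gradients so channels stay within bounds such as the min_positive=0.05, max_positive=0.95, and min_abs=0.5 seen in these entries. The parameter names come straight from the log; the gradient-correction mechanics below are a loose illustrative sketch, not the actual implementation.

import torch

class _BalanceGrad(torch.autograd.Function):
    """Identity in the forward pass; backward adds a small push toward the constraints."""
    @staticmethod
    def forward(ctx, x, min_positive, max_positive, min_abs, scale):
        ctx.save_for_backward(x)
        ctx.cfg = (min_positive, max_positive, min_abs, scale)
        return x

    @staticmethod
    def backward(ctx, grad):
        (x,) = ctx.saved_tensors
        min_positive, max_positive, min_abs, scale = ctx.cfg
        dims = tuple(range(x.dim() - 1))      # reduce over all but the channel dim
        frac_pos = (x > 0).float().mean(dim=dims, keepdim=True)
        mean_abs = x.abs().mean(dim=dims, keepdim=True)
        # +1 pushes a channel's values up (the optimizer subtracts grad), -1 pushes down.
        push = ((frac_pos < min_positive).float() - (frac_pos > max_positive).float()
                + (mean_abs < min_abs).float() * x.sign())
        extra = -scale * grad.abs().mean() * push
        return grad + extra, None, None, None, None

def balance(x, min_positive=0.05, max_positive=0.95, min_abs=0.5, prob=0.125, training=True):
    # Apply the correction stochastically, mirroring the scheduled "prob" in the log.
    if training and torch.rand(()) < prob:
        return _BalanceGrad.apply(x, min_positive, max_positive, min_abs, 0.04)
    return x
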
], tot_loss[loss=0.06641, simple_loss=0.09075, pruned_loss=0.01242, audio_tagging_loss=0.00862, over 3062618.95 frames. ], batch size: 55, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:51:06,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3298846.6666666665, ans=0.2 2023-11-28 01:51:15,024 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=12.0 2023-11-28 01:51:24,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3298980.0, ans=0.2 2023-11-28 01:51:25,211 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494850 2023-11-28 01:51:42,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3299046.6666666665, ans=0.125 2023-11-28 01:51:42,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3299046.6666666665, ans=0.125 2023-11-28 01:51:44,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3299046.6666666665, ans=0.0 2023-11-28 01:51:48,561 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.704e+01 9.342e+01 1.015e+02 1.516e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 01:51:57,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3299180.0, ans=0.125 2023-11-28 01:51:58,704 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1900, loss[loss=0.06407, simple_loss=0.08437, pruned_loss=0.01122, audio_tagging_loss=0.01066, over 15963.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.09127, pruned_loss=0.0126, audio_tagging_loss=0.008594, over 3058498.84 frames. ], batch size: 61, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:52:02,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3299180.0, ans=0.0 2023-11-28 01:52:08,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3299180.0, ans=0.2 2023-11-28 01:52:09,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3299246.6666666665, ans=0.2 2023-11-28 01:52:22,937 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494900 2023-11-28 01:52:30,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3299313.3333333335, ans=0.0 2023-11-28 01:52:31,947 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=15.0 2023-11-28 01:52:35,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3299380.0, ans=0.0 2023-11-28 01:52:36,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3299380.0, ans=0.1 2023-11-28 01:52:38,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.38 vs. 
limit=12.0 2023-11-28 01:52:56,224 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 1950, loss[loss=0.07074, simple_loss=0.1017, pruned_loss=0.01398, audio_tagging_loss=0.005889, over 15567.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09044, pruned_loss=0.01255, audio_tagging_loss=0.008588, over 3053640.24 frames. ], batch size: 56, lr: 1.62e-03, grad_scale: 16.0 2023-11-28 01:53:16,408 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 01:53:20,621 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 494950 2023-11-28 01:53:33,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3299713.3333333335, ans=0.1 2023-11-28 01:53:37,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3299713.3333333335, ans=0.125 2023-11-28 01:53:42,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3299780.0, ans=0.125 2023-11-28 01:53:43,771 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.727e+01 9.410e+01 1.013e+02 1.318e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 01:53:53,606 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2000, loss[loss=0.07252, simple_loss=0.09572, pruned_loss=0.01598, audio_tagging_loss=0.008676, over 15046.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.08997, pruned_loss=0.01243, audio_tagging_loss=0.008627, over 3054185.14 frames. ], batch size: 54, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:54:00,069 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.90 vs. limit=15.0 2023-11-28 01:54:05,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3299913.3333333335, ans=0.5 2023-11-28 01:54:05,713 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.90 vs. limit=15.0 2023-11-28 01:54:07,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3299913.3333333335, ans=0.125 2023-11-28 01:54:17,824 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495000 2023-11-28 01:54:22,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-11-28 01:54:35,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.52 vs. limit=10.0 2023-11-28 01:54:38,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3300113.3333333335, ans=0.0 2023-11-28 01:54:51,447 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2050, loss[loss=0.07634, simple_loss=0.09914, pruned_loss=0.01685, audio_tagging_loss=0.009917, over 16656.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.08978, pruned_loss=0.01239, audio_tagging_loss=0.008703, over 3050191.68 frames. 
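
The grad_scale field attached to each batch entry moves in powers of two across this stretch (16 -> 8 -> 4 -> 8 -> 16 -> 32 -> 16 -> 32 ...), the signature of dynamic loss scaling in fp16 mixed-precision training: the scale is halved whenever scaled gradients overflow to inf/nan and grown again after a run of clean steps. A minimal sketch using PyTorch's stock GradScaler; the constructor arguments are real PyTorch parameters, but the values chosen here are illustrative, since the run's own scaler settings are not shown in this log.

import torch

# Requires a CUDA device; sizes and lr are arbitrary for the sketch.
model = torch.nn.Linear(64, 32).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,     # matches the grad_scale this stretch starts from
    backoff_factor=0.5,  # 16 -> 8 -> 4 on overflowing batches
    growth_factor=2.0,   # 4 -> 8 -> 16 -> 32 after stable stretches
    growth_interval=2000,
)

x = torch.randn(8, 64, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(x).pow(2).mean()
scaler.scale(loss).backward()  # backward through the scaled loss
scaler.step(opt)               # unscales grads; skips the step on inf/nan
scaler.update()                # where 16 -> 8 (overflow) or 16 -> 32 (growth) happens
print(scaler.get_scale())
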
], batch size: 62, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:55:15,638 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495050 2023-11-28 01:55:38,625 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.597e+01 8.786e+01 9.334e+01 1.004e+02 1.293e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-28 01:55:39,479 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.82 vs. limit=8.0 2023-11-28 01:55:39,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3300446.6666666665, ans=0.125 2023-11-28 01:55:49,308 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2100, loss[loss=0.05467, simple_loss=0.07621, pruned_loss=0.008023, audio_tagging_loss=0.008547, over 15529.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08998, pruned_loss=0.01237, audio_tagging_loss=0.008689, over 3057210.03 frames. ], batch size: 59, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:55:52,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3300513.3333333335, ans=0.1 2023-11-28 01:55:54,120 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2023-11-28 01:56:08,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3300580.0, ans=0.125 2023-11-28 01:56:08,871 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2023-11-28 01:56:14,136 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495100 2023-11-28 01:56:14,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3300646.6666666665, ans=0.125 2023-11-28 01:56:44,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3300780.0, ans=0.0 2023-11-28 01:56:45,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3300780.0, ans=0.0 2023-11-28 01:56:46,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3300846.6666666665, ans=0.2 2023-11-28 01:56:47,189 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2150, loss[loss=0.06643, simple_loss=0.09449, pruned_loss=0.01056, audio_tagging_loss=0.008619, over 14824.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09069, pruned_loss=0.0125, audio_tagging_loss=0.008656, over 3054126.81 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:56:50,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3300846.6666666665, ans=0.125 2023-11-28 01:57:11,875 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495150 2023-11-28 01:57:20,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3301046.6666666665, ans=0.125 2023-11-28 01:57:22,131 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.20 vs. 
limit=15.0 2023-11-28 01:57:22,686 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 01:57:22,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3301046.6666666665, ans=0.1 2023-11-28 01:57:31,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3301046.6666666665, ans=0.0 2023-11-28 01:57:34,208 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 8.726e+01 9.403e+01 1.016e+02 1.279e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 01:57:38,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3301113.3333333335, ans=0.125 2023-11-28 01:57:41,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3301113.3333333335, ans=0.125 2023-11-28 01:57:45,119 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2200, loss[loss=0.07811, simple_loss=0.1097, pruned_loss=0.01678, audio_tagging_loss=0.006477, over 15817.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09164, pruned_loss=0.01266, audio_tagging_loss=0.008616, over 3055462.94 frames. ], batch size: 58, lr: 1.62e-03, grad_scale: 32.0 2023-11-28 01:57:47,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3301180.0, ans=0.0 2023-11-28 01:57:52,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3301180.0, ans=0.125 2023-11-28 01:57:54,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3301180.0, ans=0.125 2023-11-28 01:57:57,180 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-11-28 01:58:04,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3301246.6666666665, ans=0.95 2023-11-28 01:58:08,873 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495200 2023-11-28 01:58:43,004 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2250, loss[loss=0.07352, simple_loss=0.1044, pruned_loss=0.01199, audio_tagging_loss=0.009332, over 15285.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09101, pruned_loss=0.01262, audio_tagging_loss=0.008675, over 3053373.44 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 01:58:50,285 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.21 vs. 
limit=10.0 2023-11-28 01:58:56,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3301580.0, ans=0.125 2023-11-28 01:59:07,458 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495250 2023-11-28 01:59:08,983 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.83 vs. limit=15.0 2023-11-28 01:59:29,983 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 8.696e+01 9.309e+01 9.993e+01 1.259e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-28 01:59:39,876 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2300, loss[loss=0.07067, simple_loss=0.09793, pruned_loss=0.01264, audio_tagging_loss=0.009065, over 14501.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09011, pruned_loss=0.01257, audio_tagging_loss=0.008779, over 3044170.74 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 01:59:40,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3301846.6666666665, ans=0.125 2023-11-28 01:59:47,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3301846.6666666665, ans=0.125 2023-11-28 01:59:55,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3301913.3333333335, ans=0.0 2023-11-28 02:00:00,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3301913.3333333335, ans=0.1 2023-11-28 02:00:04,533 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495300 2023-11-28 02:00:25,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3302113.3333333335, ans=0.125 2023-11-28 02:00:26,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3302113.3333333335, ans=0.0 2023-11-28 02:00:29,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3302113.3333333335, ans=0.0 2023-11-28 02:00:32,679 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:00:37,650 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:00:38,607 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2350, loss[loss=0.06895, simple_loss=0.09484, pruned_loss=0.01221, audio_tagging_loss=0.009323, over 15777.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09063, pruned_loss=0.01264, audio_tagging_loss=0.008793, over 3046335.38 frames. 
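
The recurring WARNING from train_asr.py:1481 drops AudioSet cuts whose placeholder transcript is longer than the acoustic sequence: 100 feature frames shrink to 23 after the encoder's subsampling, and the pruned transducer loss used here requires at least as many encoder frames as output tokens, so 24 BPE tokens cannot be aligned to 23 frames and the cut is excluded. Since the text is an explicit placeholder ("Dummy text added as a place holder..."), dropping these one-second clips costs no real transcription supervision. A sketch of such a filter; the subsampled-length formula reproduces the logged 100 -> 23 but should be treated as an assumption of this sketch, as should the helper name.

def keep_cut_for_transducer(num_frames: int, num_tokens: int) -> bool:
    # Encoder frames after subsampling; ((T - 7) // 2 + 1) // 2 reproduces
    # the logged 100 -> 23, though the exact formula is assumed here.
    T = ((num_frames - 7) // 2 + 1) // 2
    # The pruned transducer loss needs at least one frame per output token.
    return T >= num_tokens

print(keep_cut_for_transducer(100, 24))  # -> False: excluded, as in the WARNING above
print(keep_cut_for_transducer(200, 24))  # -> True: 48 frames comfortably cover 24 tokens
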
], batch size: 61, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:00:40,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3302180.0, ans=0.125 2023-11-28 02:00:44,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3302180.0, ans=0.05 2023-11-28 02:00:48,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3302246.6666666665, ans=0.125 2023-11-28 02:00:50,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3302246.6666666665, ans=0.1 2023-11-28 02:01:02,397 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495350 2023-11-28 02:01:08,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3302313.3333333335, ans=22.5 2023-11-28 02:01:21,566 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.34 vs. limit=10.0 2023-11-28 02:01:24,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3302446.6666666665, ans=0.125 2023-11-28 02:01:25,425 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 8.943e+01 9.346e+01 1.021e+02 1.230e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 02:01:29,238 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2023-11-28 02:01:36,091 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2400, loss[loss=0.05945, simple_loss=0.07681, pruned_loss=0.008913, audio_tagging_loss=0.01213, over 15719.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08928, pruned_loss=0.01225, audio_tagging_loss=0.008976, over 3042019.95 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:01:59,833 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495400 2023-11-28 02:02:05,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3302646.6666666665, ans=6.0 2023-11-28 02:02:22,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3302780.0, ans=0.1 2023-11-28 02:02:24,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3302780.0, ans=0.2 2023-11-28 02:02:25,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3302780.0, ans=0.125 2023-11-28 02:02:32,969 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2450, loss[loss=0.05751, simple_loss=0.07259, pruned_loss=0.01163, audio_tagging_loss=0.009576, over 14684.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08928, pruned_loss=0.01228, audio_tagging_loss=0.009144, over 3044690.39 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:02:57,438 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495450 2023-11-28 02:02:58,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3302980.0, ans=0.2 2023-11-28 02:03:21,015 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.760e+01 9.295e+01 9.959e+01 1.249e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 02:03:31,376 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2500, loss[loss=0.05316, simple_loss=0.06354, pruned_loss=0.0073, audio_tagging_loss=0.01409, over 14592.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09001, pruned_loss=0.01241, audio_tagging_loss=0.009112, over 3044584.57 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:03:32,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3303180.0, ans=0.125 2023-11-28 02:03:46,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3303246.6666666665, ans=0.0 2023-11-28 02:03:55,004 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495500 2023-11-28 02:03:59,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3303313.3333333335, ans=0.0 2023-11-28 02:04:08,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3303380.0, ans=0.125 2023-11-28 02:04:14,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.05 vs. limit=10.0 2023-11-28 02:04:28,459 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2550, loss[loss=0.0697, simple_loss=0.1035, pruned_loss=0.01141, audio_tagging_loss=0.006534, over 16041.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09054, pruned_loss=0.01245, audio_tagging_loss=0.008973, over 3046574.57 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:04:35,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3303513.3333333335, ans=0.2 2023-11-28 02:04:40,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.20 vs. limit=22.5 2023-11-28 02:04:52,372 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495550 2023-11-28 02:05:17,230 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.254e+01 8.589e+01 9.196e+01 9.860e+01 1.420e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-28 02:05:26,159 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2600, loss[loss=0.06005, simple_loss=0.07902, pruned_loss=0.01153, audio_tagging_loss=0.009016, over 14888.00 frames. ], tot_loss[loss=0.06645, simple_loss=0.09066, pruned_loss=0.0123, audio_tagging_loss=0.008822, over 3055078.14 frames. 
], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:05:50,760 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495600 2023-11-28 02:05:54,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3303980.0, ans=0.125 2023-11-28 02:06:05,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3304046.6666666665, ans=0.0 2023-11-28 02:06:10,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3304046.6666666665, ans=0.125 2023-11-28 02:06:24,277 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2650, loss[loss=0.07578, simple_loss=0.1022, pruned_loss=0.01599, audio_tagging_loss=0.008681, over 14923.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09029, pruned_loss=0.01234, audio_tagging_loss=0.00878, over 3052607.80 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:06:27,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3304180.0, ans=0.125 2023-11-28 02:06:47,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3304313.3333333335, ans=0.0 2023-11-28 02:06:48,581 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495650 2023-11-28 02:06:50,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3304313.3333333335, ans=0.125 2023-11-28 02:06:50,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3304313.3333333335, ans=0.0 2023-11-28 02:06:58,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3304380.0, ans=0.125 2023-11-28 02:06:59,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3304380.0, ans=0.125 2023-11-28 02:07:13,631 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 8.568e+01 9.215e+01 1.005e+02 1.316e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-28 02:07:16,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3304446.6666666665, ans=0.0 2023-11-28 02:07:21,987 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2700, loss[loss=0.06439, simple_loss=0.0988, pruned_loss=0.008175, audio_tagging_loss=0.00681, over 15308.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08965, pruned_loss=0.01215, audio_tagging_loss=0.0088, over 3052760.05 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:07:31,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3304513.3333333335, ans=0.0 2023-11-28 02:07:45,788 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495700 2023-11-28 02:08:01,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3304713.3333333335, ans=0.125 2023-11-28 02:08:18,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3304846.6666666665, ans=0.0 2023-11-28 02:08:19,869 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2750, loss[loss=0.06523, simple_loss=0.08422, pruned_loss=0.01482, audio_tagging_loss=0.008294, over 14737.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09005, pruned_loss=0.01252, audio_tagging_loss=0.008746, over 3047467.02 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:08:25,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3304846.6666666665, ans=0.0 2023-11-28 02:08:26,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3304846.6666666665, ans=0.125 2023-11-28 02:08:26,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3304846.6666666665, ans=0.0 2023-11-28 02:08:36,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3304913.3333333335, ans=0.1 2023-11-28 02:08:39,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3304913.3333333335, ans=0.0 2023-11-28 02:08:43,921 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495750 2023-11-28 02:08:53,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=3304980.0, ans=15.0 2023-11-28 02:09:05,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3305113.3333333335, ans=0.125 2023-11-28 02:09:09,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.832e+01 9.441e+01 1.011e+02 1.289e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 02:09:10,567 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:09:17,085 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2800, loss[loss=0.07061, simple_loss=0.09307, pruned_loss=0.01179, audio_tagging_loss=0.01229, over 15560.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09023, pruned_loss=0.01229, audio_tagging_loss=0.00872, over 3047507.07 frames. 
], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:09:20,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3305180.0, ans=0.125 2023-11-28 02:09:25,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3305180.0, ans=0.0 2023-11-28 02:09:32,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3305246.6666666665, ans=0.0 2023-11-28 02:09:42,229 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495800 2023-11-28 02:09:46,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3305313.3333333335, ans=0.125 2023-11-28 02:09:48,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3305313.3333333335, ans=0.125 2023-11-28 02:09:49,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=3305313.3333333335, ans=0.1 2023-11-28 02:09:54,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3305380.0, ans=0.125 2023-11-28 02:10:05,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3305446.6666666665, ans=0.0 2023-11-28 02:10:11,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3305446.6666666665, ans=0.125 2023-11-28 02:10:15,078 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2850, loss[loss=0.0613, simple_loss=0.07886, pruned_loss=0.01121, audio_tagging_loss=0.01067, over 15574.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.0898, pruned_loss=0.01228, audio_tagging_loss=0.0087, over 3046178.74 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:10:20,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3305513.3333333335, ans=0.125 2023-11-28 02:10:31,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3305580.0, ans=0.125 2023-11-28 02:10:39,396 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495850 2023-11-28 02:10:41,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3305646.6666666665, ans=0.125 2023-11-28 02:11:00,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3305780.0, ans=0.125 2023-11-28 02:11:02,610 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2023-11-28 02:11:04,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 8.806e+01 9.389e+01 1.005e+02 1.417e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 02:11:12,629 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2900, loss[loss=0.06033, simple_loss=0.08184, pruned_loss=0.009292, audio_tagging_loss=0.01012, over 14946.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08964, pruned_loss=0.01238, audio_tagging_loss=0.008638, over 3046315.73 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:11:20,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3305846.6666666665, ans=0.2 2023-11-28 02:11:35,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3305980.0, ans=0.05 2023-11-28 02:11:36,741 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495900 2023-11-28 02:11:40,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3305980.0, ans=0.125 2023-11-28 02:11:49,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3306046.6666666665, ans=0.125 2023-11-28 02:11:56,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3306046.6666666665, ans=0.0 2023-11-28 02:12:06,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3306113.3333333335, ans=0.125 2023-11-28 02:12:09,959 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 2950, loss[loss=0.05143, simple_loss=0.0601, pruned_loss=0.01144, audio_tagging_loss=0.009945, over 13652.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08997, pruned_loss=0.01249, audio_tagging_loss=0.008659, over 3050870.58 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:12:16,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3306180.0, ans=0.125 2023-11-28 02:12:17,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3306180.0, ans=0.125 2023-11-28 02:12:17,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3306180.0, ans=0.1 2023-11-28 02:12:17,989 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2023-11-28 02:12:34,936 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 495950 2023-11-28 02:13:01,044 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.879e+01 8.902e+01 9.555e+01 1.025e+02 1.277e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 02:13:07,696 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3000, loss[loss=0.07221, simple_loss=0.1015, pruned_loss=0.01434, audio_tagging_loss=0.007142, over 15187.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09035, pruned_loss=0.01239, audio_tagging_loss=0.008694, over 3050377.08 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:13:07,696 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 02:13:42,055 INFO [train_asr.py:1267] (2/4) Epoch 42, validation: loss=0.05767, simple_loss=0.05061, pruned_loss=0.005183, audio_tagging_loss=0.02719, over 4681554.00 frames. 
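
The next record reports the allocator's high-water mark after the validation pass. A figure like 26096MB is what PyTorch's peak-memory counter returns; a sketch of producing such a line (the actual call site and formatting in train_asr.py are assumptions):

```python
import torch

# Peak bytes ever allocated on the current CUDA device (since process start
# or the last torch.cuda.reset_peak_memory_stats()), reported in MB.
peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
print(f"Maximum memory allocated so far is {peak_mb}MB")
```
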
2023-11-28 02:13:42,056 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 02:13:50,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3306513.3333333335, ans=0.0 2023-11-28 02:13:59,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3306580.0, ans=0.0 2023-11-28 02:14:01,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3306580.0, ans=0.05 2023-11-28 02:14:05,663 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496000 2023-11-28 02:14:12,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3306646.6666666665, ans=0.0 2023-11-28 02:14:17,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3306713.3333333335, ans=0.125 2023-11-28 02:14:24,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=15.0 2023-11-28 02:14:42,112 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3050, loss[loss=0.07777, simple_loss=0.1027, pruned_loss=0.01549, audio_tagging_loss=0.01094, over 15564.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09056, pruned_loss=0.01239, audio_tagging_loss=0.008802, over 3044575.08 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:14:56,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3306913.3333333335, ans=0.0 2023-11-28 02:15:05,668 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496050 2023-11-28 02:15:14,259 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=12.0 2023-11-28 02:15:15,443 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.59 vs. limit=10.0 2023-11-28 02:15:16,126 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:15:20,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3307046.6666666665, ans=0.0 2023-11-28 02:15:32,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.076e+01 8.885e+01 9.431e+01 1.019e+02 1.276e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 02:15:37,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3307113.3333333335, ans=0.125 2023-11-28 02:15:39,159 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3100, loss[loss=0.06406, simple_loss=0.08081, pruned_loss=0.01311, audio_tagging_loss=0.01055, over 15340.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08975, pruned_loss=0.01233, audio_tagging_loss=0.00886, over 3043128.72 frames. 
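
The "Exclude cut" WARNING above drops a one-second AudioSet clip: its 100 feature frames shrink to 23 encoder frames after subsampling, fewer than the 24 BPE tokens of the dummy transcript, and pruned-transducer training needs at least as many encoder frames as output tokens. A sketch of that sanity filter (the predicate name and exact inequality are assumptions based on what the warning prints):

```python
# Keep a cut only if the encoder output is long enough to align with its
# token sequence; a hypothetical stand-in for the filter in the train script.
def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
    return frames_after_subsampling >= num_tokens

print(keep_cut(23, 24))  # False -> excluded, as in the WARNING above
```
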
], batch size: 57, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:15:59,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3307246.6666666665, ans=0.125 2023-11-28 02:16:02,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3307313.3333333335, ans=0.125 2023-11-28 02:16:03,444 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496100 2023-11-28 02:16:24,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3307446.6666666665, ans=0.125 2023-11-28 02:16:35,019 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2023-11-28 02:16:36,622 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3150, loss[loss=0.07646, simple_loss=0.1116, pruned_loss=0.01225, audio_tagging_loss=0.008433, over 15607.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09066, pruned_loss=0.01245, audio_tagging_loss=0.00891, over 3043193.46 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 02:16:51,208 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.04 vs. limit=15.0 2023-11-28 02:17:01,122 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496150 2023-11-28 02:17:22,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3307780.0, ans=0.0 2023-11-28 02:17:27,307 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 8.803e+01 9.436e+01 1.005e+02 1.293e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 02:17:31,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3307780.0, ans=0.125 2023-11-28 02:17:33,935 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3200, loss[loss=0.06448, simple_loss=0.09153, pruned_loss=0.01226, audio_tagging_loss=0.006457, over 14492.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09075, pruned_loss=0.01246, audio_tagging_loss=0.008962, over 3042351.49 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:17:36,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.85 vs. 
limit=15.0 2023-11-28 02:17:48,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3307913.3333333335, ans=0.0 2023-11-28 02:17:48,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3307913.3333333335, ans=6.0 2023-11-28 02:17:55,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3307913.3333333335, ans=0.2 2023-11-28 02:17:58,646 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496200 2023-11-28 02:18:03,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3307980.0, ans=0.1 2023-11-28 02:18:06,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3307980.0, ans=0.1 2023-11-28 02:18:09,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3308046.6666666665, ans=0.125 2023-11-28 02:18:10,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3308046.6666666665, ans=0.0 2023-11-28 02:18:16,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3308046.6666666665, ans=0.125 2023-11-28 02:18:32,209 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3250, loss[loss=0.07016, simple_loss=0.08989, pruned_loss=0.0143, audio_tagging_loss=0.01092, over 15063.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.08988, pruned_loss=0.01226, audio_tagging_loss=0.009051, over 3051264.02 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:18:34,973 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.02 vs. limit=15.0 2023-11-28 02:18:52,020 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.15 vs. limit=15.0 2023-11-28 02:18:55,886 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2023-11-28 02:18:56,679 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496250 2023-11-28 02:19:10,860 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:19:23,286 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.753e+01 9.382e+01 9.909e+01 1.200e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 02:19:23,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3308446.6666666665, ans=0.125 2023-11-28 02:19:29,810 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3300, loss[loss=0.07354, simple_loss=0.1031, pruned_loss=0.01356, audio_tagging_loss=0.008441, over 15086.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.08997, pruned_loss=0.01225, audio_tagging_loss=0.009046, over 3044907.06 frames. 
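
The scaling.py:213 records each report the current value (`ans`) of a named hyper-parameter (balancer probabilities, skip rates, dropout rates, and so on) that is scheduled as a piecewise-linear function of the global batch count; at batch_count around 3.3M, every name here has settled on its final value. A toy version of such a schedule with made-up breakpoints (the real breakpoints live in the model code, one schedule per name):

```python
import bisect

# Toy ScheduledFloat: piecewise-linear interpolation of (batch_count, value)
# breakpoints. The breakpoints below are illustrative, not this run's.
def scheduled_float(batch_count,
                    points=((0.0, 0.3), (4000.0, 0.2), (20000.0, 0.125))):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    if batch_count <= xs[0]:
        return ys[0]
    if batch_count >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, batch_count) - 1
    t = (batch_count - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

print(scheduled_float(3.3e6))  # 0.125 -- far past the last breakpoint
```
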
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:19:32,468 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0 2023-11-28 02:19:32,475 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-28 02:19:35,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3308513.3333333335, ans=0.2 2023-11-28 02:19:54,026 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.68 vs. limit=10.0 2023-11-28 02:19:54,729 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496300 2023-11-28 02:20:01,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3308646.6666666665, ans=0.07 2023-11-28 02:20:05,235 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2023-11-28 02:20:14,778 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-28 02:20:21,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3308780.0, ans=0.0 2023-11-28 02:20:28,042 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3350, loss[loss=0.09893, simple_loss=0.1315, pruned_loss=0.02325, audio_tagging_loss=0.009933, over 16111.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09104, pruned_loss=0.01249, audio_tagging_loss=0.008931, over 3037775.80 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:20:34,676 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:20:35,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3308846.6666666665, ans=0.2 2023-11-28 02:20:38,317 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=12.0 2023-11-28 02:20:40,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2023-11-28 02:20:52,525 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496350 2023-11-28 02:21:05,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3309046.6666666665, ans=0.0 2023-11-28 02:21:19,213 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.593e+01 8.880e+01 9.434e+01 1.005e+02 1.295e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 02:21:24,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.80 vs. 
limit=15.0 2023-11-28 02:21:24,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3309180.0, ans=0.0 2023-11-28 02:21:25,795 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3400, loss[loss=0.054, simple_loss=0.07603, pruned_loss=0.009613, audio_tagging_loss=0.006374, over 15173.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09058, pruned_loss=0.01236, audio_tagging_loss=0.008859, over 3042084.37 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:21:33,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3309180.0, ans=0.1 2023-11-28 02:21:45,471 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=12.0 2023-11-28 02:21:49,489 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496400 2023-11-28 02:21:57,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3309313.3333333335, ans=0.0 2023-11-28 02:21:59,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3309380.0, ans=0.0 2023-11-28 02:22:06,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3309380.0, ans=0.2 2023-11-28 02:22:11,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3309446.6666666665, ans=0.1 2023-11-28 02:22:20,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3309446.6666666665, ans=0.0 2023-11-28 02:22:23,523 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3450, loss[loss=0.07166, simple_loss=0.1035, pruned_loss=0.01223, audio_tagging_loss=0.007693, over 15982.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09091, pruned_loss=0.01238, audio_tagging_loss=0.008777, over 3045151.30 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:22:29,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3309513.3333333335, ans=0.0 2023-11-28 02:22:30,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3309513.3333333335, ans=0.125 2023-11-28 02:22:35,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3309580.0, ans=0.1 2023-11-28 02:22:48,150 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496450 2023-11-28 02:23:13,826 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.471e+01 8.800e+01 9.317e+01 1.017e+02 1.307e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-28 02:23:20,396 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3500, loss[loss=0.06429, simple_loss=0.08784, pruned_loss=0.01192, audio_tagging_loss=0.008451, over 15326.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09154, pruned_loss=0.01254, audio_tagging_loss=0.008688, over 3050165.52 frames. 
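
Each optim.py:476 record prints a five-number summary of recent gradient norms (apparently min/25%/median/75%/max) plus a clipping threshold, and throughout this section the threshold is exactly Clipping_scale = 2.0 times the printed median, e.g. 2.0 × 9.317e+01 = 1.863e+02 just above; percent-clipped=0.0 says no step exceeded it. A sketch of that bookkeeping (how the norm history is collected and decayed inside the optimizer is an assumption):

```python
import torch

# Quartiles of a recent grad-norm history plus a clipping threshold set at
# clipping_scale * median, matching the numbers in the optim.py records.
def grad_norm_stats(history: torch.Tensor, clipping_scale: float = 2.0):
    q = torch.quantile(history, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    return q, clipping_scale * q[2]

q, thr = grad_norm_stats(torch.tensor([74.71, 88.00, 93.17, 101.7, 130.7]))
print(thr)  # tensor(186.3400), i.e. the logged threshold=1.863e+02
```
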
], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:23:20,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3309846.6666666665, ans=0.125 2023-11-28 02:23:28,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3309846.6666666665, ans=0.125 2023-11-28 02:23:38,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3309913.3333333335, ans=0.0 2023-11-28 02:23:44,974 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496500 2023-11-28 02:23:52,029 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:24:09,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3310113.3333333335, ans=0.0 2023-11-28 02:24:15,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3310113.3333333335, ans=0.1 2023-11-28 02:24:18,476 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3550, loss[loss=0.04298, simple_loss=0.05642, pruned_loss=0.004777, audio_tagging_loss=0.009997, over 14460.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09015, pruned_loss=0.01238, audio_tagging_loss=0.008725, over 3045264.21 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:24:42,238 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496550 2023-11-28 02:24:47,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3310313.3333333335, ans=0.1 2023-11-28 02:25:08,573 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.495e+01 8.721e+01 9.297e+01 1.018e+02 1.196e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 02:25:09,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3310446.6666666665, ans=0.125 2023-11-28 02:25:15,739 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3600, loss[loss=0.0662, simple_loss=0.09742, pruned_loss=0.01012, audio_tagging_loss=0.007373, over 15442.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08929, pruned_loss=0.01208, audio_tagging_loss=0.008644, over 3043271.81 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:25:29,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3310580.0, ans=0.05 2023-11-28 02:25:36,694 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:25:39,215 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496600 2023-11-28 02:25:42,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=3310646.6666666665, ans=0.5 2023-11-28 02:25:45,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3310646.6666666665, ans=0.125 2023-11-28 02:25:52,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3310713.3333333335, ans=0.0 2023-11-28 02:26:09,571 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.80 vs. limit=22.5 2023-11-28 02:26:12,306 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3650, loss[loss=0.0618, simple_loss=0.08667, pruned_loss=0.00935, audio_tagging_loss=0.009114, over 16187.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08928, pruned_loss=0.01204, audio_tagging_loss=0.008597, over 3043160.11 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:26:15,016 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-28 02:26:25,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3310913.3333333335, ans=0.1 2023-11-28 02:26:36,644 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496650 2023-11-28 02:26:37,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3310980.0, ans=0.125 2023-11-28 02:26:40,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3310980.0, ans=0.0 2023-11-28 02:26:41,160 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=15.0 2023-11-28 02:26:49,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3311046.6666666665, ans=0.1 2023-11-28 02:26:59,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3311113.3333333335, ans=0.125 2023-11-28 02:27:03,702 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.057e+01 8.594e+01 9.332e+01 1.003e+02 1.328e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 02:27:04,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2023-11-28 02:27:09,743 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3700, loss[loss=0.07135, simple_loss=0.1003, pruned_loss=0.01425, audio_tagging_loss=0.006954, over 14740.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09032, pruned_loss=0.01228, audio_tagging_loss=0.008574, over 3046398.95 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:27:13,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3311180.0, ans=0.05 2023-11-28 02:27:26,950 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=12.0 2023-11-28 02:27:29,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3311246.6666666665, ans=0.125 2023-11-28 02:27:33,991 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496700 2023-11-28 02:27:37,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3311313.3333333335, ans=0.0 2023-11-28 02:27:38,962 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.79 vs. limit=22.5 2023-11-28 02:27:51,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3311380.0, ans=0.1 2023-11-28 02:27:56,314 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:28:01,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3311446.6666666665, ans=0.1 2023-11-28 02:28:07,637 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3750, loss[loss=0.06019, simple_loss=0.08407, pruned_loss=0.009949, audio_tagging_loss=0.008204, over 15437.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09046, pruned_loss=0.01237, audio_tagging_loss=0.008576, over 3053537.37 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:28:21,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3311580.0, ans=0.2 2023-11-28 02:28:30,767 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496750 2023-11-28 02:28:47,234 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:28:55,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3311780.0, ans=0.125 2023-11-28 02:28:58,667 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 8.781e+01 9.240e+01 9.958e+01 1.596e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 02:28:58,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3311780.0, ans=0.125 2023-11-28 02:29:04,353 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3800, loss[loss=0.07652, simple_loss=0.1101, pruned_loss=0.01726, audio_tagging_loss=0.004227, over 15595.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08993, pruned_loss=0.0122, audio_tagging_loss=0.00858, over 3048550.47 frames. 
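
The Whitening records compare a covariance statistic of a module's activations against a limit (metric=7.23 vs. limit=12.0 above); the constraint only bites when the metric exceeds its limit, so lines like these, where the metric is still below it, read as periodic diagnostics. For num_groups=1 the metric is, in essence, how uneven the eigenvalue spectrum of the feature covariance is; a schematic version (Zipformer's exact grouped computation may differ):

```python
import torch

# Schematic whitening metric: mean of squared covariance eigenvalues over
# the squared mean eigenvalue. Equals 1.0 for perfectly "white" features
# and grows as the spectrum becomes uneven.
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2

x = torch.randn(1000, 384)   # near-white input
print(whitening_metric(x))   # modestly above 1.0, well under limit=12.0
```
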
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:29:23,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3311913.3333333335, ans=0.2 2023-11-28 02:29:25,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3311913.3333333335, ans=0.09899494936611666 2023-11-28 02:29:27,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3311980.0, ans=0.2 2023-11-28 02:29:28,749 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496800 2023-11-28 02:30:01,646 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3850, loss[loss=0.08195, simple_loss=0.1091, pruned_loss=0.01924, audio_tagging_loss=0.008169, over 15503.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08977, pruned_loss=0.01222, audio_tagging_loss=0.008674, over 3046984.90 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:30:06,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3312180.0, ans=0.125 2023-11-28 02:30:25,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3312313.3333333335, ans=0.2 2023-11-28 02:30:26,007 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496850 2023-11-28 02:30:40,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.38 vs. limit=15.0 2023-11-28 02:30:53,143 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.776e+01 8.894e+01 9.500e+01 1.019e+02 1.780e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 02:30:53,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312446.6666666665, ans=0.1 2023-11-28 02:30:57,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3312446.6666666665, ans=0.0 2023-11-28 02:30:58,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.45 vs. limit=15.0 2023-11-28 02:30:59,259 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3900, loss[loss=0.0692, simple_loss=0.08553, pruned_loss=0.01382, audio_tagging_loss=0.01262, over 15797.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.0897, pruned_loss=0.01221, audio_tagging_loss=0.008738, over 3045088.29 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:31:13,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3312580.0, ans=0.0 2023-11-28 02:31:22,877 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496900 2023-11-28 02:31:27,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3312646.6666666665, ans=0.125 2023-11-28 02:31:28,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3312646.6666666665, ans=0.04949747468305833 2023-11-28 02:31:56,452 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 3950, loss[loss=0.06145, simple_loss=0.08041, pruned_loss=0.009808, audio_tagging_loss=0.01143, over 15380.00 frames. 
], tot_loss[loss=0.06605, simple_loss=0.08998, pruned_loss=0.01224, audio_tagging_loss=0.008818, over 3042028.89 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:31:57,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312846.6666666665, ans=0.1 2023-11-28 02:31:59,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3312846.6666666665, ans=0.125 2023-11-28 02:32:00,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3312846.6666666665, ans=0.1 2023-11-28 02:32:07,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3312913.3333333335, ans=0.125 2023-11-28 02:32:19,799 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 496950 2023-11-28 02:32:28,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3312980.0, ans=0.125 2023-11-28 02:32:35,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3313046.6666666665, ans=0.2 2023-11-28 02:32:46,855 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 8.841e+01 9.484e+01 1.039e+02 1.407e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 02:32:52,895 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4000, loss[loss=0.0908, simple_loss=0.1316, pruned_loss=0.01661, audio_tagging_loss=0.008374, over 15666.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09093, pruned_loss=0.01235, audio_tagging_loss=0.008836, over 3040922.06 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:32:59,692 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:33:04,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3313246.6666666665, ans=0.2 2023-11-28 02:33:17,207 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497000 2023-11-28 02:33:49,966 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4050, loss[loss=0.06848, simple_loss=0.09127, pruned_loss=0.01345, audio_tagging_loss=0.009398, over 15166.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09097, pruned_loss=0.01258, audio_tagging_loss=0.008929, over 3032864.46 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:33:50,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3313513.3333333335, ans=0.0 2023-11-28 02:33:53,185 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 02:33:59,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3313513.3333333335, ans=0.0 2023-11-28 02:34:10,485 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.61 vs. limit=22.5 2023-11-28 02:34:13,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3313646.6666666665, ans=0.0 2023-11-28 02:34:14,325 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497050 2023-11-28 02:34:20,733 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.55 vs. limit=15.0 2023-11-28 02:34:42,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.897e+01 8.798e+01 9.309e+01 1.006e+02 1.878e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-28 02:34:47,162 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4100, loss[loss=0.09562, simple_loss=0.1295, pruned_loss=0.02494, audio_tagging_loss=0.005935, over 15708.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09087, pruned_loss=0.01255, audio_tagging_loss=0.008938, over 3036056.40 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:34:49,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3313846.6666666665, ans=0.125 2023-11-28 02:34:58,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3313913.3333333335, ans=0.125 2023-11-28 02:35:10,924 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497100 2023-11-28 02:35:15,528 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:35:30,285 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.63 vs. limit=22.5 2023-11-28 02:35:34,586 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.08 vs. limit=6.0 2023-11-28 02:35:36,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3314113.3333333335, ans=0.125 2023-11-28 02:35:43,995 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4150, loss[loss=0.06068, simple_loss=0.0866, pruned_loss=0.01096, audio_tagging_loss=0.006425, over 14686.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09136, pruned_loss=0.01269, audio_tagging_loss=0.008729, over 3030707.82 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:35:51,105 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.06 vs. 
limit=22.5 2023-11-28 02:36:08,928 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497150 2023-11-28 02:36:17,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3314313.3333333335, ans=0.0 2023-11-28 02:36:19,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3314380.0, ans=0.2 2023-11-28 02:36:23,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3314380.0, ans=0.125 2023-11-28 02:36:24,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3314380.0, ans=0.125 2023-11-28 02:36:27,021 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:36:32,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3314446.6666666665, ans=0.125 2023-11-28 02:36:32,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3314446.6666666665, ans=0.0 2023-11-28 02:36:37,304 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.681e+01 8.734e+01 9.392e+01 9.837e+01 1.224e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 02:36:41,700 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4200, loss[loss=0.06868, simple_loss=0.1007, pruned_loss=0.01075, audio_tagging_loss=0.007601, over 14925.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09107, pruned_loss=0.01252, audio_tagging_loss=0.0086, over 3041218.26 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:37:06,205 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497200 2023-11-28 02:37:06,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3314646.6666666665, ans=0.125 2023-11-28 02:37:13,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3314646.6666666665, ans=0.1 2023-11-28 02:37:19,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3314713.3333333335, ans=0.1 2023-11-28 02:37:26,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.99 vs. limit=15.0 2023-11-28 02:37:32,777 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2023-11-28 02:37:40,156 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4250, loss[loss=0.05211, simple_loss=0.06603, pruned_loss=0.009033, audio_tagging_loss=0.01006, over 15314.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09061, pruned_loss=0.01258, audio_tagging_loss=0.008539, over 3040294.39 frames. 
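
The running `tot_loss[...]` figures are frame-weighted aggregates over many recent batches rather than single-batch values, and the fractional frame totals (e.g. "over 3040294.39 frames" above) suggest an exponentially decayed sum rather than a plain cumulative one. A sketch of bookkeeping with that shape (the decay constant is a guess, not icefall's):

```python
# Frame-weighted, exponentially decayed loss aggregate; value/frames
# reproduces the style of the tot_loss[... over N frames.] records.
class DecayedLoss:
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.weighted_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.weighted_sum = (self.decay * self.weighted_sum
                             + batch_loss * batch_frames)
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.weighted_sum / max(self.frames, 1.0)
```
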
], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:37:42,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3314846.6666666665, ans=0.0 2023-11-28 02:37:42,889 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5 2023-11-28 02:37:44,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3314846.6666666665, ans=0.0 2023-11-28 02:37:49,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3314846.6666666665, ans=0.1 2023-11-28 02:37:56,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3314913.3333333335, ans=0.04949747468305833 2023-11-28 02:38:04,123 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497250 2023-11-28 02:38:05,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3314980.0, ans=0.125 2023-11-28 02:38:17,501 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0 2023-11-28 02:38:26,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3315113.3333333335, ans=0.035 2023-11-28 02:38:26,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3315113.3333333335, ans=0.0 2023-11-28 02:38:26,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3315113.3333333335, ans=0.0 2023-11-28 02:38:28,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3315113.3333333335, ans=0.0 2023-11-28 02:38:28,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3315113.3333333335, ans=0.125 2023-11-28 02:38:32,540 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.722e+01 9.477e+01 1.017e+02 1.335e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 02:38:36,918 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4300, loss[loss=0.06878, simple_loss=0.1018, pruned_loss=0.01142, audio_tagging_loss=0.006433, over 15426.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09097, pruned_loss=0.01251, audio_tagging_loss=0.008427, over 3039280.21 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:38:53,819 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.01 vs. limit=15.0 2023-11-28 02:39:01,080 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497300 2023-11-28 02:39:02,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3315313.3333333335, ans=0.125 2023-11-28 02:39:08,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.70 vs. 
limit=22.5 2023-11-28 02:39:10,272 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.35 vs. limit=10.0 2023-11-28 02:39:27,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3315446.6666666665, ans=0.0 2023-11-28 02:39:30,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3315446.6666666665, ans=0.2 2023-11-28 02:39:33,973 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4350, loss[loss=0.06807, simple_loss=0.08869, pruned_loss=0.0127, audio_tagging_loss=0.01103, over 15326.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09033, pruned_loss=0.0125, audio_tagging_loss=0.008477, over 3031884.58 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:39:58,395 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497350 2023-11-28 02:40:21,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3315780.0, ans=0.0 2023-11-28 02:40:26,198 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 8.957e+01 9.552e+01 1.043e+02 1.269e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 02:40:31,071 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4400, loss[loss=0.06275, simple_loss=0.07913, pruned_loss=0.01293, audio_tagging_loss=0.01025, over 14951.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08932, pruned_loss=0.01235, audio_tagging_loss=0.008644, over 3036581.41 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:40:55,838 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497400 2023-11-28 02:41:10,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2023-11-28 02:41:12,303 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.38 vs. limit=22.5 2023-11-28 02:41:23,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3316113.3333333335, ans=0.125 2023-11-28 02:41:29,272 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4450, loss[loss=0.08042, simple_loss=0.1131, pruned_loss=0.01731, audio_tagging_loss=0.006546, over 15362.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09008, pruned_loss=0.01263, audio_tagging_loss=0.008653, over 3045380.89 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:41:32,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3316180.0, ans=10.0 2023-11-28 02:41:41,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3316246.6666666665, ans=0.125 2023-11-28 02:41:47,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3316246.6666666665, ans=0.125 2023-11-28 02:41:50,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3316246.6666666665, ans=0.125 2023-11-28 02:41:53,486 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497450 2023-11-28 02:42:22,843 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 9.078e+01 9.731e+01 1.036e+02 1.394e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-28 02:42:27,230 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4500, loss[loss=0.06202, simple_loss=0.09192, pruned_loss=0.006802, audio_tagging_loss=0.009261, over 16911.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09104, pruned_loss=0.01275, audio_tagging_loss=0.008553, over 3054613.32 frames. ], batch size: 65, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:42:49,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3316646.6666666665, ans=0.125 2023-11-28 02:42:50,804 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497500 2023-11-28 02:43:01,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3316713.3333333335, ans=0.2 2023-11-28 02:43:01,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3316713.3333333335, ans=0.125 2023-11-28 02:43:12,297 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=22.5 2023-11-28 02:43:15,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3316780.0, ans=0.2 2023-11-28 02:43:24,680 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4550, loss[loss=0.07787, simple_loss=0.113, pruned_loss=0.01639, audio_tagging_loss=0.004993, over 16051.00 frames. ], tot_loss[loss=0.06734, simple_loss=0.09163, pruned_loss=0.01294, audio_tagging_loss=0.008587, over 3057709.52 frames. 
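The Whitening lines (e.g. metric=11.80 vs. limit=22.5 above) compare a per-module statistic against a limit. Assuming the metric measures how far the activation covariance is from isotropic, with 1.0 meaning perfectly white, one proxy that behaves like the logged numbers is sketched below; the exact formula in scaling.py may differ.

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels); returns ~1.0 for white activations
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]            # channel covariance, (C, C)
    c = cov.shape[0]
    mean_eig = cov.diagonal().mean()          # trace(cov) / C
    mean_sq_eig = (cov * cov).sum() / c       # trace(cov @ cov) / C
    return (mean_sq_eig / mean_eig ** 2).item()

x = torch.randn(1000, 512)                    # near-white activations
print(whitening_metric(x))                    # ~1.5, well under limit=22.5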
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:43:33,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3316846.6666666665, ans=0.125 2023-11-28 02:43:36,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3316913.3333333335, ans=0.05 2023-11-28 02:43:49,306 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497550 2023-11-28 02:44:02,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3317046.6666666665, ans=0.1 2023-11-28 02:44:03,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3317046.6666666665, ans=0.0 2023-11-28 02:44:09,508 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 02:44:10,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3317113.3333333335, ans=0.0 2023-11-28 02:44:18,113 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.674e+01 9.170e+01 9.991e+01 1.281e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-28 02:44:19,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3317113.3333333335, ans=0.125 2023-11-28 02:44:21,529 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4600, loss[loss=0.06008, simple_loss=0.08435, pruned_loss=0.009038, audio_tagging_loss=0.008867, over 15394.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09028, pruned_loss=0.01269, audio_tagging_loss=0.008772, over 3056626.39 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:44:43,238 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 02:44:44,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3317313.3333333335, ans=0.0 2023-11-28 02:44:46,309 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497600 2023-11-28 02:44:51,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3317313.3333333335, ans=0.0 2023-11-28 02:44:56,365 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.07 vs. limit=22.5 2023-11-28 02:45:01,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3317380.0, ans=0.1 2023-11-28 02:45:20,472 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4650, loss[loss=0.05677, simple_loss=0.07673, pruned_loss=0.01002, audio_tagging_loss=0.008387, over 15469.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09057, pruned_loss=0.01258, audio_tagging_loss=0.008753, over 3057683.66 frames. 
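The WARNING above drops an AudioSet placeholder cut because, after subsampling, it has fewer frames (23) than BPE tokens (24), so no transducer alignment exists. A sketch of that filter follows; the subsampling arithmetic below is an assumption that reproduces the logged 100 -> 23 mapping.

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # assumed frontend arithmetic: 100 input frames -> 23 subsampled frames
    frames_after = ((num_frames - 7) // 2 + 1) // 2
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False -> cut excluded, matching the WARNING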
], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:45:31,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3317580.0, ans=0.0 2023-11-28 02:45:44,336 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497650 2023-11-28 02:46:06,330 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.09 vs. limit=8.0 2023-11-28 02:46:14,349 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.731e+01 8.773e+01 9.249e+01 1.003e+02 1.204e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-28 02:46:17,625 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4700, loss[loss=0.05322, simple_loss=0.06911, pruned_loss=0.009322, audio_tagging_loss=0.009343, over 15796.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09074, pruned_loss=0.01266, audio_tagging_loss=0.008933, over 3057880.84 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:46:23,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3317846.6666666665, ans=0.125 2023-11-28 02:46:26,296 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=15.0 2023-11-28 02:46:38,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3317913.3333333335, ans=0.04949747468305833 2023-11-28 02:46:42,346 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497700 2023-11-28 02:46:50,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3317980.0, ans=0.1 2023-11-28 02:47:11,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3318113.3333333335, ans=0.125 2023-11-28 02:47:11,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3318113.3333333335, ans=0.0 2023-11-28 02:47:14,944 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4750, loss[loss=0.08416, simple_loss=0.1243, pruned_loss=0.01578, audio_tagging_loss=0.006219, over 16356.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09163, pruned_loss=0.01289, audio_tagging_loss=0.008883, over 3060442.07 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:47:16,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=3318180.0, ans=0.02 2023-11-28 02:47:22,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3318180.0, ans=0.1 2023-11-28 02:47:24,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3318180.0, ans=0.2 2023-11-28 02:47:39,390 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497750 2023-11-28 02:47:39,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3318313.3333333335, ans=0.125 2023-11-28 02:47:46,839 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. 
limit=6.0 2023-11-28 02:47:52,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3318380.0, ans=0.125 2023-11-28 02:48:06,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3318446.6666666665, ans=0.0 2023-11-28 02:48:08,869 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 8.846e+01 9.343e+01 1.002e+02 1.233e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 02:48:13,297 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4800, loss[loss=0.06611, simple_loss=0.08946, pruned_loss=0.01205, audio_tagging_loss=0.009339, over 15270.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.08992, pruned_loss=0.0126, audio_tagging_loss=0.009088, over 3061659.93 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:48:15,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3318513.3333333335, ans=0.125 2023-11-28 02:48:23,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0 2023-11-28 02:48:24,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3318580.0, ans=0.125 2023-11-28 02:48:32,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3318580.0, ans=0.125 2023-11-28 02:48:37,198 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497800 2023-11-28 02:48:37,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3318646.6666666665, ans=0.0 2023-11-28 02:48:37,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3318646.6666666665, ans=0.0 2023-11-28 02:49:10,418 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4850, loss[loss=0.0685, simple_loss=0.08995, pruned_loss=0.01197, audio_tagging_loss=0.01155, over 15871.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.08979, pruned_loss=0.01254, audio_tagging_loss=0.00915, over 3056404.49 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 02:49:34,154 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497850 2023-11-28 02:49:34,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.36 vs. limit=10.0 2023-11-28 02:49:52,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3319046.6666666665, ans=0.1 2023-11-28 02:50:03,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3319113.3333333335, ans=0.1 2023-11-28 02:50:05,855 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.086e+01 8.681e+01 9.347e+01 1.000e+02 1.245e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 02:50:08,171 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4900, loss[loss=0.06446, simple_loss=0.08244, pruned_loss=0.01178, audio_tagging_loss=0.01146, over 14441.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09017, pruned_loss=0.01246, audio_tagging_loss=0.009161, over 3045483.24 frames. 
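Each optim.py line reports quartiles of recent per-batch gradient norms plus a clipping threshold; in these logs the threshold tracks clipping_scale times the median quartile (on the entry above, 2.0 x 9.343e+01 ~ 1.869e+02). A sketch of that bookkeeping, with the windowing details assumed:

import torch

def clip_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    qs = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * qs[2].item()            # scale * median
    pct = 100.0 * (grad_norms > threshold).float().mean().item()
    return qs.tolist(), threshold, pct

norms = torch.normal(90.0, 10.0, size=(1000,)).abs()    # synthetic norms
qs, thr, pct = clip_stats(norms)
print(f"grad-norm quartiles {qs}, threshold={thr:.4g}, percent-clipped={pct}")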
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:50:09,861 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=12.0 2023-11-28 02:50:20,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3319246.6666666665, ans=0.125 2023-11-28 02:50:33,037 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497900 2023-11-28 02:50:51,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3319380.0, ans=0.0 2023-11-28 02:50:56,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3319446.6666666665, ans=0.1 2023-11-28 02:51:05,901 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 4950, loss[loss=0.06312, simple_loss=0.08699, pruned_loss=0.0107, audio_tagging_loss=0.008927, over 15176.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09014, pruned_loss=0.01233, audio_tagging_loss=0.008976, over 3045246.36 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:51:23,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3319580.0, ans=0.0 2023-11-28 02:51:30,997 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 497950 2023-11-28 02:51:44,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3319713.3333333335, ans=0.95 2023-11-28 02:52:01,890 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.571e+01 9.206e+01 9.727e+01 1.276e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-28 02:52:04,053 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5000, loss[loss=0.07316, simple_loss=0.09804, pruned_loss=0.0155, audio_tagging_loss=0.008638, over 15451.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08948, pruned_loss=0.01233, audio_tagging_loss=0.008858, over 3038964.25 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:52:05,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3319846.6666666665, ans=0.0 2023-11-28 02:52:07,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3319846.6666666665, ans=0.125 2023-11-28 02:52:27,514 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498000 2023-11-28 02:52:43,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3320046.6666666665, ans=0.1 2023-11-28 02:52:50,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3320113.3333333335, ans=0.2 2023-11-28 02:52:55,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3320113.3333333335, ans=0.0 2023-11-28 02:53:01,690 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5050, loss[loss=0.07883, simple_loss=0.11, pruned_loss=0.01687, audio_tagging_loss=0.006968, over 15006.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.0889, pruned_loss=0.01224, audio_tagging_loss=0.008899, over 3041108.55 frames. 
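The per-batch numbers decompose consistently: each logged loss equals 0.5 * simple_loss + pruned_loss + audio_tagging_loss (batch 5000 above: 0.5 * 0.09804 + 0.0155 + 0.008638 ~ 0.07316). A one-line check, with the weights inferred from that fit rather than read from train_asr.py:

def combined_loss(simple, pruned, tagging,
                  simple_scale=0.5, tagging_scale=1.0):
    # weights inferred from the logged numbers (assumption)
    return simple_scale * simple + pruned + tagging_scale * tagging

print(round(combined_loss(0.09804, 0.0155, 0.008638), 5))  # 0.07316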
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:53:01,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3320180.0, ans=0.0 2023-11-28 02:53:02,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.16 vs. limit=15.0 2023-11-28 02:53:06,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3320180.0, ans=0.125 2023-11-28 02:53:09,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3320180.0, ans=0.0 2023-11-28 02:53:10,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3320180.0, ans=0.0 2023-11-28 02:53:25,473 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498050 2023-11-28 02:53:33,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3320313.3333333335, ans=0.2 2023-11-28 02:53:48,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3320446.6666666665, ans=0.0 2023-11-28 02:53:51,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3320446.6666666665, ans=0.0 2023-11-28 02:53:54,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3320446.6666666665, ans=0.125 2023-11-28 02:53:56,348 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.913e+01 8.785e+01 9.412e+01 9.952e+01 1.191e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 02:53:56,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3320446.6666666665, ans=0.09899494936611666 2023-11-28 02:53:58,588 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5100, loss[loss=0.04407, simple_loss=0.05436, pruned_loss=0.006912, audio_tagging_loss=0.009978, over 13321.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08815, pruned_loss=0.01221, audio_tagging_loss=0.008862, over 3042689.66 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:54:15,008 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=15.0 2023-11-28 02:54:19,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3320580.0, ans=0.0 2023-11-28 02:54:23,933 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498100 2023-11-28 02:54:33,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3320713.3333333335, ans=0.125 2023-11-28 02:54:42,016 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.06 vs. limit=15.0 2023-11-28 02:54:52,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.26 vs. limit=15.0 2023-11-28 02:54:56,893 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5150, loss[loss=0.05798, simple_loss=0.07637, pruned_loss=0.00996, audio_tagging_loss=0.009837, over 14880.00 frames. 
], tot_loss[loss=0.06557, simple_loss=0.0889, pruned_loss=0.01231, audio_tagging_loss=0.008812, over 3038879.43 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:55:13,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3320913.3333333335, ans=0.125 2023-11-28 02:55:13,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3320913.3333333335, ans=0.1 2023-11-28 02:55:14,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3320913.3333333335, ans=0.0 2023-11-28 02:55:21,127 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498150 2023-11-28 02:55:31,064 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.53 vs. limit=10.0 2023-11-28 02:55:33,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3321046.6666666665, ans=0.125 2023-11-28 02:55:53,306 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.740e+01 9.410e+01 1.002e+02 1.466e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 02:55:54,083 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0 2023-11-28 02:55:54,468 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5200, loss[loss=0.07575, simple_loss=0.1032, pruned_loss=0.01457, audio_tagging_loss=0.009606, over 14260.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09005, pruned_loss=0.01251, audio_tagging_loss=0.008685, over 3035683.09 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:55:55,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3321180.0, ans=0.125 2023-11-28 02:56:07,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3321246.6666666665, ans=0.125 2023-11-28 02:56:18,557 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498200 2023-11-28 02:56:34,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3321380.0, ans=0.0 2023-11-28 02:56:51,813 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5250, loss[loss=0.05538, simple_loss=0.07463, pruned_loss=0.0105, audio_tagging_loss=0.007559, over 14890.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09071, pruned_loss=0.01271, audio_tagging_loss=0.008554, over 3036951.89 frames. 
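The bypass.scale_min and bypass.skip_rate entries in this stretch suggest a learnable residual bypass around each sub-module: output = x + s * (f(x) - x), with the per-channel scale s floored at scale_min (0.2 in the entries above) and the sub-module skipped outright with probability skip_rate during training. This is an illustrative reading, not the scaling.py source.

import random
import torch

def bypass(x: torch.Tensor, fx: torch.Tensor, scale: torch.Tensor,
           scale_min: float = 0.2, skip_rate: float = 0.0) -> torch.Tensor:
    if random.random() < skip_rate:   # occasionally bypass the module
        return x
    s = scale.clamp(min=scale_min)    # per-channel scale, floored
    return x + s * (fx - x)

x, fx = torch.randn(10, 256), torch.randn(10, 256)
print(bypass(x, fx, torch.full((256,), 0.5)).shape)  # torch.Size([10, 256])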
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:57:14,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3321646.6666666665, ans=0.1 2023-11-28 02:57:14,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3321646.6666666665, ans=0.125 2023-11-28 02:57:16,171 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498250 2023-11-28 02:57:18,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3321646.6666666665, ans=15.0 2023-11-28 02:57:19,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3321646.6666666665, ans=0.125 2023-11-28 02:57:48,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.904e+01 9.487e+01 1.032e+02 1.355e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 02:57:48,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3321846.6666666665, ans=0.125 2023-11-28 02:57:49,441 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5300, loss[loss=0.07353, simple_loss=0.09044, pruned_loss=0.01662, audio_tagging_loss=0.01169, over 14511.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.09144, pruned_loss=0.01287, audio_tagging_loss=0.00854, over 3036867.80 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:57:49,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3321846.6666666665, ans=0.125 2023-11-28 02:57:49,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3321846.6666666665, ans=0.125 2023-11-28 02:57:53,364 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.33 vs. limit=22.5 2023-11-28 02:58:07,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3321913.3333333335, ans=0.0 2023-11-28 02:58:11,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3321980.0, ans=0.125 2023-11-28 02:58:13,547 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498300 2023-11-28 02:58:36,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3322113.3333333335, ans=0.2 2023-11-28 02:58:36,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3322113.3333333335, ans=22.5 2023-11-28 02:58:37,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3322113.3333333335, ans=0.125 2023-11-28 02:58:43,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3322113.3333333335, ans=0.125 2023-11-28 02:58:47,139 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5350, loss[loss=0.06923, simple_loss=0.09403, pruned_loss=0.01339, audio_tagging_loss=0.008823, over 15565.00 frames. 
], tot_loss[loss=0.0678, simple_loss=0.09243, pruned_loss=0.01305, audio_tagging_loss=0.008537, over 3041504.93 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 02:58:53,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.46 vs. limit=10.0 2023-11-28 02:59:01,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3322246.6666666665, ans=0.1 2023-11-28 02:59:08,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3322313.3333333335, ans=0.125 2023-11-28 02:59:11,076 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498350 2023-11-28 02:59:20,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3322380.0, ans=0.2 2023-11-28 02:59:30,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3322380.0, ans=0.125 2023-11-28 02:59:31,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3322380.0, ans=0.125 2023-11-28 02:59:42,879 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.664e+01 9.180e+01 9.721e+01 1.287e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-28 02:59:44,016 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5400, loss[loss=0.06372, simple_loss=0.09485, pruned_loss=0.007487, audio_tagging_loss=0.008805, over 15434.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09016, pruned_loss=0.01257, audio_tagging_loss=0.008605, over 3045895.29 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:00:03,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.13 vs. limit=15.0 2023-11-28 03:00:04,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3322580.0, ans=0.125 2023-11-28 03:00:08,043 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498400 2023-11-28 03:00:09,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3322646.6666666665, ans=0.2 2023-11-28 03:00:10,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3322646.6666666665, ans=0.125 2023-11-28 03:00:15,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3322646.6666666665, ans=0.125 2023-11-28 03:00:38,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3322780.0, ans=0.125 2023-11-28 03:00:42,003 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5450, loss[loss=0.068, simple_loss=0.08952, pruned_loss=0.01433, audio_tagging_loss=0.008915, over 14643.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09099, pruned_loss=0.01275, audio_tagging_loss=0.008554, over 3040615.66 frames. 
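tot_loss is reported "over N frames" with N accumulating into the millions, so it reads as a frame-weighted running average of the per-batch losses. A minimal tracker with that behaviour; the exact windowing in train_asr.py is assumed.

class LossTracker:
    """Frame-weighted running averages of named loss components."""
    def __init__(self):
        self.frames = 0
        self.weighted = {}

    def update(self, batch_frames: int, **losses: float):
        self.frames += batch_frames
        for name, value in losses.items():
            self.weighted[name] = self.weighted.get(name, 0.0) + value * batch_frames

    def averages(self) -> dict:
        return {k: v / self.frames for k, v in self.weighted.items()}

t = LossTracker()
t.update(15565, loss=0.06923, audio_tagging_loss=0.008823)  # batch 5350 above
t.update(15434, loss=0.06372, audio_tagging_loss=0.008805)  # batch 5400 above
print(t.averages())  # frame-weighted means over ~31k frames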
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:00:46,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3322846.6666666665, ans=0.0 2023-11-28 03:00:50,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3322846.6666666665, ans=0.125 2023-11-28 03:00:57,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3322913.3333333335, ans=0.0 2023-11-28 03:00:59,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3322913.3333333335, ans=0.0 2023-11-28 03:01:06,732 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498450 2023-11-28 03:01:12,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3322980.0, ans=0.125 2023-11-28 03:01:24,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3323046.6666666665, ans=0.5 2023-11-28 03:01:38,406 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.263e+01 8.881e+01 9.599e+01 1.024e+02 1.269e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 03:01:39,529 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5500, loss[loss=0.05396, simple_loss=0.07215, pruned_loss=0.008044, audio_tagging_loss=0.00984, over 16679.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09087, pruned_loss=0.01278, audio_tagging_loss=0.008576, over 3041333.27 frames. ], batch size: 62, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:01:44,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3323180.0, ans=0.125 2023-11-28 03:01:58,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3323246.6666666665, ans=0.1 2023-11-28 03:02:04,116 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498500 2023-11-28 03:02:06,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=15.0 2023-11-28 03:02:30,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3323446.6666666665, ans=0.125 2023-11-28 03:02:37,295 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5550, loss[loss=0.06459, simple_loss=0.08867, pruned_loss=0.009628, audio_tagging_loss=0.01063, over 15166.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09031, pruned_loss=0.01257, audio_tagging_loss=0.00864, over 3039351.60 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:02:54,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3323580.0, ans=0.125 2023-11-28 03:03:01,168 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498550 2023-11-28 03:03:03,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3323646.6666666665, ans=0.125 2023-11-28 03:03:07,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3323646.6666666665, ans=0.125 2023-11-28 03:03:15,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3323713.3333333335, ans=0.1 2023-11-28 03:03:23,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3323780.0, ans=0.125 2023-11-28 03:03:26,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3323780.0, ans=0.0 2023-11-28 03:03:33,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.559e+01 9.219e+01 9.829e+01 1.565e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-28 03:03:34,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3323846.6666666665, ans=0.0 2023-11-28 03:03:35,077 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5600, loss[loss=0.07501, simple_loss=0.1007, pruned_loss=0.01611, audio_tagging_loss=0.008547, over 14825.00 frames. ], tot_loss[loss=0.06664, simple_loss=0.09057, pruned_loss=0.01255, audio_tagging_loss=0.00881, over 3039442.82 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:03:48,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3323913.3333333335, ans=0.035 2023-11-28 03:03:48,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3323913.3333333335, ans=0.125 2023-11-28 03:03:53,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.97 vs. limit=15.0 2023-11-28 03:03:59,244 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498600 2023-11-28 03:03:59,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3323980.0, ans=0.125 2023-11-28 03:04:04,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3323980.0, ans=0.0 2023-11-28 03:04:17,624 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 03:04:26,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.21 vs. 
limit=22.5 2023-11-28 03:04:31,805 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5650, loss[loss=0.06304, simple_loss=0.08484, pruned_loss=0.0114, audio_tagging_loss=0.009219, over 15487.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.08997, pruned_loss=0.01256, audio_tagging_loss=0.008919, over 3040860.55 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:04:41,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3324180.0, ans=0.1 2023-11-28 03:04:55,885 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498650 2023-11-28 03:05:11,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3324380.0, ans=0.125 2023-11-28 03:05:24,389 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=12.0 2023-11-28 03:05:28,658 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.782e+01 9.473e+01 1.042e+02 1.222e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 03:05:29,857 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5700, loss[loss=0.05916, simple_loss=0.08303, pruned_loss=0.01113, audio_tagging_loss=0.006515, over 15674.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.0905, pruned_loss=0.01269, audio_tagging_loss=0.00884, over 3048347.59 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:05:44,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3324580.0, ans=0.1 2023-11-28 03:05:52,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3324646.6666666665, ans=0.0 2023-11-28 03:05:53,876 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498700 2023-11-28 03:06:01,957 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.86 vs. limit=10.0 2023-11-28 03:06:03,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3324713.3333333335, ans=0.0 2023-11-28 03:06:22,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3324780.0, ans=0.125 2023-11-28 03:06:27,542 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5750, loss[loss=0.08712, simple_loss=0.1105, pruned_loss=0.02052, audio_tagging_loss=0.01137, over 15548.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09068, pruned_loss=0.01268, audio_tagging_loss=0.008667, over 3049276.02 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:06:36,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3324846.6666666665, ans=0.125 2023-11-28 03:06:47,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3324913.3333333335, ans=0.0 2023-11-28 03:06:50,997 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498750 2023-11-28 03:06:52,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3324980.0, ans=0.0 2023-11-28 03:07:00,882 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.80 vs. limit=15.0 2023-11-28 03:07:15,561 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:07:21,166 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2023-11-28 03:07:22,755 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.740e+01 9.291e+01 9.936e+01 1.231e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-28 03:07:23,844 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5800, loss[loss=0.05147, simple_loss=0.07289, pruned_loss=0.006618, audio_tagging_loss=0.008405, over 14703.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08966, pruned_loss=0.0124, audio_tagging_loss=0.00866, over 3043320.88 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:07:41,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3325246.6666666665, ans=0.0 2023-11-28 03:07:48,072 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498800 2023-11-28 03:08:02,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3325380.0, ans=0.125 2023-11-28 03:08:11,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3325446.6666666665, ans=0.1 2023-11-28 03:08:16,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3325446.6666666665, ans=0.2 2023-11-28 03:08:19,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3325446.6666666665, ans=0.125 2023-11-28 03:08:21,650 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5850, loss[loss=0.05389, simple_loss=0.07863, pruned_loss=0.00631, audio_tagging_loss=0.008264, over 14877.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08985, pruned_loss=0.01231, audio_tagging_loss=0.008687, over 3047716.67 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:08:26,397 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.99 vs. 
limit=15.0 2023-11-28 03:08:32,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3325580.0, ans=0.0 2023-11-28 03:08:46,163 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498850 2023-11-28 03:09:14,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3325780.0, ans=0.125 2023-11-28 03:09:18,073 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.725e+01 9.320e+01 1.016e+02 1.515e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-28 03:09:19,657 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5900, loss[loss=0.05858, simple_loss=0.08394, pruned_loss=0.008885, audio_tagging_loss=0.007722, over 16228.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09038, pruned_loss=0.01223, audio_tagging_loss=0.008654, over 3054770.22 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:09:20,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3325846.6666666665, ans=0.2 2023-11-28 03:09:21,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3325846.6666666665, ans=0.1 2023-11-28 03:09:22,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3325846.6666666665, ans=0.2 2023-11-28 03:09:43,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3325980.0, ans=10.0 2023-11-28 03:09:43,938 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498900 2023-11-28 03:10:01,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3326046.6666666665, ans=0.125 2023-11-28 03:10:11,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3326113.3333333335, ans=0.2 2023-11-28 03:10:17,156 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 5950, loss[loss=0.06752, simple_loss=0.0931, pruned_loss=0.01367, audio_tagging_loss=0.007302, over 14845.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08937, pruned_loss=0.01212, audio_tagging_loss=0.008647, over 3059013.44 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:10:18,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3326180.0, ans=0.0 2023-11-28 03:10:20,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=3326180.0, ans=0.02 2023-11-28 03:10:23,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.41 vs. limit=15.0 2023-11-28 03:10:40,927 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 498950 2023-11-28 03:11:14,370 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.391e+01 8.682e+01 9.363e+01 1.001e+02 1.313e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 03:11:14,395 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6000, loss[loss=0.05069, simple_loss=0.05948, pruned_loss=0.009198, audio_tagging_loss=0.01175, over 14889.00 frames. 
], tot_loss[loss=0.06524, simple_loss=0.08908, pruned_loss=0.01212, audio_tagging_loss=0.008581, over 3058097.69 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:11:14,395 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 03:11:45,767 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1249, 2.4467, 4.9750, 2.9552], device='cuda:2') 2023-11-28 03:11:49,939 INFO [train_asr.py:1267] (2/4) Epoch 42, validation: loss=0.05789, simple_loss=0.05056, pruned_loss=0.005172, audio_tagging_loss=0.02743, over 4681554.00 frames. 2023-11-28 03:11:49,939 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 03:12:11,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3326646.6666666665, ans=0.125 2023-11-28 03:12:13,467 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499000 2023-11-28 03:12:24,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3326713.3333333335, ans=0.2 2023-11-28 03:12:29,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3326713.3333333335, ans=0.0 2023-11-28 03:12:32,238 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 03:12:46,915 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6050, loss[loss=0.05363, simple_loss=0.07227, pruned_loss=0.009338, audio_tagging_loss=0.008159, over 15912.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08893, pruned_loss=0.01211, audio_tagging_loss=0.008645, over 3063082.40 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:12:58,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3326913.3333333335, ans=0.0 2023-11-28 03:13:10,388 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499050 2023-11-28 03:13:23,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3327046.6666666665, ans=0.125 2023-11-28 03:13:23,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3327046.6666666665, ans=0.0 2023-11-28 03:13:25,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0 2023-11-28 03:13:31,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3327113.3333333335, ans=0.04949747468305833 2023-11-28 03:13:40,298 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.81 vs. 
limit=15.0 2023-11-28 03:13:44,243 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.384e+01 8.818e+01 9.290e+01 9.982e+01 1.282e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-28 03:13:44,268 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6100, loss[loss=0.0811, simple_loss=0.1115, pruned_loss=0.01849, audio_tagging_loss=0.006854, over 15116.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.0893, pruned_loss=0.01229, audio_tagging_loss=0.008625, over 3063279.64 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:13:51,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3327180.0, ans=0.125 2023-11-28 03:14:08,804 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499100 2023-11-28 03:14:14,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.72 vs. limit=22.5 2023-11-28 03:14:15,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3327313.3333333335, ans=0.0 2023-11-28 03:14:15,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3327313.3333333335, ans=0.5 2023-11-28 03:14:41,475 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6150, loss[loss=0.06729, simple_loss=0.09428, pruned_loss=0.01345, audio_tagging_loss=0.006701, over 15103.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.089, pruned_loss=0.01222, audio_tagging_loss=0.008741, over 3058428.36 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:15:06,124 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499150 2023-11-28 03:15:09,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3327646.6666666665, ans=0.2 2023-11-28 03:15:16,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3327713.3333333335, ans=0.125 2023-11-28 03:15:39,213 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6200, loss[loss=0.05536, simple_loss=0.06831, pruned_loss=0.00844, audio_tagging_loss=0.01276, over 14737.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08952, pruned_loss=0.01231, audio_tagging_loss=0.008748, over 3055764.71 frames. 
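During the validation pass above, zipformer.py printed attn_weights_entropy with one value per attention head. A sketch of how such per-head attention entropies could be computed; the reduction is an assumption, not read from zipformer.py.

import torch

def attn_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len), each row a distribution
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)                           # mean entropy per head

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_entropy(attn))  # tensor with one entropy value per head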
], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:15:40,272 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.163e+01 8.658e+01 9.318e+01 1.003e+02 1.390e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-28 03:15:40,501 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:15:47,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3327846.6666666665, ans=0.125 2023-11-28 03:15:50,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3327913.3333333335, ans=0.2 2023-11-28 03:16:02,947 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499200 2023-11-28 03:16:27,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3328113.3333333335, ans=0.0 2023-11-28 03:16:32,649 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2023-11-28 03:16:36,593 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6250, loss[loss=0.06333, simple_loss=0.08721, pruned_loss=0.01012, audio_tagging_loss=0.009606, over 14304.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09017, pruned_loss=0.01246, audio_tagging_loss=0.008787, over 3058601.84 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:17:00,535 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499250 2023-11-28 03:17:02,301 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=22.5 2023-11-28 03:17:17,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3328380.0, ans=0.125 2023-11-28 03:17:20,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3328380.0, ans=0.125 2023-11-28 03:17:33,309 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6300, loss[loss=0.07091, simple_loss=0.0979, pruned_loss=0.01376, audio_tagging_loss=0.0082, over 16296.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08963, pruned_loss=0.01236, audio_tagging_loss=0.008912, over 3059698.44 frames. ], batch size: 63, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:17:34,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.443e+01 9.160e+01 9.772e+01 1.060e+02 1.350e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-28 03:17:45,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3328580.0, ans=0.0 2023-11-28 03:17:54,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3328580.0, ans=0.1 2023-11-28 03:17:58,595 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499300 2023-11-28 03:18:05,803 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.68 vs. 
limit=22.5 2023-11-28 03:18:06,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3328646.6666666665, ans=0.0 2023-11-28 03:18:06,540 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.46 vs. limit=22.5 2023-11-28 03:18:17,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3328713.3333333335, ans=0.0 2023-11-28 03:18:22,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3328780.0, ans=0.0 2023-11-28 03:18:30,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3328846.6666666665, ans=0.2 2023-11-28 03:18:31,042 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6350, loss[loss=0.0609, simple_loss=0.09023, pruned_loss=0.009591, audio_tagging_loss=0.006192, over 14407.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08998, pruned_loss=0.01245, audio_tagging_loss=0.008943, over 3053389.02 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:18:36,655 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2023-11-28 03:18:46,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3328913.3333333335, ans=0.125 2023-11-28 03:18:55,219 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499350 2023-11-28 03:19:10,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3329046.6666666665, ans=0.125 2023-11-28 03:19:10,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3329046.6666666665, ans=0.125 2023-11-28 03:19:29,067 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6400, loss[loss=0.04968, simple_loss=0.06183, pruned_loss=0.00609, audio_tagging_loss=0.01268, over 13828.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.0892, pruned_loss=0.01218, audio_tagging_loss=0.009003, over 3046051.94 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:19:30,176 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 8.920e+01 9.509e+01 1.018e+02 1.569e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 03:19:51,371 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.72 vs. limit=10.0 2023-11-28 03:19:52,944 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499400 2023-11-28 03:19:55,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3329313.3333333335, ans=0.035 2023-11-28 03:20:13,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3329380.0, ans=0.1 2023-11-28 03:20:26,013 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6450, loss[loss=0.07998, simple_loss=0.1149, pruned_loss=0.0162, audio_tagging_loss=0.006328, over 16189.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.0895, pruned_loss=0.01215, audio_tagging_loss=0.008947, over 3047245.74 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:20:40,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3329580.0, ans=0.125 2023-11-28 03:20:44,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3329580.0, ans=0.125 2023-11-28 03:20:49,873 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499450 2023-11-28 03:21:01,652 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:21:02,017 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2023-11-28 03:21:07,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3329713.3333333335, ans=0.1 2023-11-28 03:21:23,068 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6500, loss[loss=0.06545, simple_loss=0.09314, pruned_loss=0.01039, audio_tagging_loss=0.008491, over 14718.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08984, pruned_loss=0.01215, audio_tagging_loss=0.008861, over 3041986.29 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:21:25,260 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.737e+01 9.352e+01 9.973e+01 1.217e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 03:21:45,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3329980.0, ans=0.2 2023-11-28 03:21:47,142 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499500 2023-11-28 03:21:47,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3329980.0, ans=0.125 2023-11-28 03:22:07,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3330113.3333333335, ans=0.125 2023-11-28 03:22:07,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3330113.3333333335, ans=0.125 2023-11-28 03:22:20,277 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6550, loss[loss=0.0733, simple_loss=0.1122, pruned_loss=0.01203, audio_tagging_loss=0.005158, over 15853.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09065, pruned_loss=0.01237, audio_tagging_loss=0.008737, over 3048461.27 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:22:23,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3330180.0, ans=0.125 2023-11-28 03:22:38,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3330246.6666666665, ans=0.125 2023-11-28 03:22:42,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3330313.3333333335, ans=0.125 2023-11-28 03:22:44,207 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499550 2023-11-28 03:22:46,879 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.33 vs. 
limit=22.5 2023-11-28 03:22:49,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3330313.3333333335, ans=0.0 2023-11-28 03:22:50,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3330313.3333333335, ans=0.2 2023-11-28 03:22:57,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3330380.0, ans=0.0 2023-11-28 03:23:09,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3330446.6666666665, ans=0.0 2023-11-28 03:23:13,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3330446.6666666665, ans=0.125 2023-11-28 03:23:16,573 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6600, loss[loss=0.0685, simple_loss=0.09303, pruned_loss=0.01493, audio_tagging_loss=0.007054, over 15426.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09025, pruned_loss=0.0123, audio_tagging_loss=0.008639, over 3048920.60 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:23:19,847 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.958e+01 9.376e+01 9.845e+01 1.305e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-28 03:23:31,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3330580.0, ans=0.2 2023-11-28 03:23:40,481 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499600 2023-11-28 03:23:41,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3330646.6666666665, ans=0.0 2023-11-28 03:23:59,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3330713.3333333335, ans=0.125 2023-11-28 03:24:09,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3330780.0, ans=0.1 2023-11-28 03:24:14,472 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6650, loss[loss=0.05779, simple_loss=0.08511, pruned_loss=0.007962, audio_tagging_loss=0.007274, over 14171.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08991, pruned_loss=0.01214, audio_tagging_loss=0.008584, over 3044567.48 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:24:16,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3330846.6666666665, ans=0.125 2023-11-28 03:24:31,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3330913.3333333335, ans=0.125 2023-11-28 03:24:36,467 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:24:38,502 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499650 2023-11-28 03:25:10,988 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6700, loss[loss=0.08143, simple_loss=0.1075, pruned_loss=0.0201, audio_tagging_loss=0.007566, over 15628.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.09049, pruned_loss=0.01237, audio_tagging_loss=0.008528, over 3038032.11 frames. 
], batch size: 57, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:25:14,824 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.118e+01 8.626e+01 9.557e+01 1.018e+02 1.449e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-28 03:25:21,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3331180.0, ans=0.0
2023-11-28 03:25:27,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3331246.6666666665, ans=0.0
2023-11-28 03:25:36,070 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499700
2023-11-28 03:25:41,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3331313.3333333335, ans=0.125
2023-11-28 03:25:42,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3331313.3333333335, ans=0.125
2023-11-28 03:25:43,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3331313.3333333335, ans=0.0
2023-11-28 03:26:03,569 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0
2023-11-28 03:26:08,917 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6750, loss[loss=0.06747, simple_loss=0.09014, pruned_loss=0.01175, audio_tagging_loss=0.01065, over 14841.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09014, pruned_loss=0.01224, audio_tagging_loss=0.00857, over 3039925.46 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:26:09,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3331513.3333333335, ans=0.125
2023-11-28 03:26:12,752 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.03 vs. limit=10.0
2023-11-28 03:26:13,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3331513.3333333335, ans=0.125
2023-11-28 03:26:30,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3331646.6666666665, ans=0.0
2023-11-28 03:26:32,839 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499750
2023-11-28 03:26:35,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3331646.6666666665, ans=0.125
2023-11-28 03:26:43,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3331713.3333333335, ans=0.125
2023-11-28 03:26:52,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3331713.3333333335, ans=0.05
2023-11-28 03:27:00,875 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0
2023-11-28 03:27:06,687 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6800, loss[loss=0.05121, simple_loss=0.06097, pruned_loss=0.00853, audio_tagging_loss=0.0122, over 15730.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08982, pruned_loss=0.01218, audio_tagging_loss=0.008667, over 3041425.63 frames. ], batch size: 61, lr: 1.61e-03, grad_scale: 16.0
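The recurring [optim.py:476] records summarize the distribution of recent gradient norms (min, 25%, median, 75%, max) together with the clipping threshold in effect; in every record in this section the threshold equals Clipping_scale times the reported median (e.g. 2.0 x 9.557e+01 = 1.911e+02 above), and percent-clipped is the share of recent steps whose norm exceeded it. Below is a minimal sketch of that kind of adaptive, median-based clipping; the class and its bookkeeping are hypothetical, not the actual implementation in icefall's optim.py:

```python
from collections import deque
import torch

class MedianClipper:
    """Hypothetical sketch: clip gradients against clipping_scale * running median."""

    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # total grad norms of recent steps
        self.num_clipped = 0
        self.num_steps = 0

    def clip_(self, parameters):
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        self.num_steps += 1
        # Quartiles of the recent-norm window, as printed in the log.
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2.0 * median
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        return q, threshold, 100.0 * self.num_clipped / self.num_steps
```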
2023-11-28 03:27:09,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3331846.6666666665, ans=0.125
2023-11-28 03:27:10,005 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.683e+01 9.159e+01 9.907e+01 1.833e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-28 03:27:22,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3331913.3333333335, ans=0.0
2023-11-28 03:27:24,885 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=15.0
2023-11-28 03:27:25,887 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0
2023-11-28 03:27:30,267 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499800
2023-11-28 03:27:34,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3331980.0, ans=0.125
2023-11-28 03:27:43,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3332046.6666666665, ans=0.125
2023-11-28 03:27:59,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.14 vs. limit=15.0
2023-11-28 03:28:03,799 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6850, loss[loss=0.06447, simple_loss=0.08669, pruned_loss=0.01266, audio_tagging_loss=0.008463, over 15200.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08954, pruned_loss=0.01227, audio_tagging_loss=0.008688, over 3029834.89 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:28:07,370 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-28 03:28:28,020 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499850
2023-11-28 03:28:47,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3332380.0, ans=0.125
2023-11-28 03:29:00,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3332513.3333333335, ans=0.125
2023-11-28 03:29:01,367 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6900, loss[loss=0.08465, simple_loss=0.1137, pruned_loss=0.02054, audio_tagging_loss=0.007234, over 15996.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08977, pruned_loss=0.01248, audio_tagging_loss=0.008694, over 3039775.02 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:29:07,515 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.595e+01 9.072e+01 9.849e+01 1.232e+02, threshold=1.814e+02, percent-clipped=0.0
2023-11-28 03:29:07,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3332513.3333333335, ans=0.125
2023-11-28 03:29:09,023 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.16 vs. limit=10.0
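The [scaling.py:213] records report the current value (ans) of hyperparameters such as dropout probabilities, skip rates, and balancer limits that are scheduled against the cumulative batch count rather than fixed; by batch_count ~3.33M all of them have long since reached their final values. A sketch of piecewise-linear scheduling consistent with these records follows; the class name and breakpoints are assumptions, and the real ScheduledFloat in icefall's scaling.py carries more machinery:

```python
class ScheduledFloatSketch:
    """Hypothetical sketch: value is piecewise-linear in the global batch count."""

    def __init__(self, *points, default=0.0):
        # points: (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.1)
        self.schedule = sorted(points)
        self.default = default

    def value_at(self, batch_count):
        s = self.schedule
        if not s:
            return self.default
        if batch_count <= s[0][0]:
            return s[0][1]
        if batch_count >= s[-1][0]:
            return s[-1][1]
        # Linear interpolation inside the bracketing segment.
        for (x0, y0), (x1, y1) in zip(s, s[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a rate annealed from 0.3 to 0.1 over the first 20k batches,
# flat for the batch counts seen in this section:
p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
assert p.value_at(3331980.0) == 0.1
```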
2023-11-28 03:29:09,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3332513.3333333335, ans=0.05
2023-11-28 03:29:12,481 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0
2023-11-28 03:29:15,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3332580.0, ans=0.2
2023-11-28 03:29:16,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3332580.0, ans=0.125
2023-11-28 03:29:21,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3332580.0, ans=0.125
2023-11-28 03:29:25,800 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499900
2023-11-28 03:29:47,263 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 03:29:47,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3332780.0, ans=0.125
2023-11-28 03:29:54,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3332780.0, ans=0.0
2023-11-28 03:29:59,739 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 6950, loss[loss=0.06226, simple_loss=0.09119, pruned_loss=0.01175, audio_tagging_loss=0.004918, over 14308.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08948, pruned_loss=0.01228, audio_tagging_loss=0.008722, over 3043364.59 frames. ], batch size: 53, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:30:13,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3332913.3333333335, ans=0.0
2023-11-28 03:30:23,202 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 499950
2023-11-28 03:30:27,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3332980.0, ans=0.125
2023-11-28 03:30:32,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3333046.6666666665, ans=0.125
2023-11-28 03:30:42,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3333046.6666666665, ans=0.1
2023-11-28 03:30:55,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3333180.0, ans=0.95
2023-11-28 03:30:56,302 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7000, loss[loss=0.0565, simple_loss=0.0857, pruned_loss=0.00686, audio_tagging_loss=0.006793, over 14442.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08926, pruned_loss=0.01218, audio_tagging_loss=0.008714, over 3045681.30 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 8.0
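The WARNING above shows the filter for degenerate training examples: this 1-second AudioSet cut has 100 feature frames, only 23 after the encoder's roughly 4x subsampling, which is fewer than its 24 BPE tokens (AudioSet cuts carry the dummy transcript "Dummy text added as a place holder...", so they are long in tokens but short in audio), and a transducer-style loss needs at least one encoder frame per output token. A sketch of that check; the convolutional-subsampling arithmetic below is an assumption that happens to reproduce the logged 100 -> 23:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed two-stage conv subsampling; yields 23 for a 100-frame input.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts with fewer post-subsampling frames than output tokens."""
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # 23 frames < 24 tokens -> excluded, as logged
```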
2023-11-28 03:31:01,686 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.564e+01 9.211e+01 9.659e+01 1.272e+02, threshold=1.842e+02, percent-clipped=0.0
2023-11-28 03:31:20,413 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500000
2023-11-28 03:31:25,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.03 vs. limit=12.0
2023-11-28 03:31:30,479 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0
2023-11-28 03:31:38,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3333380.0, ans=0.1
2023-11-28 03:31:41,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0
2023-11-28 03:31:44,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3333446.6666666665, ans=0.1
2023-11-28 03:31:55,576 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7050, loss[loss=0.05144, simple_loss=0.06806, pruned_loss=0.008572, audio_tagging_loss=0.008838, over 15427.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09007, pruned_loss=0.01218, audio_tagging_loss=0.008664, over 3047704.66 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:32:16,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3333580.0, ans=0.125
2023-11-28 03:32:18,994 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 03:32:19,935 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500050
2023-11-28 03:32:21,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3333646.6666666665, ans=0.0
2023-11-28 03:32:39,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3333713.3333333335, ans=0.125
2023-11-28 03:32:52,923 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7100, loss[loss=0.05251, simple_loss=0.07589, pruned_loss=0.006744, audio_tagging_loss=0.007822, over 14635.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08899, pruned_loss=0.01216, audio_tagging_loss=0.008762, over 3048919.98 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 8.0
2023-11-28 03:32:58,780 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.626e+01 8.733e+01 9.408e+01 1.010e+02 1.480e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-28 03:33:03,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3333913.3333333335, ans=22.5
2023-11-28 03:33:06,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3333913.3333333335, ans=0.125
2023-11-28 03:33:16,488 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500100
2023-11-28 03:33:49,672 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7150, loss[loss=0.04062, simple_loss=0.05364, pruned_loss=0.00401, audio_tagging_loss=0.009789, over 14442.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08905, pruned_loss=0.01222, audio_tagging_loss=0.008774, over 3043268.42 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 8.0
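The loss records are internally consistent with a total of the form loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, i.e. a pruned-RNN-T objective (down-weighted simple joiner loss plus pruned loss) with the audio-tagging distillation term added at unit weight; the 0.5 and 1.0 weights are inferred from the numbers themselves rather than quoted from code. Checking against the batch 7000 record a little further up:

```python
# Hypothetical reconstruction of the logged total from its parts
# (Epoch 42, batch 7000 record: loss=0.0565).
simple_loss, pruned_loss, audio_tagging_loss = 0.0857, 0.00686, 0.006793
loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
assert abs(loss - 0.0565) < 5e-5  # 0.042850 + 0.006860 + 0.006793 = 0.056503
```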
2023-11-28 03:34:10,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3334246.6666666665, ans=0.0
2023-11-28 03:34:13,253 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500150
2023-11-28 03:34:14,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3334313.3333333335, ans=0.0
2023-11-28 03:34:28,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3334380.0, ans=0.125
2023-11-28 03:34:30,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3334380.0, ans=0.125
2023-11-28 03:34:43,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3334446.6666666665, ans=0.0
2023-11-28 03:34:43,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3334446.6666666665, ans=0.125
2023-11-28 03:34:45,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3334513.3333333335, ans=0.1
2023-11-28 03:34:46,536 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7200, loss[loss=0.06158, simple_loss=0.08667, pruned_loss=0.008625, audio_tagging_loss=0.00962, over 15175.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08963, pruned_loss=0.01224, audio_tagging_loss=0.008799, over 3042725.52 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 03:34:47,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3334513.3333333335, ans=0.2
2023-11-28 03:34:51,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.895e+01 9.379e+01 1.001e+02 1.500e+02, threshold=1.876e+02, percent-clipped=0.0
2023-11-28 03:34:52,111 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 03:35:04,250 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 03:35:10,598 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500200
2023-11-28 03:35:13,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3334646.6666666665, ans=0.0
2023-11-28 03:35:14,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.93 vs. limit=15.0
2023-11-28 03:35:36,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3334780.0, ans=0.1
2023-11-28 03:35:36,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3334780.0, ans=0.2
2023-11-28 03:35:38,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3334780.0, ans=0.125
2023-11-28 03:35:43,149 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7250, loss[loss=0.07124, simple_loss=0.1013, pruned_loss=0.0121, audio_tagging_loss=0.008508, over 15618.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08963, pruned_loss=0.01216, audio_tagging_loss=0.008825, over 3038911.45 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0
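The grad_scale field in the batch records is the fp16 loss scale of mixed-precision training, and it moves in powers of two: halved when a scaled gradient overflows, doubled again after a run of clean steps, which is why it oscillates between 8.0, 16.0, and 32.0 across this section (e.g. 8.0 at batch 7150 above, 16.0 from batch 7200 on). A minimal sketch of a training step with PyTorch's stock GradScaler, which implements exactly this halving/doubling policy; model, optimizer, loss_fn, and the constructor arguments are placeholders, not values taken from this run:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

def train_step(model, optimizer, loss_fn, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():    # fp16 forward pass
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()      # backward on the scaled loss
    scaler.step(optimizer)             # skipped if the grads overflowed
    scaler.update()                    # halve on overflow, else grow over time
    return scaler.get_scale()          # the "grad_scale" printed in the log
```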
2023-11-28 03:36:07,227 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500250
2023-11-28 03:36:27,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3335113.3333333335, ans=0.125
2023-11-28 03:36:40,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3335180.0, ans=0.2
2023-11-28 03:36:40,977 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7300, loss[loss=0.06034, simple_loss=0.08493, pruned_loss=0.01047, audio_tagging_loss=0.007403, over 14605.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08897, pruned_loss=0.01213, audio_tagging_loss=0.008891, over 3035764.38 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0
2023-11-28 03:36:46,354 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.677e+01 9.313e+01 1.019e+02 2.186e+02, threshold=1.863e+02, percent-clipped=1.0
2023-11-28 03:36:47,896 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=22.5
2023-11-28 03:36:48,929 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 03:36:58,233 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 03:36:58,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3335246.6666666665, ans=0.125
2023-11-28 03:37:04,792 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500300
2023-11-28 03:37:08,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3335313.3333333335, ans=0.05
2023-11-28 03:37:16,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3335380.0, ans=0.125
2023-11-28 03:37:21,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3335380.0, ans=0.125
2023-11-28 03:37:28,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3335446.6666666665, ans=0.025
2023-11-28 03:37:31,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3335446.6666666665, ans=0.0
2023-11-28 03:37:32,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3335446.6666666665, ans=0.2
2023-11-28 03:37:38,146 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7350, loss[loss=0.0737, simple_loss=0.1016, pruned_loss=0.01552, audio_tagging_loss=0.007392, over 14985.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08901, pruned_loss=0.01219, audio_tagging_loss=0.00878, over 3036066.26 frames.
], batch size: 56, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:38:02,883 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500350 2023-11-28 03:38:10,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3335646.6666666665, ans=0.125 2023-11-28 03:38:14,807 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2023-11-28 03:38:22,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3335713.3333333335, ans=0.0 2023-11-28 03:38:30,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3335780.0, ans=0.0 2023-11-28 03:38:35,810 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7400, loss[loss=0.06282, simple_loss=0.07692, pruned_loss=0.01278, audio_tagging_loss=0.01158, over 14702.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08969, pruned_loss=0.01245, audio_tagging_loss=0.008622, over 3037386.09 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:38:39,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=12.0 2023-11-28 03:38:43,240 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.183e+01 8.811e+01 9.404e+01 1.022e+02 2.241e+02, threshold=1.881e+02, percent-clipped=1.0 2023-11-28 03:38:49,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3335913.3333333335, ans=0.125 2023-11-28 03:38:57,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3335913.3333333335, ans=0.125 2023-11-28 03:39:00,569 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500400 2023-11-28 03:39:20,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.06 vs. limit=15.0 2023-11-28 03:39:23,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3336113.3333333335, ans=0.0 2023-11-28 03:39:34,651 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7450, loss[loss=0.05855, simple_loss=0.08086, pruned_loss=0.009744, audio_tagging_loss=0.008379, over 13677.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08989, pruned_loss=0.01249, audio_tagging_loss=0.008583, over 3033278.41 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:39:57,496 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=15.0 2023-11-28 03:39:58,198 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500450 2023-11-28 03:40:20,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3336446.6666666665, ans=0.125 2023-11-28 03:40:31,117 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7500, loss[loss=0.06484, simple_loss=0.0894, pruned_loss=0.01067, audio_tagging_loss=0.009469, over 14998.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09028, pruned_loss=0.01249, audio_tagging_loss=0.008603, over 3033772.18 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:40:38,136 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 9.074e+01 9.605e+01 1.016e+02 1.899e+02, threshold=1.921e+02, percent-clipped=1.0 2023-11-28 03:40:55,791 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500500 2023-11-28 03:40:57,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3336646.6666666665, ans=0.2 2023-11-28 03:41:24,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3336780.0, ans=0.125 2023-11-28 03:41:25,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3336780.0, ans=0.2 2023-11-28 03:41:28,464 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7550, loss[loss=0.0543, simple_loss=0.0753, pruned_loss=0.007823, audio_tagging_loss=0.008831, over 14741.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09021, pruned_loss=0.01251, audio_tagging_loss=0.00855, over 3036263.90 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 8.0 2023-11-28 03:41:28,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3336846.6666666665, ans=0.0 2023-11-28 03:41:51,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3336980.0, ans=0.5 2023-11-28 03:41:52,869 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500550 2023-11-28 03:42:05,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3337046.6666666665, ans=0.125 2023-11-28 03:42:06,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0 2023-11-28 03:42:07,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3337046.6666666665, ans=0.2 2023-11-28 03:42:10,477 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.46 vs. limit=15.0 2023-11-28 03:42:18,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3337113.3333333335, ans=0.0 2023-11-28 03:42:26,187 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7600, loss[loss=0.06136, simple_loss=0.0789, pruned_loss=0.01184, audio_tagging_loss=0.01007, over 15097.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08885, pruned_loss=0.0122, audio_tagging_loss=0.008586, over 3036576.05 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:42:30,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.99 vs. 
limit=15.0 2023-11-28 03:42:32,814 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 8.828e+01 9.447e+01 1.020e+02 1.254e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 03:42:47,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3337246.6666666665, ans=0.0 2023-11-28 03:42:48,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3337313.3333333335, ans=0.125 2023-11-28 03:42:50,601 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500600 2023-11-28 03:43:22,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3337446.6666666665, ans=0.125 2023-11-28 03:43:22,213 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=15.0 2023-11-28 03:43:23,925 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7650, loss[loss=0.05814, simple_loss=0.07702, pruned_loss=0.009719, audio_tagging_loss=0.009915, over 15590.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08979, pruned_loss=0.01239, audio_tagging_loss=0.008521, over 3042341.88 frames. ], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:43:31,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3337513.3333333335, ans=0.125 2023-11-28 03:43:48,244 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500650 2023-11-28 03:44:13,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3337780.0, ans=0.1 2023-11-28 03:44:14,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3337780.0, ans=0.1 2023-11-28 03:44:18,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3337780.0, ans=0.125 2023-11-28 03:44:21,213 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7700, loss[loss=0.05758, simple_loss=0.07895, pruned_loss=0.009082, audio_tagging_loss=0.009021, over 16361.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08946, pruned_loss=0.01243, audio_tagging_loss=0.008611, over 3044002.84 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:44:27,646 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.039e+01 8.661e+01 9.049e+01 9.903e+01 1.330e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-28 03:44:27,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3337846.6666666665, ans=0.125 2023-11-28 03:44:44,874 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500700 2023-11-28 03:45:12,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3338113.3333333335, ans=0.125 2023-11-28 03:45:18,656 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7750, loss[loss=0.07876, simple_loss=0.1121, pruned_loss=0.01591, audio_tagging_loss=0.006799, over 15055.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09062, pruned_loss=0.01258, audio_tagging_loss=0.008698, over 3045793.80 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:45:20,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3338180.0, ans=0.1 2023-11-28 03:45:43,004 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500750 2023-11-28 03:45:44,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.30 vs. limit=15.0 2023-11-28 03:46:15,549 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7800, loss[loss=0.05294, simple_loss=0.06396, pruned_loss=0.01097, audio_tagging_loss=0.009995, over 14527.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08951, pruned_loss=0.01252, audio_tagging_loss=0.008723, over 3031366.41 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:46:22,527 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 8.833e+01 9.588e+01 1.059e+02 1.292e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 03:46:30,483 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2023-11-28 03:46:34,782 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.74 vs. limit=22.5 2023-11-28 03:46:39,827 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500800 2023-11-28 03:46:42,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3338646.6666666665, ans=0.0 2023-11-28 03:47:04,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3338780.0, ans=0.1 2023-11-28 03:47:13,889 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7850, loss[loss=0.05677, simple_loss=0.08099, pruned_loss=0.00895, audio_tagging_loss=0.007326, over 14928.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.08979, pruned_loss=0.01272, audio_tagging_loss=0.008784, over 3033662.03 frames. ], batch size: 57, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:47:14,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3338846.6666666665, ans=0.1 2023-11-28 03:47:19,935 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.10 vs. limit=15.0 2023-11-28 03:47:32,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3338913.3333333335, ans=0.0 2023-11-28 03:47:37,932 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500850 2023-11-28 03:48:10,413 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7900, loss[loss=0.05481, simple_loss=0.06499, pruned_loss=0.01303, audio_tagging_loss=0.009279, over 14729.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08969, pruned_loss=0.01265, audio_tagging_loss=0.008845, over 3035055.02 frames. 
], batch size: 58, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:48:17,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.758e+01 9.324e+01 1.005e+02 1.322e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 03:48:26,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3339246.6666666665, ans=0.04949747468305833 2023-11-28 03:48:33,959 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500900 2023-11-28 03:48:35,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3339313.3333333335, ans=0.125 2023-11-28 03:48:47,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2023-11-28 03:48:55,397 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=15.0 2023-11-28 03:49:06,820 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 7950, loss[loss=0.08094, simple_loss=0.1124, pruned_loss=0.01616, audio_tagging_loss=0.008591, over 15378.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08935, pruned_loss=0.01255, audio_tagging_loss=0.008987, over 3038999.85 frames. ], batch size: 54, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:49:22,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3339580.0, ans=0.2 2023-11-28 03:49:24,546 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 03:49:31,084 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 500950 2023-11-28 03:49:32,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3339646.6666666665, ans=0.1 2023-11-28 03:49:37,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3339646.6666666665, ans=0.125 2023-11-28 03:50:04,223 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8000, loss[loss=0.06708, simple_loss=0.09873, pruned_loss=0.01065, audio_tagging_loss=0.007069, over 15098.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.08974, pruned_loss=0.01261, audio_tagging_loss=0.009047, over 3042278.65 frames. 
], batch size: 55, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:50:11,484 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 8.539e+01 9.143e+01 9.818e+01 1.375e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-28 03:50:23,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3339913.3333333335, ans=0.125 2023-11-28 03:50:28,946 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501000 2023-11-28 03:50:42,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3340046.6666666665, ans=0.125 2023-11-28 03:50:49,862 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=12.0 2023-11-28 03:50:52,533 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:51:02,056 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8050, loss[loss=0.06637, simple_loss=0.09046, pruned_loss=0.01274, audio_tagging_loss=0.008399, over 15064.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08857, pruned_loss=0.0123, audio_tagging_loss=0.009073, over 3042681.10 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:51:03,516 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.44 vs. limit=10.0 2023-11-28 03:51:05,659 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=12.0 2023-11-28 03:51:19,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3340246.6666666665, ans=0.2 2023-11-28 03:51:21,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3340246.6666666665, ans=0.125 2023-11-28 03:51:26,188 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501050 2023-11-28 03:51:43,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3340380.0, ans=0.2 2023-11-28 03:51:58,513 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=15.0 2023-11-28 03:52:00,071 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8100, loss[loss=0.06235, simple_loss=0.09025, pruned_loss=0.01102, audio_tagging_loss=0.006203, over 14781.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.08942, pruned_loss=0.01257, audio_tagging_loss=0.008901, over 3043402.42 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:52:04,719 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.21 vs. 
limit=22.5 2023-11-28 03:52:07,637 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.157e+01 8.647e+01 9.377e+01 1.005e+02 1.143e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-28 03:52:10,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3340580.0, ans=0.125 2023-11-28 03:52:22,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3340646.6666666665, ans=0.0 2023-11-28 03:52:24,109 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501100 2023-11-28 03:52:26,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3340646.6666666665, ans=0.0 2023-11-28 03:52:56,855 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8150, loss[loss=0.05262, simple_loss=0.07029, pruned_loss=0.008159, audio_tagging_loss=0.009315, over 14268.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09028, pruned_loss=0.01274, audio_tagging_loss=0.00871, over 3043579.93 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:53:02,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3340846.6666666665, ans=0.125 2023-11-28 03:53:08,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3340913.3333333335, ans=0.0 2023-11-28 03:53:09,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3340913.3333333335, ans=0.125 2023-11-28 03:53:19,913 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.71 vs. limit=15.0 2023-11-28 03:53:20,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3340980.0, ans=0.05 2023-11-28 03:53:21,309 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501150 2023-11-28 03:53:34,769 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5 2023-11-28 03:53:49,470 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=15.0 2023-11-28 03:53:53,995 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8200, loss[loss=0.08224, simple_loss=0.1199, pruned_loss=0.01287, audio_tagging_loss=0.00942, over 15791.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09019, pruned_loss=0.01258, audio_tagging_loss=0.008747, over 3046539.55 frames. ], batch size: 59, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:53:54,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3341180.0, ans=0.125 2023-11-28 03:53:57,341 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 03:54:02,498 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 8.802e+01 9.578e+01 1.025e+02 1.373e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 03:54:03,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3341180.0, ans=0.0 2023-11-28 03:54:05,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3341246.6666666665, ans=6.0 2023-11-28 03:54:11,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3341246.6666666665, ans=0.125 2023-11-28 03:54:14,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3341246.6666666665, ans=0.1 2023-11-28 03:54:17,766 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501200 2023-11-28 03:54:39,620 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=12.0 2023-11-28 03:54:51,790 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8250, loss[loss=0.07699, simple_loss=0.1027, pruned_loss=0.01545, audio_tagging_loss=0.0102, over 14857.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.08995, pruned_loss=0.01247, audio_tagging_loss=0.008701, over 3041349.11 frames. ], batch size: 55, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:55:15,128 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501250 2023-11-28 03:55:23,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3341646.6666666665, ans=0.125 2023-11-28 03:55:24,050 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=15.0 2023-11-28 03:55:31,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3341713.3333333335, ans=0.0 2023-11-28 03:55:33,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3341713.3333333335, ans=6.0 2023-11-28 03:55:34,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3341713.3333333335, ans=0.04949747468305833 2023-11-28 03:55:40,106 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2023-11-28 03:55:45,518 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:55:48,579 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8300, loss[loss=0.07263, simple_loss=0.1004, pruned_loss=0.01537, audio_tagging_loss=0.007052, over 15360.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.08992, pruned_loss=0.0125, audio_tagging_loss=0.008741, over 3042528.19 frames. 
], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:55:54,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3341846.6666666665, ans=0.025 2023-11-28 03:55:56,894 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.433e+01 8.790e+01 9.364e+01 1.000e+02 1.308e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 03:56:11,009 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2023-11-28 03:56:13,746 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501300 2023-11-28 03:56:41,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3342113.3333333335, ans=0.0 2023-11-28 03:56:45,954 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8350, loss[loss=0.06253, simple_loss=0.08588, pruned_loss=0.01, audio_tagging_loss=0.009586, over 15783.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08958, pruned_loss=0.01228, audio_tagging_loss=0.008762, over 3048917.40 frames. ], batch size: 56, lr: 1.61e-03, grad_scale: 16.0 2023-11-28 03:56:48,686 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.56 vs. limit=10.0 2023-11-28 03:56:56,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3342180.0, ans=0.0 2023-11-28 03:57:10,700 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501350 2023-11-28 03:57:15,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3342313.3333333335, ans=0.0 2023-11-28 03:57:19,324 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:57:20,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3342380.0, ans=0.125 2023-11-28 03:57:21,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3342380.0, ans=0.0 2023-11-28 03:57:23,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3342380.0, ans=0.95 2023-11-28 03:57:25,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3342380.0, ans=0.125 2023-11-28 03:57:30,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3342380.0, ans=0.0 2023-11-28 03:57:43,978 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8400, loss[loss=0.03691, simple_loss=0.04372, pruned_loss=0.004383, audio_tagging_loss=0.01067, over 15159.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08954, pruned_loss=0.01221, audio_tagging_loss=0.008684, over 3048064.65 frames. ], batch size: 60, lr: 1.61e-03, grad_scale: 32.0 2023-11-28 03:57:48,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.72 vs. 
limit=10.0 2023-11-28 03:57:51,645 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.717e+01 8.873e+01 9.503e+01 1.023e+02 1.226e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 03:57:56,411 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 03:58:07,671 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501400 2023-11-28 03:58:14,153 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.24 vs. limit=12.0 2023-11-28 03:58:41,309 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8450, loss[loss=0.05953, simple_loss=0.07692, pruned_loss=0.01222, audio_tagging_loss=0.008847, over 14890.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08939, pruned_loss=0.01226, audio_tagging_loss=0.00869, over 3047777.81 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 03:59:05,841 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501450 2023-11-28 03:59:36,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3343113.3333333335, ans=0.1 2023-11-28 03:59:39,103 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8500, loss[loss=0.06051, simple_loss=0.08664, pruned_loss=0.007085, audio_tagging_loss=0.01011, over 15265.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08829, pruned_loss=0.01214, audio_tagging_loss=0.008759, over 3050960.22 frames. ], batch size: 62, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 03:59:46,770 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.888e+01 9.285e+01 1.024e+02 1.288e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 04:00:03,297 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501500 2023-11-28 04:00:29,616 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0 2023-11-28 04:00:36,603 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8550, loss[loss=0.04385, simple_loss=0.05713, pruned_loss=0.005474, audio_tagging_loss=0.009811, over 16043.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08871, pruned_loss=0.01208, audio_tagging_loss=0.008757, over 3050933.82 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:00:50,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3343580.0, ans=0.0 2023-11-28 04:00:56,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3343580.0, ans=0.125 2023-11-28 04:01:00,885 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501550 2023-11-28 04:01:13,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3343713.3333333335, ans=0.0 2023-11-28 04:01:30,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3343780.0, ans=0.125 2023-11-28 04:01:33,867 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8600, loss[loss=0.06983, simple_loss=0.1079, pruned_loss=0.01113, audio_tagging_loss=0.004771, over 14882.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08785, pruned_loss=0.01198, audio_tagging_loss=0.008935, over 3042910.03 frames. 
], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:01:41,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3343846.6666666665, ans=0.025 2023-11-28 04:01:42,158 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.741e+01 9.411e+01 9.975e+01 1.880e+02, threshold=1.882e+02, percent-clipped=1.0 2023-11-28 04:01:53,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3343913.3333333335, ans=0.0 2023-11-28 04:01:55,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3343980.0, ans=0.125 2023-11-28 04:01:57,404 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501600 2023-11-28 04:02:08,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.65 vs. limit=15.0 2023-11-28 04:02:31,101 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8650, loss[loss=0.04848, simple_loss=0.06377, pruned_loss=0.007518, audio_tagging_loss=0.009073, over 15094.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08863, pruned_loss=0.01207, audio_tagging_loss=0.00895, over 3040726.54 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:02:48,957 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=12.0 2023-11-28 04:02:55,647 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501650 2023-11-28 04:02:57,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3344313.3333333335, ans=0.2 2023-11-28 04:03:01,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3344313.3333333335, ans=0.125 2023-11-28 04:03:13,968 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:03:19,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3344446.6666666665, ans=0.125 2023-11-28 04:03:26,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3344446.6666666665, ans=0.0 2023-11-28 04:03:28,906 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8700, loss[loss=0.05912, simple_loss=0.07969, pruned_loss=0.01189, audio_tagging_loss=0.007387, over 14365.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08898, pruned_loss=0.01228, audio_tagging_loss=0.009054, over 3034875.34 frames. 
], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:03:37,600 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.850e+01 9.398e+01 9.849e+01 1.274e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 04:03:38,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3344580.0, ans=0.125 2023-11-28 04:03:42,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3344580.0, ans=0.125 2023-11-28 04:03:53,021 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501700 2023-11-28 04:03:56,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3344646.6666666665, ans=0.125 2023-11-28 04:04:00,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3344646.6666666665, ans=0.125 2023-11-28 04:04:12,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3344713.3333333335, ans=0.2 2023-11-28 04:04:16,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3344780.0, ans=0.2 2023-11-28 04:04:21,640 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.72 vs. limit=22.5 2023-11-28 04:04:26,117 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8750, loss[loss=0.07864, simple_loss=0.1117, pruned_loss=0.01463, audio_tagging_loss=0.008173, over 16482.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09035, pruned_loss=0.01217, audio_tagging_loss=0.009022, over 3043363.44 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:04:49,565 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501750 2023-11-28 04:04:57,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3344980.0, ans=0.5 2023-11-28 04:05:04,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3345046.6666666665, ans=0.0 2023-11-28 04:05:22,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3345180.0, ans=0.2 2023-11-28 04:05:22,943 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8800, loss[loss=0.04895, simple_loss=0.06227, pruned_loss=0.006942, audio_tagging_loss=0.01087, over 14650.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09121, pruned_loss=0.01227, audio_tagging_loss=0.008989, over 3045838.96 frames. 
], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:05:31,631 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.835e+01 9.360e+01 1.012e+02 1.261e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 04:05:43,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3345246.6666666665, ans=0.2 2023-11-28 04:05:46,830 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501800 2023-11-28 04:05:49,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3345313.3333333335, ans=0.125 2023-11-28 04:05:52,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3345313.3333333335, ans=0.125 2023-11-28 04:05:58,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.96 vs. limit=6.0 2023-11-28 04:06:10,651 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.78 vs. limit=6.0 2023-11-28 04:06:13,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3345446.6666666665, ans=0.0 2023-11-28 04:06:19,645 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8850, loss[loss=0.06461, simple_loss=0.09415, pruned_loss=0.009389, audio_tagging_loss=0.008148, over 15880.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09099, pruned_loss=0.0123, audio_tagging_loss=0.008989, over 3047333.56 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:06:34,711 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:06:41,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3345580.0, ans=0.0 2023-11-28 04:06:42,468 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.66 vs. limit=15.0 2023-11-28 04:06:44,139 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501850 2023-11-28 04:06:52,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3345646.6666666665, ans=0.125 2023-11-28 04:07:09,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3345780.0, ans=0.125 2023-11-28 04:07:09,517 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.09 vs. limit=12.0 2023-11-28 04:07:14,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3345780.0, ans=0.0 2023-11-28 04:07:15,069 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.87 vs. 
limit=15.0 2023-11-28 04:07:16,660 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8900, loss[loss=0.07961, simple_loss=0.1105, pruned_loss=0.01791, audio_tagging_loss=0.006437, over 13969.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08985, pruned_loss=0.01217, audio_tagging_loss=0.008982, over 3046748.89 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:07:19,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3345846.6666666665, ans=0.125 2023-11-28 04:07:25,984 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.854e+01 9.513e+01 9.955e+01 1.488e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 04:07:28,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3345913.3333333335, ans=0.1 2023-11-28 04:07:40,797 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501900 2023-11-28 04:07:44,832 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.75 vs. limit=10.0 2023-11-28 04:07:45,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3345980.0, ans=0.125 2023-11-28 04:07:47,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3345980.0, ans=0.2 2023-11-28 04:08:14,203 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 8950, loss[loss=0.06218, simple_loss=0.09431, pruned_loss=0.007922, audio_tagging_loss=0.007102, over 15904.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08999, pruned_loss=0.01211, audio_tagging_loss=0.008849, over 3051806.04 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:08:17,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3346180.0, ans=0.0 2023-11-28 04:08:30,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3346246.6666666665, ans=0.125 2023-11-28 04:08:31,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3346246.6666666665, ans=0.125 2023-11-28 04:08:35,036 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0 2023-11-28 04:08:37,874 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 501950 2023-11-28 04:09:04,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3346446.6666666665, ans=0.2 2023-11-28 04:09:10,176 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9000, loss[loss=0.06218, simple_loss=0.09059, pruned_loss=0.01084, audio_tagging_loss=0.006042, over 15954.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09025, pruned_loss=0.01228, audio_tagging_loss=0.008658, over 3055496.96 frames. 
], batch size: 61, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:09:10,176 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 04:09:32,711 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6000, 3.6335, 3.8874, 3.4689], device='cuda:2') 2023-11-28 04:09:44,954 INFO [train_asr.py:1267] (2/4) Epoch 42, validation: loss=0.05915, simple_loss=0.05063, pruned_loss=0.005264, audio_tagging_loss=0.02857, over 4681554.00 frames. 2023-11-28 04:09:44,954 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 04:09:53,112 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=12.0 2023-11-28 04:09:54,286 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 8.664e+01 9.503e+01 1.037e+02 1.475e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 04:10:06,358 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5 2023-11-28 04:10:07,703 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=15.0 2023-11-28 04:10:09,069 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502000 2023-11-28 04:10:17,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3346646.6666666665, ans=0.125 2023-11-28 04:10:28,968 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2023-11-28 04:10:29,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3346780.0, ans=0.2 2023-11-28 04:10:41,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3346780.0, ans=0.1 2023-11-28 04:10:43,075 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9050, loss[loss=0.05735, simple_loss=0.0805, pruned_loss=0.007591, audio_tagging_loss=0.009505, over 14332.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08961, pruned_loss=0.01207, audio_tagging_loss=0.008522, over 3056922.90 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:10:50,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3346846.6666666665, ans=0.125 2023-11-28 04:10:55,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3346913.3333333335, ans=0.0 2023-11-28 04:11:05,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3346980.0, ans=0.1 2023-11-28 04:11:06,621 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502050 2023-11-28 04:11:13,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3346980.0, ans=0.125 2023-11-28 04:11:16,060 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.11 vs. 
limit=15.0 2023-11-28 04:11:28,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3347113.3333333335, ans=0.125 2023-11-28 04:11:33,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3347113.3333333335, ans=0.125 2023-11-28 04:11:40,120 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9100, loss[loss=0.0601, simple_loss=0.08087, pruned_loss=0.0102, audio_tagging_loss=0.00947, over 14871.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09072, pruned_loss=0.01224, audio_tagging_loss=0.008391, over 3055481.70 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:11:40,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3347180.0, ans=0.07 2023-11-28 04:11:48,890 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.691e+01 9.383e+01 1.014e+02 1.282e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 04:11:52,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3347246.6666666665, ans=0.125 2023-11-28 04:11:54,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3347246.6666666665, ans=0.125 2023-11-28 04:12:03,024 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502100 2023-11-28 04:12:14,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3347380.0, ans=0.2 2023-11-28 04:12:17,008 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=10.0 2023-11-28 04:12:17,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3347380.0, ans=0.0 2023-11-28 04:12:23,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3347380.0, ans=0.2 2023-11-28 04:12:36,158 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.32 vs. limit=22.5 2023-11-28 04:12:36,751 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9150, loss[loss=0.06777, simple_loss=0.09274, pruned_loss=0.01508, audio_tagging_loss=0.00632, over 14373.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09045, pruned_loss=0.01247, audio_tagging_loss=0.008372, over 3048189.19 frames. 
], batch size: 55, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:12:50,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3347580.0, ans=0.125 2023-11-28 04:12:59,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3347646.6666666665, ans=0.1 2023-11-28 04:12:59,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3347646.6666666665, ans=0.1 2023-11-28 04:13:01,260 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502150 2023-11-28 04:13:03,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3347646.6666666665, ans=0.125 2023-11-28 04:13:11,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3347713.3333333335, ans=0.125 2023-11-28 04:13:23,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3347780.0, ans=0.0 2023-11-28 04:13:34,146 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9200, loss[loss=0.05385, simple_loss=0.07095, pruned_loss=0.009919, audio_tagging_loss=0.008461, over 15483.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09013, pruned_loss=0.0124, audio_tagging_loss=0.008457, over 3049760.67 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:13:43,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3347846.6666666665, ans=0.125 2023-11-28 04:13:44,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.904e+01 8.837e+01 9.520e+01 1.030e+02 1.268e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 04:13:47,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3347913.3333333335, ans=0.2 2023-11-28 04:13:56,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3347980.0, ans=0.125 2023-11-28 04:13:58,676 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502200 2023-11-28 04:14:05,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3347980.0, ans=0.125 2023-11-28 04:14:32,133 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9250, loss[loss=0.06089, simple_loss=0.08483, pruned_loss=0.008905, audio_tagging_loss=0.009572, over 14098.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08985, pruned_loss=0.01226, audio_tagging_loss=0.008527, over 3050761.88 frames. 
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:14:55,938 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502250 2023-11-28 04:15:00,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3348313.3333333335, ans=0.125 2023-11-28 04:15:04,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3348313.3333333335, ans=0.2 2023-11-28 04:15:21,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3348446.6666666665, ans=0.0 2023-11-28 04:15:29,230 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9300, loss[loss=0.06021, simple_loss=0.08029, pruned_loss=0.008056, audio_tagging_loss=0.01201, over 15075.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08882, pruned_loss=0.01211, audio_tagging_loss=0.008628, over 3044700.09 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:15:36,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3348513.3333333335, ans=0.125 2023-11-28 04:15:37,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3348513.3333333335, ans=0.2 2023-11-28 04:15:40,871 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.853e+01 8.934e+01 9.500e+01 1.008e+02 1.455e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 04:15:53,802 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502300 2023-11-28 04:15:58,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3348646.6666666665, ans=0.2 2023-11-28 04:16:26,567 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9350, loss[loss=0.06232, simple_loss=0.08426, pruned_loss=0.009336, audio_tagging_loss=0.01085, over 14790.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08811, pruned_loss=0.0121, audio_tagging_loss=0.008775, over 3038322.13 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:16:31,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3348846.6666666665, ans=0.125 2023-11-28 04:16:39,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3348913.3333333335, ans=0.1 2023-11-28 04:16:41,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3348913.3333333335, ans=0.2 2023-11-28 04:16:50,828 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502350 2023-11-28 04:16:52,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3348980.0, ans=0.1 2023-11-28 04:17:03,399 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.75 vs. 
limit=15.0 2023-11-28 04:17:13,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3349113.3333333335, ans=0.125 2023-11-28 04:17:23,707 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9400, loss[loss=0.07335, simple_loss=0.101, pruned_loss=0.01613, audio_tagging_loss=0.006734, over 15775.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08907, pruned_loss=0.01229, audio_tagging_loss=0.008789, over 3046555.75 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:17:30,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3349180.0, ans=0.2 2023-11-28 04:17:34,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3349246.6666666665, ans=0.0 2023-11-28 04:17:35,169 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 8.937e+01 9.623e+01 1.033e+02 2.333e+02, threshold=1.925e+02, percent-clipped=1.0 2023-11-28 04:17:40,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3349246.6666666665, ans=0.125 2023-11-28 04:17:41,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3349246.6666666665, ans=0.125 2023-11-28 04:17:47,533 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502400 2023-11-28 04:17:48,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3349313.3333333335, ans=0.1 2023-11-28 04:17:56,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3349313.3333333335, ans=0.2 2023-11-28 04:17:58,668 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:18:10,458 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=12.0 2023-11-28 04:18:21,493 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9450, loss[loss=0.07452, simple_loss=0.101, pruned_loss=0.01539, audio_tagging_loss=0.008632, over 14985.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08986, pruned_loss=0.01233, audio_tagging_loss=0.008826, over 3046139.47 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:18:22,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3349513.3333333335, ans=0.025 2023-11-28 04:18:23,684 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 04:18:31,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3349580.0, ans=10.0 2023-11-28 04:18:45,171 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502450 2023-11-28 04:18:47,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2023-11-28 04:18:52,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3349646.6666666665, ans=0.05 2023-11-28 04:18:57,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3349713.3333333335, ans=0.0 2023-11-28 04:19:04,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3349713.3333333335, ans=0.1 2023-11-28 04:19:18,885 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9500, loss[loss=0.06482, simple_loss=0.08455, pruned_loss=0.01204, audio_tagging_loss=0.01051, over 15389.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09023, pruned_loss=0.01218, audio_tagging_loss=0.008859, over 3047373.29 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:19:20,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3349846.6666666665, ans=0.125 2023-11-28 04:19:29,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 8.748e+01 9.346e+01 1.036e+02 1.231e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-28 04:19:37,583 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.12 vs. limit=22.5 2023-11-28 04:19:38,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3349913.3333333335, ans=0.0 2023-11-28 04:19:43,120 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502500 2023-11-28 04:19:44,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3349980.0, ans=0.125 2023-11-28 04:19:44,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3349980.0, ans=0.0 2023-11-28 04:19:46,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3349980.0, ans=0.125 2023-11-28 04:19:47,868 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.01 vs. limit=10.0 2023-11-28 04:19:49,950 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0 2023-11-28 04:20:12,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3350113.3333333335, ans=0.125 2023-11-28 04:20:15,480 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9550, loss[loss=0.07506, simple_loss=0.1027, pruned_loss=0.01441, audio_tagging_loss=0.009286, over 15311.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09074, pruned_loss=0.0122, audio_tagging_loss=0.008857, over 3054880.31 frames. 
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:20:39,845 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502550 2023-11-28 04:20:40,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3350313.3333333335, ans=0.125 2023-11-28 04:20:40,340 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=15.0 2023-11-28 04:20:43,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3350313.3333333335, ans=0.125 2023-11-28 04:20:55,317 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2023-11-28 04:20:58,548 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.32 vs. limit=22.5 2023-11-28 04:21:13,654 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9600, loss[loss=0.07293, simple_loss=0.09676, pruned_loss=0.01707, audio_tagging_loss=0.007473, over 15543.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09003, pruned_loss=0.01211, audio_tagging_loss=0.008914, over 3053754.91 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:21:17,464 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=15.0 2023-11-28 04:21:19,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=3350513.3333333335, ans=12.0 2023-11-28 04:21:24,513 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 8.754e+01 9.206e+01 1.000e+02 1.278e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-28 04:21:29,718 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:21:31,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3350580.0, ans=0.0 2023-11-28 04:21:36,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3350646.6666666665, ans=0.0 2023-11-28 04:21:37,313 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502600 2023-11-28 04:21:41,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3350646.6666666665, ans=0.1 2023-11-28 04:21:59,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3350780.0, ans=0.125 2023-11-28 04:22:10,872 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9650, loss[loss=0.0613, simple_loss=0.09046, pruned_loss=0.007409, audio_tagging_loss=0.008655, over 15237.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08914, pruned_loss=0.01206, audio_tagging_loss=0.008989, over 3048756.22 frames. 
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:22:22,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3350913.3333333335, ans=0.0 2023-11-28 04:22:35,670 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502650 2023-11-28 04:22:39,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3350980.0, ans=0.1 2023-11-28 04:22:50,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3351046.6666666665, ans=0.125 2023-11-28 04:23:08,616 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9700, loss[loss=0.0542, simple_loss=0.07394, pruned_loss=0.0084, audio_tagging_loss=0.00883, over 15928.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.0889, pruned_loss=0.01205, audio_tagging_loss=0.00884, over 3045796.76 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:23:21,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.660e+01 9.403e+01 1.036e+02 1.751e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 04:23:27,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3351246.6666666665, ans=0.2 2023-11-28 04:23:33,223 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502700 2023-11-28 04:23:33,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3351313.3333333335, ans=0.1 2023-11-28 04:23:42,741 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:23:45,481 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.55 vs. limit=22.5 2023-11-28 04:23:55,254 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-11-28 04:24:03,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3351446.6666666665, ans=0.125 2023-11-28 04:24:06,643 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9750, loss[loss=0.05072, simple_loss=0.06132, pruned_loss=0.00873, audio_tagging_loss=0.01133, over 14913.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08861, pruned_loss=0.01195, audio_tagging_loss=0.008805, over 3037539.23 frames. 
], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:24:07,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3351513.3333333335, ans=0.0 2023-11-28 04:24:28,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3351646.6666666665, ans=0.04949747468305833 2023-11-28 04:24:30,793 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502750 2023-11-28 04:24:30,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3351646.6666666665, ans=0.125 2023-11-28 04:24:36,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3351646.6666666665, ans=0.2 2023-11-28 04:24:52,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3351780.0, ans=0.04949747468305833 2023-11-28 04:25:04,292 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9800, loss[loss=0.05174, simple_loss=0.06305, pruned_loss=0.008233, audio_tagging_loss=0.01199, over 16490.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08885, pruned_loss=0.01199, audio_tagging_loss=0.008777, over 3039308.87 frames. ], batch size: 65, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:25:14,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0 2023-11-28 04:25:16,769 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 8.861e+01 9.508e+01 1.028e+02 1.749e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 04:25:18,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3351913.3333333335, ans=0.125 2023-11-28 04:25:22,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3351913.3333333335, ans=0.2 2023-11-28 04:25:28,317 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502800 2023-11-28 04:25:28,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3351980.0, ans=0.125 2023-11-28 04:25:33,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3351980.0, ans=0.0 2023-11-28 04:25:34,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3351980.0, ans=0.0 2023-11-28 04:25:53,557 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.56 vs. limit=10.0 2023-11-28 04:25:59,720 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:26:01,883 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9850, loss[loss=0.05315, simple_loss=0.07044, pruned_loss=0.01017, audio_tagging_loss=0.007766, over 16254.00 frames. 
], tot_loss[loss=0.06477, simple_loss=0.08814, pruned_loss=0.01193, audio_tagging_loss=0.008774, over 3045108.71 frames. ], batch size: 63, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:26:09,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3352180.0, ans=0.125 2023-11-28 04:26:19,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3352246.6666666665, ans=0.125 2023-11-28 04:26:26,267 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502850 2023-11-28 04:26:28,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=22.5 2023-11-28 04:26:29,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3352313.3333333335, ans=0.1 2023-11-28 04:26:41,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3352380.0, ans=0.05 2023-11-28 04:26:45,040 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2023-11-28 04:26:46,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3352446.6666666665, ans=0.125 2023-11-28 04:26:59,726 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9900, loss[loss=0.06998, simple_loss=0.09531, pruned_loss=0.01489, audio_tagging_loss=0.007434, over 16063.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08881, pruned_loss=0.01197, audio_tagging_loss=0.008738, over 3047946.03 frames. ], batch size: 61, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:27:00,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3352513.3333333335, ans=0.125 2023-11-28 04:27:12,292 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.961e+01 8.663e+01 9.354e+01 9.948e+01 1.345e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 04:27:14,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3352580.0, ans=0.035 2023-11-28 04:27:23,787 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502900 2023-11-28 04:27:45,035 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=15.0 2023-11-28 04:27:47,071 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=15.0 2023-11-28 04:27:48,384 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0 2023-11-28 04:27:52,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3352780.0, ans=0.0 2023-11-28 04:27:57,186 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 9950, loss[loss=0.06138, simple_loss=0.07721, pruned_loss=0.01325, audio_tagging_loss=0.00952, over 14587.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08805, pruned_loss=0.012, audio_tagging_loss=0.008795, over 3042478.84 frames. 
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:27:57,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3352846.6666666665, ans=0.0 2023-11-28 04:28:17,101 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=22.5 2023-11-28 04:28:20,916 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 502950 2023-11-28 04:28:23,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3352980.0, ans=0.1 2023-11-28 04:28:29,980 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-28 04:28:44,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3353113.3333333335, ans=0.125 2023-11-28 04:28:54,841 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10000, loss[loss=0.05375, simple_loss=0.06534, pruned_loss=0.01107, audio_tagging_loss=0.01001, over 16072.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08765, pruned_loss=0.01197, audio_tagging_loss=0.00875, over 3043730.66 frames. ], batch size: 62, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:29:08,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.771e+01 9.442e+01 1.017e+02 1.444e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 04:29:11,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3353246.6666666665, ans=0.1 2023-11-28 04:29:18,693 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503000 2023-11-28 04:29:21,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.39 vs. limit=15.0 2023-11-28 04:29:38,302 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.85 vs. limit=15.0 2023-11-28 04:29:38,411 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2023-11-28 04:29:45,745 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2023-11-28 04:29:51,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3353513.3333333335, ans=0.125 2023-11-28 04:29:52,377 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10050, loss[loss=0.06269, simple_loss=0.08514, pruned_loss=0.01201, audio_tagging_loss=0.008109, over 15461.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08782, pruned_loss=0.01202, audio_tagging_loss=0.008731, over 3044522.65 frames. 
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:29:59,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3353513.3333333335, ans=0.125 2023-11-28 04:30:17,444 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503050 2023-11-28 04:30:25,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3353646.6666666665, ans=0.0 2023-11-28 04:30:34,805 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=22.5 2023-11-28 04:30:38,591 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=22.5 2023-11-28 04:30:40,963 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.71 vs. limit=10.0 2023-11-28 04:30:47,451 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.24 vs. limit=12.0 2023-11-28 04:30:50,287 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10100, loss[loss=0.06432, simple_loss=0.0898, pruned_loss=0.009197, audio_tagging_loss=0.01022, over 15486.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08851, pruned_loss=0.01213, audio_tagging_loss=0.008768, over 3044776.01 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:31:04,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.581e+01 9.372e+01 1.014e+02 1.280e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 04:31:05,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3353913.3333333335, ans=0.125 2023-11-28 04:31:11,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3353913.3333333335, ans=0.125 2023-11-28 04:31:14,655 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503100 2023-11-28 04:31:21,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3353980.0, ans=0.125 2023-11-28 04:31:36,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3354113.3333333335, ans=0.1 2023-11-28 04:31:39,637 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:31:48,554 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10150, loss[loss=0.07434, simple_loss=0.1109, pruned_loss=0.008294, audio_tagging_loss=0.01062, over 16887.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08884, pruned_loss=0.01207, audio_tagging_loss=0.008824, over 3049736.45 frames. 
], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:31:50,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3354180.0, ans=0.0 2023-11-28 04:32:00,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3354246.6666666665, ans=0.2 2023-11-28 04:32:03,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3354246.6666666665, ans=0.125 2023-11-28 04:32:06,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3354246.6666666665, ans=0.2 2023-11-28 04:32:12,522 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503150 2023-11-28 04:32:13,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.67 vs. limit=10.0 2023-11-28 04:32:17,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3354313.3333333335, ans=0.125 2023-11-28 04:32:18,933 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:32:30,482 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.79 vs. limit=10.0 2023-11-28 04:32:45,389 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10200, loss[loss=0.07682, simple_loss=0.1094, pruned_loss=0.01391, audio_tagging_loss=0.008213, over 14543.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08894, pruned_loss=0.01201, audio_tagging_loss=0.00884, over 3053928.47 frames. ], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:32:46,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3354513.3333333335, ans=0.0 2023-11-28 04:32:52,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3354513.3333333335, ans=0.04949747468305833 2023-11-28 04:32:59,163 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.630e+01 9.209e+01 1.011e+02 1.470e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-28 04:33:09,092 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503200 2023-11-28 04:33:10,767 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 04:33:24,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3354713.3333333335, ans=0.125 2023-11-28 04:33:25,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3354713.3333333335, ans=0.2 2023-11-28 04:33:41,816 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10250, loss[loss=0.0798, simple_loss=0.1085, pruned_loss=0.01813, audio_tagging_loss=0.007438, over 14406.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08996, pruned_loss=0.01229, audio_tagging_loss=0.008837, over 3049265.52 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:33:53,770 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.81 vs. limit=15.0 2023-11-28 04:33:59,463 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:33:59,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3354913.3333333335, ans=0.1 2023-11-28 04:34:05,866 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503250 2023-11-28 04:34:11,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3354980.0, ans=0.125 2023-11-28 04:34:28,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3355113.3333333335, ans=0.0 2023-11-28 04:34:38,536 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10300, loss[loss=0.06857, simple_loss=0.09763, pruned_loss=0.01156, audio_tagging_loss=0.008186, over 15245.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08965, pruned_loss=0.01228, audio_tagging_loss=0.008888, over 3045739.01 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:34:38,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3355180.0, ans=0.1 2023-11-28 04:34:47,177 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2023-11-28 04:34:51,860 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 9.001e+01 9.538e+01 1.014e+02 1.211e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 04:34:59,125 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2023-11-28 04:35:01,364 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.49 vs. 
limit=15.0 2023-11-28 04:35:02,833 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503300 2023-11-28 04:35:16,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3355380.0, ans=0.125 2023-11-28 04:35:23,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3355446.6666666665, ans=0.0 2023-11-28 04:35:35,741 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10350, loss[loss=0.08087, simple_loss=0.09823, pruned_loss=0.01978, audio_tagging_loss=0.01197, over 14884.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.0899, pruned_loss=0.01247, audio_tagging_loss=0.008946, over 3038732.33 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:35:48,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3355580.0, ans=0.125 2023-11-28 04:35:49,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3355580.0, ans=0.0 2023-11-28 04:35:59,218 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503350 2023-11-28 04:36:28,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3355780.0, ans=0.125 2023-11-28 04:36:32,728 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10400, loss[loss=0.05783, simple_loss=0.07208, pruned_loss=0.01031, audio_tagging_loss=0.01148, over 15338.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.0892, pruned_loss=0.01231, audio_tagging_loss=0.009108, over 3040526.84 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:36:47,554 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.770e+01 9.452e+01 1.025e+02 1.480e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 04:36:56,905 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503400 2023-11-28 04:37:07,442 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2023-11-28 04:37:30,477 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10450, loss[loss=0.07872, simple_loss=0.1095, pruned_loss=0.01445, audio_tagging_loss=0.009522, over 15359.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09015, pruned_loss=0.01241, audio_tagging_loss=0.00898, over 3038217.41 frames. 
], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:37:30,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3356180.0, ans=0.1 2023-11-28 04:37:49,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3356246.6666666665, ans=0.125 2023-11-28 04:37:55,330 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503450 2023-11-28 04:38:16,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3356446.6666666665, ans=0.125 2023-11-28 04:38:19,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3356446.6666666665, ans=0.0 2023-11-28 04:38:19,444 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=15.0 2023-11-28 04:38:22,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3356446.6666666665, ans=0.1 2023-11-28 04:38:28,240 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10500, loss[loss=0.05199, simple_loss=0.07134, pruned_loss=0.008912, audio_tagging_loss=0.007405, over 14687.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08969, pruned_loss=0.01219, audio_tagging_loss=0.00888, over 3044435.11 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:38:34,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3356513.3333333335, ans=0.05 2023-11-28 04:38:43,216 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.671e+01 9.492e+01 1.004e+02 1.311e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 04:38:46,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3356580.0, ans=0.125 2023-11-28 04:38:51,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3356646.6666666665, ans=0.1 2023-11-28 04:38:52,099 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503500 2023-11-28 04:38:54,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.61 vs. limit=22.5 2023-11-28 04:38:58,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3356646.6666666665, ans=0.2 2023-11-28 04:39:02,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3356713.3333333335, ans=0.2 2023-11-28 04:39:25,931 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10550, loss[loss=0.07169, simple_loss=0.09481, pruned_loss=0.01346, audio_tagging_loss=0.01083, over 14470.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08986, pruned_loss=0.01221, audio_tagging_loss=0.008733, over 3051504.16 frames. 
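
Each train_asr.py:1235 record pairs a per-batch loss (over roughly 14-16k frames) with a tot_loss over about three million frames, i.e. a frame-weighted running aggregate of the same loss components. A simplified stand-in for that bookkeeping, using numbers from the records above (icefall's real tracker does more; this is just the aggregation idea):

    class LossTracker(dict):
        """Frame-weighted running averages, like the tot_loss[...] fields."""

        def accumulate(self, losses: dict, num_frames: float) -> None:
            self["frames"] = self.get("frames", 0.0) + num_frames
            for name, value in losses.items():
                # store frame-weighted sums so averages can be recovered
                self[name] = self.get(name, 0.0) + value * num_frames

        def averages(self) -> dict:
            frames = self["frames"]
            return {k: v / frames for k, v in self.items() if k != "frames"}

    tot = LossTracker()
    tot.accumulate({"loss": 0.0798, "simple_loss": 0.1085}, num_frames=14406)
    tot.accumulate({"loss": 0.06857, "simple_loss": 0.09763}, num_frames=15245)
    print(tot.averages(), "over", tot["frames"], "frames")
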
], batch size: 54, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:39:30,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3356846.6666666665, ans=0.0 2023-11-28 04:39:42,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3356913.3333333335, ans=0.2 2023-11-28 04:39:49,588 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503550 2023-11-28 04:40:00,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3357046.6666666665, ans=0.125 2023-11-28 04:40:17,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3357113.3333333335, ans=0.125 2023-11-28 04:40:18,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357113.3333333335, ans=0.1 2023-11-28 04:40:22,838 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10600, loss[loss=0.05458, simple_loss=0.07082, pruned_loss=0.01032, audio_tagging_loss=0.00885, over 15634.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08957, pruned_loss=0.01228, audio_tagging_loss=0.008747, over 3044021.82 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:40:23,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357180.0, ans=0.1 2023-11-28 04:40:26,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3357180.0, ans=0.125 2023-11-28 04:40:30,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3357180.0, ans=0.2 2023-11-28 04:40:37,796 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.827e+01 9.555e+01 1.028e+02 1.264e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 04:40:48,210 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503600 2023-11-28 04:40:54,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3357313.3333333335, ans=0.0 2023-11-28 04:40:58,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3357380.0, ans=0.0 2023-11-28 04:40:59,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3357380.0, ans=0.1 2023-11-28 04:41:01,175 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2023-11-28 04:41:06,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3357380.0, ans=0.125 2023-11-28 04:41:13,185 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.73 vs. 
limit=15.0 2023-11-28 04:41:17,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3357446.6666666665, ans=0.0 2023-11-28 04:41:21,487 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10650, loss[loss=0.07016, simple_loss=0.0947, pruned_loss=0.01548, audio_tagging_loss=0.007333, over 15104.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08961, pruned_loss=0.01231, audio_tagging_loss=0.008688, over 3040144.21 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:41:28,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3357513.3333333335, ans=0.125 2023-11-28 04:41:43,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3357580.0, ans=0.125 2023-11-28 04:41:46,310 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503650 2023-11-28 04:41:53,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3357646.6666666665, ans=0.2 2023-11-28 04:42:20,149 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10700, loss[loss=0.05612, simple_loss=0.06822, pruned_loss=0.009169, audio_tagging_loss=0.01284, over 15159.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08903, pruned_loss=0.01217, audio_tagging_loss=0.008635, over 3037770.29 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 8.0 2023-11-28 04:42:21,855 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.69 vs. limit=15.0 2023-11-28 04:42:23,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3357846.6666666665, ans=0.09899494936611666 2023-11-28 04:42:32,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3357913.3333333335, ans=0.0 2023-11-28 04:42:33,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3357913.3333333335, ans=0.1 2023-11-28 04:42:35,420 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.497e+01 9.278e+01 9.975e+01 1.438e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-28 04:42:39,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3357913.3333333335, ans=0.0 2023-11-28 04:42:43,729 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503700 2023-11-28 04:42:51,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3357980.0, ans=0.0 2023-11-28 04:43:16,267 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10750, loss[loss=0.04481, simple_loss=0.05736, pruned_loss=0.00539, audio_tagging_loss=0.01074, over 16299.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.0887, pruned_loss=0.01212, audio_tagging_loss=0.008713, over 3038274.30 frames. 
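
The scaling.py:1022 Whitening lines compare a per-module metric against a limit (6.0, 12.0, 15.0 or 22.5 above). A metric that is 1.0 for perfectly white (isotropic) features and num_channels for rank-1 features would be consistent with the logged ranges; one plausible definition along those lines, computed from traces of the feature covariance (an assumption, the actual scaling.py may differ in detail):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """num_channels * sum(eig^2) / sum(eig)^2 of the covariance, per group."""
        num_frames, num_channels = x.shape
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = x - x.mean(dim=0)
        cov = torch.einsum("ngc,ngd->gcd", x, x) / num_frames
        trace = cov.diagonal(dim1=1, dim2=2).sum(dim=1)  # sum of eigenvalues
        frob_sq = (cov * cov).sum(dim=(1, 2))            # sum of squared eigenvalues
        metric = (num_channels // num_groups) * frob_sq / trace.clamp(min=1e-20) ** 2
        return metric.mean().item()

    x = torch.randn(1000, 512)
    print(whitening_metric(x))                          # ~1.0: already white
    print(whitening_metric(x @ torch.ones(512, 512)))   # ~512: rank-1, far from white
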
], batch size: 63, lr: 1.60e-03, grad_scale: 8.0 2023-11-28 04:43:18,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3358180.0, ans=0.125 2023-11-28 04:43:18,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3358180.0, ans=0.2 2023-11-28 04:43:27,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3358246.6666666665, ans=0.2 2023-11-28 04:43:40,943 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503750 2023-11-28 04:43:51,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3358380.0, ans=0.2 2023-11-28 04:43:51,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3358380.0, ans=0.1 2023-11-28 04:43:56,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3358380.0, ans=0.125 2023-11-28 04:44:13,537 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10800, loss[loss=0.05734, simple_loss=0.07826, pruned_loss=0.009456, audio_tagging_loss=0.008751, over 15458.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.0885, pruned_loss=0.01216, audio_tagging_loss=0.00868, over 3045925.93 frames. ], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:44:30,561 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.815e+01 9.428e+01 9.959e+01 1.276e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 04:44:31,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3358580.0, ans=0.0 2023-11-28 04:44:38,263 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503800 2023-11-28 04:44:46,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3358646.6666666665, ans=0.125 2023-11-28 04:44:46,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3358646.6666666665, ans=0.0 2023-11-28 04:45:05,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3358780.0, ans=0.0 2023-11-28 04:45:06,762 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=15.0 2023-11-28 04:45:12,746 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10850, loss[loss=0.0575, simple_loss=0.07958, pruned_loss=0.006218, audio_tagging_loss=0.0115, over 14978.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08987, pruned_loss=0.01252, audio_tagging_loss=0.008597, over 3047568.20 frames. 
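
The scaling.py:213 lines print ScheduledFloat values as a function of batch_count; at batch_count ~3.36e6 they have all settled at their final values (dropout_p at 0.1, the skip rates at 0.0, balancer probs at 0.125, scale_min at 0.2). A minimal piecewise-linear sketch of such a schedule (zipformer's real ScheduledFloat also hooks into autograd, which is omitted here):

    class ScheduledFloat:
        """A float that interpolates linearly between (batch_count, value) points."""

        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs
            self.batch_count = 0.0

        def __float__(self):
            pts = self.points
            if self.batch_count <= pts[0][0]:
                return float(pts[0][1])
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= self.batch_count <= x1:
                    t = (self.batch_count - x0) / (x1 - x0)
                    return float(y0 + t * (y1 - y0))
            return float(pts[-1][1])

    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    dropout_p.batch_count = 3355180.0  # late in training, as in the log
    print(float(dropout_p))            # 0.1, matching ans=0.1 above
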
], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:45:27,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3358913.3333333335, ans=0.125 2023-11-28 04:45:27,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3358913.3333333335, ans=0.2 2023-11-28 04:45:36,407 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503850 2023-11-28 04:46:09,857 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10900, loss[loss=0.06221, simple_loss=0.0894, pruned_loss=0.008802, audio_tagging_loss=0.008701, over 14623.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08929, pruned_loss=0.0124, audio_tagging_loss=0.008622, over 3045666.90 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:46:09,872 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:46:25,733 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.788e+01 9.283e+01 9.844e+01 1.254e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 04:46:29,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3359246.6666666665, ans=0.125 2023-11-28 04:46:34,068 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503900 2023-11-28 04:46:47,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3359380.0, ans=0.07 2023-11-28 04:47:07,423 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 10950, loss[loss=0.06295, simple_loss=0.08673, pruned_loss=0.01077, audio_tagging_loss=0.008812, over 15869.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08905, pruned_loss=0.01225, audio_tagging_loss=0.008692, over 3048409.24 frames. 
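
The WARNING above is the transducer sanity check at work: a 1-second AudioSet clip yields 100 feature frames, only 23 frames after the encoder's subsampling, and that is one frame short of the 24 BPE tokens of the placeholder transcript, so the cut cannot be aligned and is dropped. A sketch of that filter, assuming the usual icefall Conv2dSubsampling length formula:

    def frames_after_subsampling(num_frames: int) -> int:
        # ((100 - 7) // 2 + 1) // 2 == 23, matching the warning above
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # a transducer must emit each token at some frame, so we need
        # at least as many subsampled frames as tokens
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> "Exclude cut with ID ..."
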
], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:47:28,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3359580.0, ans=0.0 2023-11-28 04:47:31,930 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 503950 2023-11-28 04:47:40,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3359713.3333333335, ans=0.2 2023-11-28 04:47:49,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3359713.3333333335, ans=0.1 2023-11-28 04:47:49,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3359713.3333333335, ans=0.125 2023-11-28 04:47:53,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3359780.0, ans=0.125 2023-11-28 04:48:03,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3359846.6666666665, ans=0.2 2023-11-28 04:48:05,130 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11000, loss[loss=0.06503, simple_loss=0.09056, pruned_loss=0.01207, audio_tagging_loss=0.00768, over 13717.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08792, pruned_loss=0.01204, audio_tagging_loss=0.008747, over 3049198.76 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:48:17,903 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:48:18,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3359913.3333333335, ans=0.2 2023-11-28 04:48:21,135 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.126e+01 8.488e+01 9.034e+01 9.756e+01 1.163e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-28 04:48:24,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3359913.3333333335, ans=0.125 2023-11-28 04:48:24,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3359913.3333333335, ans=0.1 2023-11-28 04:48:29,540 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504000 2023-11-28 04:48:36,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3359980.0, ans=0.1 2023-11-28 04:48:42,508 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=15.0 2023-11-28 04:49:05,290 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11050, loss[loss=0.07513, simple_loss=0.1063, pruned_loss=0.01636, audio_tagging_loss=0.005602, over 14741.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.0889, pruned_loss=0.01218, audio_tagging_loss=0.008799, over 3056983.49 frames. 
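
The grad_scale field in the batch records drifts between 32.0, 16.0 and 8.0 across this stretch of the log, which is the usual fp16 loss-scaler behaviour: halve the scale when gradients overflow, double it back after a run of clean steps. A sketch with torch.cuda.amp.GradScaler (the constructor values here are illustrative, not the recipe's actual settings):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0, growth_factor=2.0, backoff_factor=0.5,
        growth_interval=2000,
    )

    def train_step(model, optimizer, batch, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if grads contain inf/nan
        scaler.update()         # backoff on overflow, grow when stable
        return loss.detach(), scaler.get_scale()
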
], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:49:27,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3360313.3333333335, ans=0.0 2023-11-28 04:49:28,471 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504050 2023-11-28 04:49:33,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3360313.3333333335, ans=0.2 2023-11-28 04:49:46,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3360380.0, ans=0.125 2023-11-28 04:50:00,728 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.89 vs. limit=15.0 2023-11-28 04:50:02,375 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11100, loss[loss=0.05267, simple_loss=0.06957, pruned_loss=0.007869, audio_tagging_loss=0.01002, over 13917.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08924, pruned_loss=0.01223, audio_tagging_loss=0.008781, over 3065114.61 frames. ], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:50:06,102 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=12.0 2023-11-28 04:50:14,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3360580.0, ans=0.0 2023-11-28 04:50:14,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3360580.0, ans=0.125 2023-11-28 04:50:18,572 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.723e+01 9.489e+01 1.017e+02 2.061e+02, threshold=1.898e+02, percent-clipped=1.0 2023-11-28 04:50:26,307 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504100 2023-11-28 04:50:59,699 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11150, loss[loss=0.05575, simple_loss=0.07667, pruned_loss=0.009183, audio_tagging_loss=0.008235, over 15161.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08948, pruned_loss=0.01233, audio_tagging_loss=0.008791, over 3066870.10 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:51:02,509 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2023-11-28 04:51:10,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.33 vs. limit=12.0 2023-11-28 04:51:12,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3360913.3333333335, ans=0.0 2023-11-28 04:51:13,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3360913.3333333335, ans=0.125 2023-11-28 04:51:18,904 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=12.0 2023-11-28 04:51:23,197 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.30 vs. 
limit=12.0 2023-11-28 04:51:23,849 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504150 2023-11-28 04:51:30,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3360980.0, ans=0.125 2023-11-28 04:51:54,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3361113.3333333335, ans=0.125 2023-11-28 04:51:57,689 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11200, loss[loss=0.05982, simple_loss=0.07783, pruned_loss=0.01251, audio_tagging_loss=0.008393, over 15198.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.08984, pruned_loss=0.01249, audio_tagging_loss=0.008913, over 3057135.98 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:52:13,619 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 8.826e+01 9.324e+01 1.011e+02 1.372e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 04:52:14,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3361246.6666666665, ans=0.125 2023-11-28 04:52:21,302 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504200 2023-11-28 04:52:34,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3361380.0, ans=0.2 2023-11-28 04:52:41,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3361380.0, ans=0.125 2023-11-28 04:52:49,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3361446.6666666665, ans=0.125 2023-11-28 04:52:55,503 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11250, loss[loss=0.08019, simple_loss=0.1145, pruned_loss=0.01611, audio_tagging_loss=0.006812, over 14790.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08925, pruned_loss=0.01246, audio_tagging_loss=0.008849, over 3055504.52 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 04:52:57,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3361513.3333333335, ans=0.0 2023-11-28 04:53:10,669 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.17 vs. 
limit=10.0 2023-11-28 04:53:12,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3361580.0, ans=0.1 2023-11-28 04:53:12,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3361580.0, ans=0.125 2023-11-28 04:53:19,160 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504250 2023-11-28 04:53:28,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3361713.3333333335, ans=0.125 2023-11-28 04:53:30,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3361713.3333333335, ans=0.0 2023-11-28 04:53:30,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3361713.3333333335, ans=0.125 2023-11-28 04:53:52,354 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11300, loss[loss=0.0547, simple_loss=0.0709, pruned_loss=0.01096, audio_tagging_loss=0.008282, over 13867.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.08946, pruned_loss=0.01251, audio_tagging_loss=0.008724, over 3048228.60 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:53:58,991 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0 2023-11-28 04:54:09,274 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.179e+01 8.810e+01 9.312e+01 1.008e+02 1.209e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-28 04:54:16,567 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504300 2023-11-28 04:54:32,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3362046.6666666665, ans=0.0 2023-11-28 04:54:48,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3362113.3333333335, ans=0.1 2023-11-28 04:54:50,070 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11350, loss[loss=0.06276, simple_loss=0.0902, pruned_loss=0.009639, audio_tagging_loss=0.008019, over 15073.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08994, pruned_loss=0.01253, audio_tagging_loss=0.008656, over 3042129.78 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:54:53,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3362180.0, ans=0.2 2023-11-28 04:55:14,386 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504350 2023-11-28 04:55:40,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3362446.6666666665, ans=0.125 2023-11-28 04:55:46,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3362513.3333333335, ans=0.0 2023-11-28 04:55:48,170 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11400, loss[loss=0.05304, simple_loss=0.0674, pruned_loss=0.009648, audio_tagging_loss=0.009691, over 13812.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08954, pruned_loss=0.01232, audio_tagging_loss=0.008659, over 3043313.00 frames. 
], batch size: 53, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:55:48,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3362513.3333333335, ans=0.0 2023-11-28 04:55:51,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3362513.3333333335, ans=0.125 2023-11-28 04:55:55,309 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-28 04:55:59,486 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 04:56:05,107 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.951e+01 9.530e+01 1.041e+02 1.873e+02, threshold=1.906e+02, percent-clipped=1.0 2023-11-28 04:56:12,143 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504400 2023-11-28 04:56:22,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3362713.3333333335, ans=0.125 2023-11-28 04:56:45,792 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11450, loss[loss=0.06073, simple_loss=0.08079, pruned_loss=0.01313, audio_tagging_loss=0.007201, over 14519.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09019, pruned_loss=0.01247, audio_tagging_loss=0.008658, over 3038293.21 frames. ], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-28 04:57:09,862 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504450 2023-11-28 04:57:31,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3363113.3333333335, ans=0.1 2023-11-28 04:57:32,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3363113.3333333335, ans=15.0 2023-11-28 04:57:34,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3363113.3333333335, ans=0.05 2023-11-28 04:57:38,841 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.60 vs. limit=15.0 2023-11-28 04:57:43,781 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11500, loss[loss=0.06372, simple_loss=0.09645, pruned_loss=0.008204, audio_tagging_loss=0.007294, over 17166.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08997, pruned_loss=0.01234, audio_tagging_loss=0.008632, over 3044059.39 frames. ], batch size: 65, lr: 1.60e-03, grad_scale: 8.0 2023-11-28 04:57:47,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3363180.0, ans=0.125 2023-11-28 04:58:02,620 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.810e+01 9.465e+01 1.017e+02 1.248e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 04:58:08,092 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504500 2023-11-28 04:58:40,740 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11550, loss[loss=0.057, simple_loss=0.07997, pruned_loss=0.00852, audio_tagging_loss=0.008494, over 15652.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.0895, pruned_loss=0.01223, audio_tagging_loss=0.008619, over 3039242.89 frames. 
], batch size: 57, lr: 1.60e-03, grad_scale: 8.0 2023-11-28 04:59:05,943 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504550 2023-11-28 04:59:18,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3363713.3333333335, ans=0.125 2023-11-28 04:59:19,001 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 04:59:32,688 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.12 vs. limit=22.5 2023-11-28 04:59:38,807 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11600, loss[loss=0.07237, simple_loss=0.09696, pruned_loss=0.0146, audio_tagging_loss=0.009289, over 14640.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08948, pruned_loss=0.01229, audio_tagging_loss=0.008599, over 3033003.15 frames. ], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 04:59:39,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3363846.6666666665, ans=0.125 2023-11-28 04:59:54,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3363913.3333333335, ans=0.125 2023-11-28 04:59:57,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.620e+01 9.416e+01 1.017e+02 1.407e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 05:00:02,705 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504600 2023-11-28 05:00:08,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3363980.0, ans=0.5 2023-11-28 05:00:34,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3364113.3333333335, ans=0.125 2023-11-28 05:00:36,731 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11650, loss[loss=0.06338, simple_loss=0.08491, pruned_loss=0.01069, audio_tagging_loss=0.01024, over 15515.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.0893, pruned_loss=0.01228, audio_tagging_loss=0.008755, over 3033908.82 frames. 
], batch size: 57, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:00:39,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3364180.0, ans=0.125 2023-11-28 05:00:44,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3364180.0, ans=0.1 2023-11-28 05:00:48,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3364246.6666666665, ans=0.0 2023-11-28 05:00:51,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3364246.6666666665, ans=0.125 2023-11-28 05:01:01,193 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504650 2023-11-28 05:01:13,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3364380.0, ans=0.125 2023-11-28 05:01:21,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3364446.6666666665, ans=0.2 2023-11-28 05:01:33,592 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11700, loss[loss=0.03346, simple_loss=0.03746, pruned_loss=0.004483, audio_tagging_loss=0.01025, over 14682.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08875, pruned_loss=0.01227, audio_tagging_loss=0.00884, over 3042089.69 frames. ], batch size: 58, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:01:34,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3364513.3333333335, ans=0.125 2023-11-28 05:01:52,248 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.810e+01 9.366e+01 1.007e+02 1.398e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 05:01:58,234 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504700 2023-11-28 05:02:01,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3364646.6666666665, ans=0.1 2023-11-28 05:02:04,777 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=22.5 2023-11-28 05:02:10,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3364713.3333333335, ans=0.125 2023-11-28 05:02:31,535 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11750, loss[loss=0.0622, simple_loss=0.08419, pruned_loss=0.01452, audio_tagging_loss=0.00559, over 14529.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08768, pruned_loss=0.01214, audio_tagging_loss=0.008876, over 3041117.73 frames. ], batch size: 55, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:02:32,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3364846.6666666665, ans=0.125 2023-11-28 05:02:38,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3364846.6666666665, ans=0.125 2023-11-28 05:02:40,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.42 vs. 
limit=15.0 2023-11-28 05:02:43,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3364913.3333333335, ans=0.1 2023-11-28 05:02:55,582 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504750 2023-11-28 05:03:02,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3364980.0, ans=0.125 2023-11-28 05:03:07,669 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.39 vs. limit=22.5 2023-11-28 05:03:26,154 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0 2023-11-28 05:03:29,532 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11800, loss[loss=0.07256, simple_loss=0.09968, pruned_loss=0.01544, audio_tagging_loss=0.007276, over 14917.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08832, pruned_loss=0.01221, audio_tagging_loss=0.008958, over 3037786.88 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:03:33,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3365180.0, ans=0.07 2023-11-28 05:03:38,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3365180.0, ans=0.2 2023-11-28 05:03:40,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3365246.6666666665, ans=0.04949747468305833 2023-11-28 05:03:47,023 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.284e+01 8.611e+01 9.542e+01 1.045e+02 1.429e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 05:03:47,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3365246.6666666665, ans=0.125 2023-11-28 05:03:51,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3365313.3333333335, ans=0.0 2023-11-28 05:03:53,099 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504800 2023-11-28 05:04:14,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3365446.6666666665, ans=0.0 2023-11-28 05:04:16,371 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.12 vs. limit=12.0 2023-11-28 05:04:18,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3365446.6666666665, ans=0.05 2023-11-28 05:04:26,612 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11850, loss[loss=0.05255, simple_loss=0.06408, pruned_loss=0.01072, audio_tagging_loss=0.009786, over 15442.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08849, pruned_loss=0.01234, audio_tagging_loss=0.008977, over 3038002.87 frames. 
], batch size: 60, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:04:39,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3365580.0, ans=0.1 2023-11-28 05:04:39,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3365580.0, ans=0.2 2023-11-28 05:04:51,136 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504850 2023-11-28 05:04:59,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3365646.6666666665, ans=0.2 2023-11-28 05:05:10,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3365713.3333333335, ans=0.125 2023-11-28 05:05:18,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3365780.0, ans=0.125 2023-11-28 05:05:24,485 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11900, loss[loss=0.06993, simple_loss=0.09667, pruned_loss=0.01089, audio_tagging_loss=0.01071, over 14411.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09003, pruned_loss=0.01277, audio_tagging_loss=0.008963, over 3044287.13 frames. ], batch size: 56, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:05:24,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3365846.6666666665, ans=0.125 2023-11-28 05:05:24,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3365846.6666666665, ans=0.125 2023-11-28 05:05:39,642 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.34 vs. limit=15.0 2023-11-28 05:05:39,655 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0 2023-11-28 05:05:43,452 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.747e+01 9.488e+01 1.023e+02 1.658e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 05:05:49,009 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504900 2023-11-28 05:06:02,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3366046.6666666665, ans=0.125 2023-11-28 05:06:07,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3366046.6666666665, ans=0.0 2023-11-28 05:06:23,024 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 11950, loss[loss=0.08216, simple_loss=0.1197, pruned_loss=0.01432, audio_tagging_loss=0.007993, over 15597.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.08971, pruned_loss=0.01261, audio_tagging_loss=0.009119, over 3042206.90 frames. 
], batch size: 59, lr: 1.60e-03, grad_scale: 16.0 2023-11-28 05:06:29,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3366180.0, ans=0.125 2023-11-28 05:06:35,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3366246.6666666665, ans=0.0 2023-11-28 05:06:46,935 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 504950 2023-11-28 05:06:47,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3366313.3333333335, ans=0.0 2023-11-28 05:07:02,002 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:07:08,425 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=22.5 2023-11-28 05:07:19,250 INFO [train_asr.py:1235] (2/4) Epoch 42, batch 12000, loss[loss=0.06435, simple_loss=0.08326, pruned_loss=0.01148, audio_tagging_loss=0.01124, over 14095.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.08933, pruned_loss=0.01244, audio_tagging_loss=0.009229, over 3039231.78 frames. ], batch size: 52, lr: 1.60e-03, grad_scale: 32.0 2023-11-28 05:07:19,251 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 05:07:48,457 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4965, 3.4305, 3.7370, 3.6130], device='cuda:2') 2023-11-28 05:07:54,231 INFO [train_asr.py:1267] (2/4) Epoch 42, validation: loss=0.05822, simple_loss=0.05066, pruned_loss=0.005316, audio_tagging_loss=0.02757, over 4681554.00 frames. 2023-11-28 05:07:54,231 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 05:07:56,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3366513.3333333335, ans=15.0 2023-11-28 05:08:04,527 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2023-11-28 05:08:09,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3366580.0, ans=15.0 2023-11-28 05:08:11,281 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.786e+01 8.775e+01 9.473e+01 1.010e+02 1.187e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 05:08:14,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3366646.6666666665, ans=0.95 2023-11-28 05:08:16,471 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505000 2023-11-28 05:08:35,696 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 0, loss[loss=0.06659, simple_loss=0.07462, pruned_loss=0.007042, audio_tagging_loss=0.02224, over 14746.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.07462, pruned_loss=0.007042, audio_tagging_loss=0.02224, over 14746.00 frames. 
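
During the batch-12000 validation pass, zipformer.py:1877 dumps attn_weights_entropy for a self-attention module as one value per head (four heads here). The natural reading is the Shannon entropy of each head's attention distribution, averaged over queries; a sketch of that diagnostic (the exact reduction used in zipformer.py is an assumption):

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        """Entropy per head of (num_heads, num_queries, num_keys) attention."""
        p = attn_weights.clamp(min=1e-20)     # rows are softmax outputs, sum to 1
        entropy = -(p * p.log()).sum(dim=-1)  # (num_heads, num_queries)
        return entropy.mean(dim=-1)           # one value per head

    w = torch.softmax(torch.randn(4, 100, 100), dim=-1)
    print(attn_weights_entropy(w))  # four per-head values, like the tensor above
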
], batch size: 59, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:08:35,696 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 05:08:50,560 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0873, 5.8376, 5.4977, 5.5559], device='cuda:2') 2023-11-28 05:09:10,071 INFO [train_asr.py:1267] (2/4) Epoch 43, validation: loss=0.05773, simple_loss=0.0506, pruned_loss=0.005225, audio_tagging_loss=0.0272, over 4681554.00 frames. 2023-11-28 05:09:10,072 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 05:09:25,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3366740.0, ans=0.125 2023-11-28 05:09:28,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3366740.0, ans=0.1 2023-11-28 05:09:37,142 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:09:41,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3366806.6666666665, ans=0.0 2023-11-28 05:09:50,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3366873.3333333335, ans=0.125 2023-11-28 05:10:04,096 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505050 2023-11-28 05:10:07,291 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 50, loss[loss=0.07535, simple_loss=0.09248, pruned_loss=0.0119, audio_tagging_loss=0.0172, over 15113.00 frames. ], tot_loss[loss=0.07296, simple_loss=0.08859, pruned_loss=0.01146, audio_tagging_loss=0.01721, over 692857.18 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:10:20,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3367073.3333333335, ans=0.5 2023-11-28 05:10:24,621 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:10:24,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3367073.3333333335, ans=0.125 2023-11-28 05:10:27,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3367073.3333333335, ans=0.04949747468305833 2023-11-28 05:10:45,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3367206.6666666665, ans=0.125 2023-11-28 05:10:51,887 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.44 vs. limit=15.0 2023-11-28 05:10:56,685 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.375e+01 9.586e+01 1.037e+02 1.129e+02 1.417e+02, threshold=2.074e+02, percent-clipped=0.0 2023-11-28 05:11:01,179 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505100 2023-11-28 05:11:04,364 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 100, loss[loss=0.05105, simple_loss=0.0585, pruned_loss=0.008352, audio_tagging_loss=0.01345, over 16583.00 frames. ], tot_loss[loss=0.07107, simple_loss=0.08601, pruned_loss=0.01167, audio_tagging_loss=0.01639, over 1216557.85 frames. 
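
Across the epoch 42 -> 43 boundary the logged lr steps from 1.60e-03 to 1.58e-03. Assuming the Eden schedule (as implemented in icefall's optim.py, to the best of this note's knowledge) with this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, the numbers are reproduced at batch_idx ~5.05e5 if the scheduler's epoch counter lags the displayed epoch by one:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
        epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(eden_lr(0.045, batch=505000, epoch=41))  # ~1.60e-03 (epoch 42 lines)
    print(eden_lr(0.045, batch=505000, epoch=42))  # ~1.58e-03 (epoch 43 lines)
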
], batch size: 66, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:11:06,799 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.33 vs. limit=10.0 2023-11-28 05:11:30,159 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=12.0 2023-11-28 05:11:42,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.46 vs. limit=6.0 2023-11-28 05:11:43,417 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.64 vs. limit=22.5 2023-11-28 05:11:43,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3367540.0, ans=0.2 2023-11-28 05:11:46,551 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=12.0 2023-11-28 05:11:55,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3367606.6666666665, ans=0.125 2023-11-28 05:11:58,728 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505150 2023-11-28 05:12:02,497 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 150, loss[loss=0.06968, simple_loss=0.09509, pruned_loss=0.01294, audio_tagging_loss=0.009196, over 16076.00 frames. ], tot_loss[loss=0.07029, simple_loss=0.08766, pruned_loss=0.01178, audio_tagging_loss=0.01468, over 1631283.01 frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:12:48,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3367940.0, ans=0.125 2023-11-28 05:12:50,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3367940.0, ans=0.2 2023-11-28 05:12:52,801 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 9.055e+01 9.611e+01 1.032e+02 1.243e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 05:12:57,235 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505200 2023-11-28 05:13:01,124 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 200, loss[loss=0.07827, simple_loss=0.109, pruned_loss=0.01559, audio_tagging_loss=0.008159, over 15796.00 frames. ], tot_loss[loss=0.06865, simple_loss=0.08759, pruned_loss=0.01183, audio_tagging_loss=0.01302, over 1943977.12 frames. 
], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:13:15,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3368073.3333333335, ans=0.2 2023-11-28 05:13:16,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3368073.3333333335, ans=0.95 2023-11-28 05:13:39,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3368206.6666666665, ans=0.2 2023-11-28 05:13:54,363 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505250 2023-11-28 05:13:56,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3368340.0, ans=0.0 2023-11-28 05:13:57,686 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 250, loss[loss=0.06559, simple_loss=0.09544, pruned_loss=0.00993, audio_tagging_loss=0.007941, over 15446.00 frames. ], tot_loss[loss=0.06816, simple_loss=0.08894, pruned_loss=0.01199, audio_tagging_loss=0.0117, over 2194256.76 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-28 05:14:19,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3368473.3333333335, ans=0.125 2023-11-28 05:14:21,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3368473.3333333335, ans=0.2 2023-11-28 05:14:22,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3368473.3333333335, ans=0.1 2023-11-28 05:14:32,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3368540.0, ans=15.0 2023-11-28 05:14:34,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3368540.0, ans=0.125 2023-11-28 05:14:35,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3368540.0, ans=0.125 2023-11-28 05:14:40,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3368540.0, ans=0.1 2023-11-28 05:14:48,346 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.828e+01 9.024e+01 9.625e+01 1.027e+02 1.223e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 05:14:51,680 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505300 2023-11-28 05:14:55,467 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 300, loss[loss=0.07662, simple_loss=0.1045, pruned_loss=0.01407, audio_tagging_loss=0.01028, over 15888.00 frames. ], tot_loss[loss=0.06795, simple_loss=0.08957, pruned_loss=0.01229, audio_tagging_loss=0.01087, over 2383911.72 frames. 
], batch size: 58, lr: 1.58e-03, grad_scale: 8.0 2023-11-28 05:14:58,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3368673.3333333335, ans=0.125 2023-11-28 05:15:02,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3368673.3333333335, ans=0.125 2023-11-28 05:15:18,894 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2023-11-28 05:15:22,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3368806.6666666665, ans=0.0 2023-11-28 05:15:33,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3368873.3333333335, ans=0.0 2023-11-28 05:15:49,194 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505350 2023-11-28 05:15:52,952 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 350, loss[loss=0.0768, simple_loss=0.1103, pruned_loss=0.01398, audio_tagging_loss=0.007698, over 15384.00 frames. ], tot_loss[loss=0.0678, simple_loss=0.09051, pruned_loss=0.01231, audio_tagging_loss=0.01023, over 2537100.10 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 8.0 2023-11-28 05:16:00,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3369006.6666666665, ans=0.1 2023-11-28 05:16:04,669 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.61 vs. limit=15.0 2023-11-28 05:16:15,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3369140.0, ans=0.125 2023-11-28 05:16:15,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-28 05:16:24,065 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.79 vs. limit=22.5 2023-11-28 05:16:42,840 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.904e+01 9.500e+01 1.023e+02 1.547e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 05:16:46,264 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505400 2023-11-28 05:16:49,837 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 400, loss[loss=0.05447, simple_loss=0.07344, pruned_loss=0.006584, audio_tagging_loss=0.01116, over 16211.00 frames. ], tot_loss[loss=0.06735, simple_loss=0.09062, pruned_loss=0.01229, audio_tagging_loss=0.009753, over 2649482.74 frames. ], batch size: 61, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:17:03,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3369406.6666666665, ans=0.1 2023-11-28 05:17:08,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.67 vs. limit=15.0 2023-11-28 05:17:08,705 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. 
limit=15.0 2023-11-28 05:17:22,839 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2023-11-28 05:17:24,851 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.01 vs. limit=6.0 2023-11-28 05:17:38,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.96 vs. limit=15.0 2023-11-28 05:17:41,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3369606.6666666665, ans=0.125 2023-11-28 05:17:43,285 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505450 2023-11-28 05:17:46,363 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 450, loss[loss=0.06429, simple_loss=0.08299, pruned_loss=0.01043, audio_tagging_loss=0.01236, over 14013.00 frames. ], tot_loss[loss=0.06719, simple_loss=0.09069, pruned_loss=0.01241, audio_tagging_loss=0.009434, over 2738940.12 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:17:52,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3369673.3333333335, ans=0.05 2023-11-28 05:18:05,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3369740.0, ans=0.1 2023-11-28 05:18:15,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3369806.6666666665, ans=0.2 2023-11-28 05:18:37,196 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.908e+01 8.667e+01 9.242e+01 1.003e+02 1.378e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 05:18:40,644 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2023-11-28 05:18:41,094 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505500 2023-11-28 05:18:42,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3369940.0, ans=0.2 2023-11-28 05:18:44,358 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 500, loss[loss=0.06821, simple_loss=0.09643, pruned_loss=0.01074, audio_tagging_loss=0.009252, over 15386.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.08977, pruned_loss=0.01232, audio_tagging_loss=0.009285, over 2804945.28 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:18:44,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.02 vs. 
limit=15.0 2023-11-28 05:18:52,738 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:19:00,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3370073.3333333335, ans=0.125 2023-11-28 05:19:08,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3370140.0, ans=0.1 2023-11-28 05:19:08,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3370140.0, ans=0.0 2023-11-28 05:19:12,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3370140.0, ans=0.035 2023-11-28 05:19:23,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3370206.6666666665, ans=0.0 2023-11-28 05:19:38,567 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505550 2023-11-28 05:19:41,712 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 550, loss[loss=0.0482, simple_loss=0.0555, pruned_loss=0.01023, audio_tagging_loss=0.01021, over 15122.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.08977, pruned_loss=0.01238, audio_tagging_loss=0.009236, over 2866301.63 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:19:52,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3370406.6666666665, ans=0.125 2023-11-28 05:19:53,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3370406.6666666665, ans=0.1 2023-11-28 05:19:56,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3370406.6666666665, ans=0.0 2023-11-28 05:20:14,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3370473.3333333335, ans=0.125 2023-11-28 05:20:32,174 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 9.076e+01 9.606e+01 1.009e+02 1.464e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 05:20:34,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3370606.6666666665, ans=0.125 2023-11-28 05:20:36,122 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505600 2023-11-28 05:20:39,634 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 600, loss[loss=0.07298, simple_loss=0.1055, pruned_loss=0.01287, audio_tagging_loss=0.007358, over 15474.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09069, pruned_loss=0.01263, audio_tagging_loss=0.009125, over 2911238.75 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:20:44,033 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.80 vs. limit=12.0 2023-11-28 05:20:51,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3370740.0, ans=0.1 2023-11-28 05:20:53,238 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. 
limit=15.0 2023-11-28 05:21:00,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3370740.0, ans=0.125 2023-11-28 05:21:09,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3370806.6666666665, ans=0.0 2023-11-28 05:21:16,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3370873.3333333335, ans=0.125 2023-11-28 05:21:26,219 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.73 vs. limit=15.0 2023-11-28 05:21:34,482 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505650 2023-11-28 05:21:35,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3370940.0, ans=0.2 2023-11-28 05:21:36,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3371006.6666666665, ans=0.125 2023-11-28 05:21:36,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3371006.6666666665, ans=0.125 2023-11-28 05:21:37,657 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 650, loss[loss=0.06177, simple_loss=0.08115, pruned_loss=0.01096, audio_tagging_loss=0.01023, over 15121.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.08991, pruned_loss=0.01254, audio_tagging_loss=0.009156, over 2939553.49 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:21:40,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3371006.6666666665, ans=0.0 2023-11-28 05:21:58,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3371073.3333333335, ans=0.125 2023-11-28 05:22:00,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3371140.0, ans=0.125 2023-11-28 05:22:28,534 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 8.720e+01 9.285e+01 9.863e+01 1.198e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 05:22:31,969 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505700 2023-11-28 05:22:35,217 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 700, loss[loss=0.06366, simple_loss=0.0925, pruned_loss=0.01023, audio_tagging_loss=0.007176, over 15437.00 frames. ], tot_loss[loss=0.0671, simple_loss=0.09092, pruned_loss=0.01262, audio_tagging_loss=0.009012, over 2967283.16 frames. 
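The optim.py:476 lines report the quartiles (min, 25%, median, 75%, max) of recent gradient norms alongside the clipping threshold, and in every entry in this section the threshold is exactly Clipping_scale times the median quartile (for instance 2.0 * 9.285e+01 = 1.857e+02 just above). A small sketch of that bookkeeping; `clipping_stats` is a hypothetical helper, not the optimizer's actual code:

import torch

def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    # quartiles of the recent gradient norms, as printed in the log
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                  # scale * median
    percent_clipped = (recent_norms > threshold).float().mean() * 100.0
    return q, threshold, percent_clipped

norms = torch.tensor([75.51, 87.20, 92.85, 98.63, 119.8])
q, thr, pct = clipping_stats(norms)
print(thr.item(), pct.item())  # 185.7 and 0.0, matching the entry above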
], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:22:35,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3371340.0, ans=0.125 2023-11-28 05:22:40,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3371340.0, ans=0.125 2023-11-28 05:22:49,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3371406.6666666665, ans=0.125 2023-11-28 05:22:52,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3371406.6666666665, ans=0.125 2023-11-28 05:22:56,227 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.49 vs. limit=22.5 2023-11-28 05:23:02,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=22.5 2023-11-28 05:23:30,001 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505750 2023-11-28 05:23:31,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3371606.6666666665, ans=0.125 2023-11-28 05:23:33,253 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 750, loss[loss=0.07316, simple_loss=0.1027, pruned_loss=0.01522, audio_tagging_loss=0.006573, over 14931.00 frames. ], tot_loss[loss=0.06673, simple_loss=0.09043, pruned_loss=0.0125, audio_tagging_loss=0.009008, over 2986425.97 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:23:33,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3371673.3333333335, ans=0.125 2023-11-28 05:23:58,053 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.37 vs. limit=15.0 2023-11-28 05:23:59,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3371806.6666666665, ans=0.0 2023-11-28 05:24:06,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3371873.3333333335, ans=0.125 2023-11-28 05:24:08,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3371873.3333333335, ans=0.1 2023-11-28 05:24:22,034 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-11-28 05:24:24,147 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.959e+01 9.414e+01 9.993e+01 1.273e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 05:24:27,561 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505800 2023-11-28 05:24:31,329 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 800, loss[loss=0.07627, simple_loss=0.1045, pruned_loss=0.01634, audio_tagging_loss=0.007663, over 14811.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09069, pruned_loss=0.01254, audio_tagging_loss=0.008971, over 3003471.00 frames. 
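Throughout these entries the logged total is reproduced by loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; this is a reading of the logged numbers, not a quotation of train_asr.py. Checking it against the batch 800 running average just above:

# tot_loss fields from "Epoch 43, batch 800" above
simple_loss = 0.09069
pruned_loss = 0.01254
audio_tagging_loss = 0.008971

loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(round(loss, 5))  # 0.06686, agreeing with loss=0.06685 up to rounding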
], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:24:33,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3372006.6666666665, ans=0.0 2023-11-28 05:24:33,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3372006.6666666665, ans=0.0 2023-11-28 05:24:43,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3372073.3333333335, ans=0.0 2023-11-28 05:24:46,103 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2023-11-28 05:24:46,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3372073.3333333335, ans=0.125 2023-11-28 05:25:05,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3372206.6666666665, ans=0.125 2023-11-28 05:25:24,803 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505850 2023-11-28 05:25:28,109 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 850, loss[loss=0.06215, simple_loss=0.08267, pruned_loss=0.01214, audio_tagging_loss=0.008678, over 15129.00 frames. ], tot_loss[loss=0.06691, simple_loss=0.09091, pruned_loss=0.01245, audio_tagging_loss=0.009, over 3008993.50 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:25:48,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3372406.6666666665, ans=0.025 2023-11-28 05:25:56,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3372473.3333333335, ans=0.125 2023-11-28 05:25:58,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3372473.3333333335, ans=0.2 2023-11-28 05:26:05,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3372540.0, ans=0.125 2023-11-28 05:26:12,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3372540.0, ans=0.2 2023-11-28 05:26:18,488 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.788e+01 8.784e+01 9.411e+01 9.995e+01 2.932e+02, threshold=1.882e+02, percent-clipped=1.0 2023-11-28 05:26:21,827 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505900 2023-11-28 05:26:26,158 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 900, loss[loss=0.06791, simple_loss=0.0906, pruned_loss=0.01493, audio_tagging_loss=0.007679, over 15034.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09112, pruned_loss=0.01252, audio_tagging_loss=0.008928, over 3020341.00 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:26:40,675 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.00 vs. limit=15.0 2023-11-28 05:27:19,698 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 505950 2023-11-28 05:27:23,365 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 950, loss[loss=0.04691, simple_loss=0.06365, pruned_loss=0.006388, audio_tagging_loss=0.008695, over 14358.00 frames. 
], tot_loss[loss=0.06693, simple_loss=0.09095, pruned_loss=0.01253, audio_tagging_loss=0.008924, over 3025600.57 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:27:30,583 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2023-11-28 05:27:32,572 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0 2023-11-28 05:27:43,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3373073.3333333335, ans=0.125 2023-11-28 05:27:51,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3373140.0, ans=0.125 2023-11-28 05:27:55,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0 2023-11-28 05:27:58,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3373206.6666666665, ans=0.125 2023-11-28 05:28:14,042 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.684e+01 9.471e+01 1.027e+02 1.244e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 05:28:17,480 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506000 2023-11-28 05:28:20,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3373340.0, ans=0.04949747468305833 2023-11-28 05:28:21,347 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1000, loss[loss=0.06442, simple_loss=0.07593, pruned_loss=0.01486, audio_tagging_loss=0.0116, over 14609.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09113, pruned_loss=0.01262, audio_tagging_loss=0.00875, over 3029938.65 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:28:43,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3373473.3333333335, ans=0.125 2023-11-28 05:28:47,908 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 05:28:50,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3373473.3333333335, ans=0.0 2023-11-28 05:28:53,324 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.77 vs. 
limit=15.0 2023-11-28 05:28:59,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3373540.0, ans=0.125 2023-11-28 05:29:06,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3373606.6666666665, ans=0.125 2023-11-28 05:29:06,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3373606.6666666665, ans=0.125 2023-11-28 05:29:14,889 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506050 2023-11-28 05:29:18,169 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1050, loss[loss=0.05988, simple_loss=0.08285, pruned_loss=0.01007, audio_tagging_loss=0.008382, over 14938.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.0888, pruned_loss=0.01216, audio_tagging_loss=0.008771, over 3028388.51 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:29:26,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3373673.3333333335, ans=0.1 2023-11-28 05:29:36,272 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=22.5 2023-11-28 05:29:36,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3373740.0, ans=0.0 2023-11-28 05:29:41,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3373806.6666666665, ans=0.2 2023-11-28 05:29:46,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.72 vs. limit=15.0 2023-11-28 05:29:46,910 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.64 vs. limit=15.0 2023-11-28 05:29:51,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3373806.6666666665, ans=0.0 2023-11-28 05:29:54,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3373873.3333333335, ans=0.1 2023-11-28 05:29:55,408 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:29:58,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3373873.3333333335, ans=0.125 2023-11-28 05:29:58,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3373873.3333333335, ans=0.125 2023-11-28 05:30:07,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3373940.0, ans=0.125 2023-11-28 05:30:09,118 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 8.822e+01 9.430e+01 1.008e+02 1.221e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 05:30:13,621 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506100 2023-11-28 05:30:16,829 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1100, loss[loss=0.05501, simple_loss=0.07962, pruned_loss=0.009141, audio_tagging_loss=0.006059, over 16273.00 frames. 
], tot_loss[loss=0.06529, simple_loss=0.08895, pruned_loss=0.0122, audio_tagging_loss=0.008615, over 3042523.74 frames. ], batch size: 61, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:30:18,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=22.5 2023-11-28 05:30:21,727 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 05:30:26,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3374006.6666666665, ans=0.0 2023-11-28 05:30:34,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3374073.3333333335, ans=0.2 2023-11-28 05:30:36,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3374073.3333333335, ans=0.035 2023-11-28 05:30:50,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3374206.6666666665, ans=0.0 2023-11-28 05:31:11,334 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506150 2023-11-28 05:31:14,620 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1150, loss[loss=0.07341, simple_loss=0.1089, pruned_loss=0.01108, audio_tagging_loss=0.007888, over 16098.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08932, pruned_loss=0.01208, audio_tagging_loss=0.00858, over 3039791.06 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:31:22,886 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.31 vs. limit=22.5 2023-11-28 05:31:23,965 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=22.5 2023-11-28 05:31:27,357 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.26 vs. limit=15.0 2023-11-28 05:31:35,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3374406.6666666665, ans=0.125 2023-11-28 05:31:54,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3374540.0, ans=0.125 2023-11-28 05:31:56,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3374540.0, ans=0.125 2023-11-28 05:32:06,086 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.704e+01 9.429e+01 9.950e+01 1.461e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 05:32:08,334 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506200 2023-11-28 05:32:11,929 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1200, loss[loss=0.06647, simple_loss=0.09461, pruned_loss=0.01138, audio_tagging_loss=0.007782, over 16787.00 frames. 
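The WARNING entries in this section come from a sanity filter: a 1-second AudioSet clip yields 100 feature frames, the encoder's subsampling leaves 23, and the dummy placeholder transcript tokenizes to 24 BPE pieces, so no transducer alignment exists and the cut is dropped. A sketch of that predicate; the subsampling formula is an assumption chosen to map 100 frames to 23, as the warning reports:

def frames_after_subsampling(num_frames: int) -> int:
    # assumed ~4x convolutional subsampling with edge loss: 100 -> 23
    return (num_frames - 7) // 4

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # a transducer alignment needs at least one frame per output token
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> "Exclude cut ..." warning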
], tot_loss[loss=0.06562, simple_loss=0.08976, pruned_loss=0.01226, audio_tagging_loss=0.008472, over 3037739.54 frames. ], batch size: 63, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:32:18,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3374673.3333333335, ans=0.125 2023-11-28 05:32:31,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3374740.0, ans=0.0 2023-11-28 05:32:48,174 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=15.0 2023-11-28 05:32:53,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3374873.3333333335, ans=0.125 2023-11-28 05:33:05,763 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506250 2023-11-28 05:33:09,515 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1250, loss[loss=0.08742, simple_loss=0.1144, pruned_loss=0.02056, audio_tagging_loss=0.009671, over 15151.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.0908, pruned_loss=0.01245, audio_tagging_loss=0.008505, over 3037076.23 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:33:23,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3375073.3333333335, ans=0.125 2023-11-28 05:33:44,671 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=15.0 2023-11-28 05:34:02,428 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 8.863e+01 9.431e+01 1.030e+02 1.303e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 05:34:04,669 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506300 2023-11-28 05:34:07,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3375340.0, ans=0.125 2023-11-28 05:34:07,936 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1300, loss[loss=0.06096, simple_loss=0.07874, pruned_loss=0.0112, audio_tagging_loss=0.01039, over 16667.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09059, pruned_loss=0.01227, audio_tagging_loss=0.008536, over 3040747.18 frames. ], batch size: 65, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:34:12,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3375340.0, ans=0.0 2023-11-28 05:34:21,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3375406.6666666665, ans=0.1 2023-11-28 05:34:37,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3375473.3333333335, ans=0.125 2023-11-28 05:34:39,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3375473.3333333335, ans=0.5 2023-11-28 05:34:47,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3375540.0, ans=0.0 2023-11-28 05:35:00,062 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.71 vs. 
limit=22.5 2023-11-28 05:35:00,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3375606.6666666665, ans=0.0 2023-11-28 05:35:01,561 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506350 2023-11-28 05:35:04,839 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1350, loss[loss=0.06916, simple_loss=0.09216, pruned_loss=0.01429, audio_tagging_loss=0.008794, over 14825.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.09006, pruned_loss=0.01228, audio_tagging_loss=0.008585, over 3035316.13 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:35:06,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3375673.3333333335, ans=0.0 2023-11-28 05:35:09,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3375673.3333333335, ans=0.125 2023-11-28 05:35:26,814 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=12.0 2023-11-28 05:35:34,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3375806.6666666665, ans=0.0 2023-11-28 05:35:36,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3375806.6666666665, ans=0.2 2023-11-28 05:35:39,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3375873.3333333335, ans=0.1 2023-11-28 05:35:48,609 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 05:35:50,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3375940.0, ans=0.125 2023-11-28 05:35:50,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3375940.0, ans=0.125 2023-11-28 05:35:57,890 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.640e+01 9.329e+01 1.009e+02 1.189e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 05:35:59,086 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506400 2023-11-28 05:36:02,704 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1400, loss[loss=0.0605, simple_loss=0.08138, pruned_loss=0.011, audio_tagging_loss=0.008807, over 16069.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08954, pruned_loss=0.01223, audio_tagging_loss=0.008704, over 3037392.20 frames. 
], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:36:15,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3376073.3333333335, ans=0.09899494936611666 2023-11-28 05:36:15,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3376073.3333333335, ans=0.1 2023-11-28 05:36:24,266 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:36:36,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3376206.6666666665, ans=0.1 2023-11-28 05:36:55,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3376273.3333333335, ans=0.1 2023-11-28 05:36:57,737 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506450 2023-11-28 05:37:01,513 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1450, loss[loss=0.05626, simple_loss=0.0704, pruned_loss=0.01121, audio_tagging_loss=0.009856, over 15301.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08837, pruned_loss=0.01216, audio_tagging_loss=0.008859, over 3034870.72 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:37:12,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3376406.6666666665, ans=0.125 2023-11-28 05:37:17,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3376406.6666666665, ans=0.0 2023-11-28 05:37:29,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.96 vs. limit=22.5 2023-11-28 05:37:39,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3376540.0, ans=0.125 2023-11-28 05:37:48,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3376606.6666666665, ans=0.2 2023-11-28 05:37:53,726 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 8.627e+01 9.329e+01 1.021e+02 1.483e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 05:37:54,904 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506500 2023-11-28 05:37:58,156 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1500, loss[loss=0.08616, simple_loss=0.1084, pruned_loss=0.02145, audio_tagging_loss=0.01052, over 15899.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.0887, pruned_loss=0.01227, audio_tagging_loss=0.008867, over 3033854.67 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:38:24,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3376806.6666666665, ans=0.125 2023-11-28 05:38:48,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3376940.0, ans=0.0 2023-11-28 05:38:52,903 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506550 2023-11-28 05:38:56,164 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1550, loss[loss=0.05486, simple_loss=0.07307, pruned_loss=0.009182, audio_tagging_loss=0.009145, over 14957.00 frames. 
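The grad_scale field in the batch summaries is the dynamic loss scale of mixed-precision training: it is halved when a step overflows and grows back after a run of clean steps, which is consistent with its movement between 8.0, 16.0 and 32.0 across this section. The standard PyTorch AMP pattern, shown here with a placeholder model and loss:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0,
                                   growth_factor=2.0,
                                   backoff_factor=0.5,
                                   growth_interval=2000)

def training_step(model, optimizer, features):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(features).mean()  # placeholder loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)             # skipped if gradients overflowed
    scaler.update()                    # backoff or growth of the scale
    return scaler.get_scale()          # the grad_scale that gets logged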
], tot_loss[loss=0.06629, simple_loss=0.08996, pruned_loss=0.01243, audio_tagging_loss=0.008871, over 3031762.75 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:38:57,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3377006.6666666665, ans=0.125 2023-11-28 05:39:00,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.36 vs. limit=12.0 2023-11-28 05:39:13,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3377073.3333333335, ans=0.125 2023-11-28 05:39:22,373 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.29 vs. limit=22.5 2023-11-28 05:39:23,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3377140.0, ans=0.2 2023-11-28 05:39:33,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3377206.6666666665, ans=6.0 2023-11-28 05:39:49,864 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.333e+01 9.028e+01 9.506e+01 1.021e+02 1.396e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 05:39:51,050 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506600 2023-11-28 05:39:54,699 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1600, loss[loss=0.0604, simple_loss=0.08323, pruned_loss=0.009307, audio_tagging_loss=0.00948, over 14453.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08967, pruned_loss=0.01225, audio_tagging_loss=0.008909, over 3041420.87 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:40:00,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3377340.0, ans=0.1 2023-11-28 05:40:03,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3377340.0, ans=0.04949747468305833 2023-11-28 05:40:14,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.88 vs. limit=22.5 2023-11-28 05:40:20,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3377473.3333333335, ans=0.125 2023-11-28 05:40:42,428 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=15.0 2023-11-28 05:40:49,086 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506650 2023-11-28 05:40:49,585 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.58 vs. limit=15.0 2023-11-28 05:40:52,380 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1650, loss[loss=0.06996, simple_loss=0.09433, pruned_loss=0.01347, audio_tagging_loss=0.009329, over 15310.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09062, pruned_loss=0.01237, audio_tagging_loss=0.008942, over 3044712.59 frames. 
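The scaling.py:1022 lines compare a per-module whitening metric against a limit; only when the metric exceeds the limit does the module push activations back toward a whiter, more isotropic distribution. A sketch of one metric with that behavior, assuming it measures the eigenvalue spread of the grouped channel covariance (1.0 for perfectly white features, larger as channels become correlated); this is an assumption about scaling.py, though it fits the logged values sitting below their limits:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (frames, channels); channels split into groups as in the log lines
    frames, channels = x.shape
    cpg = channels // num_groups
    x = x.reshape(frames, num_groups, cpg).transpose(0, 1)  # (g, frames, cpg)
    cov = x.transpose(1, 2) @ x / frames                    # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)                       # (g, cpg)
    # ratio of mean squared eigenvalue to squared mean eigenvalue, >= 1.0
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 256)
print(whitening_metric(x, num_groups=1))  # ~1.0 for near-white features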
], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:40:52,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3377673.3333333335, ans=0.1 2023-11-28 05:41:05,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3377740.0, ans=0.125 2023-11-28 05:41:24,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3377806.6666666665, ans=0.2 2023-11-28 05:41:40,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3377940.0, ans=0.125 2023-11-28 05:41:46,317 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 8.798e+01 9.580e+01 1.024e+02 1.381e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 05:41:46,410 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506700 2023-11-28 05:41:50,484 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1700, loss[loss=0.05547, simple_loss=0.07416, pruned_loss=0.00947, audio_tagging_loss=0.00892, over 14755.00 frames. ], tot_loss[loss=0.06698, simple_loss=0.09097, pruned_loss=0.01259, audio_tagging_loss=0.008903, over 3041578.36 frames. ], batch size: 54, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:41:51,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3378006.6666666665, ans=0.0 2023-11-28 05:42:05,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3378073.3333333335, ans=0.125 2023-11-28 05:42:32,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3378206.6666666665, ans=0.125 2023-11-28 05:42:35,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3378273.3333333335, ans=0.125 2023-11-28 05:42:42,953 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.47 vs. limit=22.5 2023-11-28 05:42:44,568 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506750 2023-11-28 05:42:47,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3378340.0, ans=0.125 2023-11-28 05:42:48,368 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1750, loss[loss=0.06231, simple_loss=0.08298, pruned_loss=0.01209, audio_tagging_loss=0.008729, over 16016.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09063, pruned_loss=0.01244, audio_tagging_loss=0.008796, over 3042946.99 frames. ], batch size: 61, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:42:50,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3378340.0, ans=0.1 2023-11-28 05:43:05,098 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.54 vs. 
limit=5.0 2023-11-28 05:43:18,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3378473.3333333335, ans=0.5 2023-11-28 05:43:22,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3378540.0, ans=0.0 2023-11-28 05:43:41,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 8.622e+01 9.287e+01 9.848e+01 1.344e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 05:43:42,023 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506800 2023-11-28 05:43:45,570 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1800, loss[loss=0.06588, simple_loss=0.09421, pruned_loss=0.01012, audio_tagging_loss=0.00865, over 15438.00 frames. ], tot_loss[loss=0.06694, simple_loss=0.09124, pruned_loss=0.01267, audio_tagging_loss=0.008652, over 3044192.95 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:43:58,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3378740.0, ans=0.125 2023-11-28 05:44:39,478 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506850 2023-11-28 05:44:43,409 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1850, loss[loss=0.05794, simple_loss=0.07753, pruned_loss=0.01039, audio_tagging_loss=0.008788, over 15416.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09117, pruned_loss=0.01271, audio_tagging_loss=0.008556, over 3042987.91 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:44:43,937 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.01 vs. limit=15.0 2023-11-28 05:44:52,747 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:45:05,987 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.87 vs. limit=15.0 2023-11-28 05:45:07,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3379140.0, ans=0.125 2023-11-28 05:45:13,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3379140.0, ans=0.125 2023-11-28 05:45:30,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3379273.3333333335, ans=0.1 2023-11-28 05:45:36,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3379273.3333333335, ans=0.0 2023-11-28 05:45:37,548 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.914e+01 9.536e+01 1.008e+02 1.259e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 05:45:37,641 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506900 2023-11-28 05:45:41,363 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1900, loss[loss=0.0775, simple_loss=0.1032, pruned_loss=0.01782, audio_tagging_loss=0.008073, over 16173.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09046, pruned_loss=0.01254, audio_tagging_loss=0.008591, over 3050246.68 frames. 
], batch size: 58, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:46:01,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3379406.6666666665, ans=0.125 2023-11-28 05:46:04,687 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.01 vs. limit=22.5 2023-11-28 05:46:15,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3379540.0, ans=0.1 2023-11-28 05:46:35,276 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 506950 2023-11-28 05:46:38,581 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 1950, loss[loss=0.06866, simple_loss=0.09238, pruned_loss=0.01295, audio_tagging_loss=0.009515, over 15542.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09021, pruned_loss=0.01248, audio_tagging_loss=0.00854, over 3046650.80 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:46:38,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3379673.3333333335, ans=0.125 2023-11-28 05:46:57,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3379740.0, ans=0.95 2023-11-28 05:47:32,643 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 8.861e+01 9.415e+01 1.012e+02 1.225e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 05:47:32,750 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507000 2023-11-28 05:47:36,945 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2000, loss[loss=0.08047, simple_loss=0.1137, pruned_loss=0.01461, audio_tagging_loss=0.009009, over 14755.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09044, pruned_loss=0.01274, audio_tagging_loss=0.008558, over 3038080.83 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:47:44,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3380006.6666666665, ans=0.0 2023-11-28 05:47:52,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3380073.3333333335, ans=0.125 2023-11-28 05:47:57,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3380073.3333333335, ans=0.0 2023-11-28 05:48:07,498 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.91 vs. limit=22.5 2023-11-28 05:48:18,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3380206.6666666665, ans=0.0 2023-11-28 05:48:31,183 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507050 2023-11-28 05:48:31,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3380273.3333333335, ans=0.0 2023-11-28 05:48:33,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3380273.3333333335, ans=0.0 2023-11-28 05:48:33,368 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.52 vs. 
limit=15.0 2023-11-28 05:48:34,870 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2050, loss[loss=0.05206, simple_loss=0.06075, pruned_loss=0.008779, audio_tagging_loss=0.01291, over 15205.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09047, pruned_loss=0.01268, audio_tagging_loss=0.008586, over 3040922.64 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:48:45,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3380406.6666666665, ans=0.125 2023-11-28 05:49:00,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3380473.3333333335, ans=0.1 2023-11-28 05:49:03,248 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=12.0 2023-11-28 05:49:07,450 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.45 vs. limit=22.5 2023-11-28 05:49:16,875 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:49:23,076 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.22 vs. limit=15.0 2023-11-28 05:49:29,176 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507100 2023-11-28 05:49:30,216 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 9.113e+01 9.631e+01 1.014e+02 1.250e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-28 05:49:32,377 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2100, loss[loss=0.06782, simple_loss=0.08935, pruned_loss=0.0138, audio_tagging_loss=0.009338, over 14760.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09045, pruned_loss=0.01256, audio_tagging_loss=0.008541, over 3040444.44 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:49:42,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3380740.0, ans=0.125 2023-11-28 05:49:44,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3380740.0, ans=0.125 2023-11-28 05:49:44,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3380740.0, ans=0.125 2023-11-28 05:50:09,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3380873.3333333335, ans=0.125 2023-11-28 05:50:12,298 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:50:16,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3380873.3333333335, ans=0.125 2023-11-28 05:50:18,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.50 vs. 
limit=10.0 2023-11-28 05:50:18,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3380940.0, ans=0.125 2023-11-28 05:50:23,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3380940.0, ans=0.125 2023-11-28 05:50:23,776 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.62 vs. limit=12.0 2023-11-28 05:50:26,427 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507150 2023-11-28 05:50:29,589 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2150, loss[loss=0.06228, simple_loss=0.08207, pruned_loss=0.01199, audio_tagging_loss=0.009258, over 15815.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.09067, pruned_loss=0.01263, audio_tagging_loss=0.008602, over 3047141.15 frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:50:29,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3381006.6666666665, ans=0.2 2023-11-28 05:50:30,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3381006.6666666665, ans=0.125 2023-11-28 05:50:33,333 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2023-11-28 05:50:51,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3381073.3333333335, ans=0.125 2023-11-28 05:50:57,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3381140.0, ans=0.125 2023-11-28 05:50:58,662 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-28 05:51:05,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3381206.6666666665, ans=0.0 2023-11-28 05:51:07,298 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 05:51:14,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3381206.6666666665, ans=0.1 2023-11-28 05:51:25,082 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507200 2023-11-28 05:51:26,068 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.293e+01 8.657e+01 9.306e+01 1.016e+02 1.700e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-28 05:51:28,712 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2200, loss[loss=0.06988, simple_loss=0.08967, pruned_loss=0.01629, audio_tagging_loss=0.008757, over 15934.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09042, pruned_loss=0.01255, audio_tagging_loss=0.008656, over 3038476.82 frames. 
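The scaling.py:1118 WithLoss entries report an auxiliary penalty attached to the attention weights; loss-sum=0.000e+00 means that penalty is currently inactive for those modules. A minimal sketch of the general pattern, a pass-through module that accumulates a penalty for periodic logging; the module name and the penalty chosen here are illustrative, not the actual scaling.py logic:

import torch

class WithLossSketch(torch.nn.Module):
    """Identity on the forward pass; tracks an auxiliary penalty that a
    logging hook can read out and reset."""
    def __init__(self, name: str):
        super().__init__()
        self.name = name
        self.loss_sum = 0.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # illustrative penalty: discourage attention weights far above 1
        penalty = (x.abs() - 1.0).clamp(min=0.0).sum()
        self.loss_sum += float(penalty.detach())
        return x

m = WithLossSketch("encoder.encoders.5.encoder.layers.0.self_attn_weights")
w = torch.softmax(torch.randn(4, 8, 8), dim=-1)  # attention-like weights
m(w)
print(f"WithLoss: name={m.name}, loss-sum={m.loss_sum:.3e}")  # 0.000e+00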
], batch size: 61, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:51:43,420 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.24 vs. limit=15.0 2023-11-28 05:51:47,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3381406.6666666665, ans=0.125 2023-11-28 05:51:49,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.59 vs. limit=15.0 2023-11-28 05:51:54,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3381473.3333333335, ans=0.09899494936611666 2023-11-28 05:52:23,778 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507250 2023-11-28 05:52:27,083 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2250, loss[loss=0.06769, simple_loss=0.08592, pruned_loss=0.01447, audio_tagging_loss=0.01026, over 16067.00 frames. ], tot_loss[loss=0.06692, simple_loss=0.09116, pruned_loss=0.0127, audio_tagging_loss=0.008641, over 3040151.50 frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:52:36,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3381673.3333333335, ans=0.0 2023-11-28 05:53:02,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3381873.3333333335, ans=0.125 2023-11-28 05:53:20,991 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507300 2023-11-28 05:53:22,034 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.780e+01 8.876e+01 9.357e+01 9.943e+01 1.279e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 05:53:24,249 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2300, loss[loss=0.08864, simple_loss=0.1106, pruned_loss=0.02688, audio_tagging_loss=0.006447, over 15496.00 frames. ], tot_loss[loss=0.06732, simple_loss=0.09167, pruned_loss=0.01283, audio_tagging_loss=0.008658, over 3046552.77 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:54:17,391 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 05:54:17,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3382273.3333333335, ans=0.1 2023-11-28 05:54:18,528 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507350 2023-11-28 05:54:21,706 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2350, loss[loss=0.06833, simple_loss=0.08907, pruned_loss=0.01465, audio_tagging_loss=0.009148, over 15446.00 frames. ], tot_loss[loss=0.06713, simple_loss=0.0914, pruned_loss=0.01273, audio_tagging_loss=0.008707, over 3051745.84 frames. 
], batch size: 63, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 05:54:30,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3382340.0, ans=0.2 2023-11-28 05:54:38,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3382406.6666666665, ans=0.125 2023-11-28 05:55:00,371 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.21 vs. limit=15.0 2023-11-28 05:55:08,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3382606.6666666665, ans=0.0 2023-11-28 05:55:17,743 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507400 2023-11-28 05:55:18,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.197e+01 8.819e+01 9.502e+01 1.018e+02 1.349e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 05:55:21,456 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2400, loss[loss=0.07783, simple_loss=0.1059, pruned_loss=0.01713, audio_tagging_loss=0.007725, over 15808.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09105, pruned_loss=0.01261, audio_tagging_loss=0.008881, over 3055367.78 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:55:27,612 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0 2023-11-28 05:55:33,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3382740.0, ans=0.2 2023-11-28 05:55:43,085 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.62 vs. limit=15.0 2023-11-28 05:56:05,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3382940.0, ans=0.0 2023-11-28 05:56:08,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2023-11-28 05:56:14,620 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507450 2023-11-28 05:56:17,882 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2450, loss[loss=0.06086, simple_loss=0.08103, pruned_loss=0.01069, audio_tagging_loss=0.009657, over 15156.00 frames. ], tot_loss[loss=0.06729, simple_loss=0.09163, pruned_loss=0.0126, audio_tagging_loss=0.008879, over 3051781.47 frames. ], batch size: 59, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:56:36,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3383073.3333333335, ans=0.125 2023-11-28 05:56:48,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3383140.0, ans=0.125 2023-11-28 05:56:53,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3383206.6666666665, ans=0.125 2023-11-28 05:56:59,214 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.45 vs. 
limit=22.5 2023-11-28 05:57:12,369 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507500 2023-11-28 05:57:13,374 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.778e+01 9.508e+01 1.025e+02 1.201e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 05:57:15,555 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2500, loss[loss=0.07408, simple_loss=0.1075, pruned_loss=0.01092, audio_tagging_loss=0.009398, over 15824.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09094, pruned_loss=0.01244, audio_tagging_loss=0.008972, over 3049797.40 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:57:18,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3383340.0, ans=0.0 2023-11-28 05:57:58,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3383540.0, ans=0.125 2023-11-28 05:57:58,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3383540.0, ans=0.125 2023-11-28 05:58:10,252 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507550 2023-11-28 05:58:10,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3383606.6666666665, ans=0.125 2023-11-28 05:58:14,126 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2550, loss[loss=0.06603, simple_loss=0.08822, pruned_loss=0.01046, audio_tagging_loss=0.01147, over 15774.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09029, pruned_loss=0.01236, audio_tagging_loss=0.008891, over 3046330.91 frames. ], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:58:27,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3383740.0, ans=0.125 2023-11-28 05:58:34,213 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 05:58:36,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3383806.6666666665, ans=0.125 2023-11-28 05:58:54,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3383873.3333333335, ans=0.1 2023-11-28 05:59:02,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3383940.0, ans=0.125 2023-11-28 05:59:02,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3383940.0, ans=0.025 2023-11-28 05:59:03,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3383940.0, ans=0.125 2023-11-28 05:59:08,238 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507600 2023-11-28 05:59:08,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.51 vs. 
limit=12.0 2023-11-28 05:59:09,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.936e+01 8.563e+01 9.261e+01 9.725e+01 1.208e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-28 05:59:11,673 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2600, loss[loss=0.07036, simple_loss=0.09239, pruned_loss=0.01721, audio_tagging_loss=0.006957, over 15197.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08947, pruned_loss=0.01227, audio_tagging_loss=0.008754, over 3042943.71 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 05:59:14,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3384006.6666666665, ans=0.125 2023-11-28 05:59:29,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3384073.3333333335, ans=0.07 2023-11-28 05:59:37,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3384140.0, ans=0.0 2023-11-28 05:59:43,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3384140.0, ans=0.0 2023-11-28 05:59:45,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3384206.6666666665, ans=0.125 2023-11-28 06:00:05,553 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507650 2023-11-28 06:00:09,386 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2650, loss[loss=0.04875, simple_loss=0.06554, pruned_loss=0.007595, audio_tagging_loss=0.008381, over 15511.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08963, pruned_loss=0.01223, audio_tagging_loss=0.00868, over 3051176.60 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:00:29,538 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.01 vs. limit=10.0 2023-11-28 06:00:46,142 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.27 vs. limit=6.0 2023-11-28 06:01:03,457 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507700 2023-11-28 06:01:04,923 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 8.714e+01 9.424e+01 1.027e+02 1.447e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 06:01:07,158 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2700, loss[loss=0.07415, simple_loss=0.09578, pruned_loss=0.01645, audio_tagging_loss=0.009813, over 15205.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08922, pruned_loss=0.01217, audio_tagging_loss=0.008654, over 3045916.45 frames. 
], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:01:15,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3384673.3333333335, ans=0.125 2023-11-28 06:01:18,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3384740.0, ans=0.125 2023-11-28 06:01:37,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3384806.6666666665, ans=0.125 2023-11-28 06:01:54,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3384940.0, ans=0.0 2023-11-28 06:02:01,527 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507750 2023-11-28 06:02:03,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3385006.6666666665, ans=0.125 2023-11-28 06:02:04,876 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2750, loss[loss=0.06988, simple_loss=0.1014, pruned_loss=0.01124, audio_tagging_loss=0.007931, over 13660.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08937, pruned_loss=0.01221, audio_tagging_loss=0.008698, over 3051860.84 frames. ], batch size: 53, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:02:08,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3385006.6666666665, ans=0.125 2023-11-28 06:02:15,707 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-28 06:02:20,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.66 vs. limit=6.0 2023-11-28 06:02:23,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3385073.3333333335, ans=0.125 2023-11-28 06:02:38,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3385206.6666666665, ans=0.0 2023-11-28 06:02:39,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3385206.6666666665, ans=0.04949747468305833 2023-11-28 06:02:48,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3385206.6666666665, ans=0.0 2023-11-28 06:02:56,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2023-11-28 06:02:57,429 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 06:02:57,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3385273.3333333335, ans=0.2 2023-11-28 06:02:58,575 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507800 2023-11-28 06:02:59,613 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.045e+01 8.812e+01 9.371e+01 1.002e+02 1.155e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 06:03:02,358 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2800, loss[loss=0.07294, simple_loss=0.09988, pruned_loss=0.01537, audio_tagging_loss=0.00763, over 14619.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.0886, pruned_loss=0.01222, audio_tagging_loss=0.008723, over 3046240.97 frames. ], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:03:17,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3385406.6666666665, ans=0.1 2023-11-28 06:03:39,408 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2023-11-28 06:03:56,790 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507850 2023-11-28 06:03:56,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3385606.6666666665, ans=0.0 2023-11-28 06:04:00,422 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2850, loss[loss=0.06577, simple_loss=0.09319, pruned_loss=0.0132, audio_tagging_loss=0.00597, over 15423.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08886, pruned_loss=0.01246, audio_tagging_loss=0.008684, over 3040894.75 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:04:16,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3385740.0, ans=0.125 2023-11-28 06:04:18,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3385740.0, ans=0.125 2023-11-28 06:04:20,285 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.24 vs. limit=22.5 2023-11-28 06:04:32,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3385806.6666666665, ans=0.0 2023-11-28 06:04:44,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3385873.3333333335, ans=0.125 2023-11-28 06:04:53,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3385940.0, ans=0.125 2023-11-28 06:04:54,040 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507900 2023-11-28 06:04:55,038 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 8.844e+01 9.452e+01 9.978e+01 1.410e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 06:04:57,245 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2900, loss[loss=0.0534, simple_loss=0.06433, pruned_loss=0.01397, audio_tagging_loss=0.007269, over 14804.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08863, pruned_loss=0.01247, audio_tagging_loss=0.008673, over 3037898.75 frames. 
], batch size: 58, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:04:58,962 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.38 vs. limit=22.5 2023-11-28 06:05:02,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3386006.6666666665, ans=0.125 2023-11-28 06:05:02,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3386006.6666666665, ans=0.1 2023-11-28 06:05:28,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3386140.0, ans=0.125 2023-11-28 06:05:37,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3386206.6666666665, ans=0.125 2023-11-28 06:05:51,640 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 507950 2023-11-28 06:05:54,923 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 2950, loss[loss=0.08521, simple_loss=0.1164, pruned_loss=0.01948, audio_tagging_loss=0.00752, over 15312.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08924, pruned_loss=0.01252, audio_tagging_loss=0.008667, over 3041367.53 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:06:02,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3386340.0, ans=0.07 2023-11-28 06:06:06,955 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.30 vs. limit=15.0 2023-11-28 06:06:14,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3386406.6666666665, ans=0.125 2023-11-28 06:06:17,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3386473.3333333335, ans=0.125 2023-11-28 06:06:41,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3386606.6666666665, ans=0.125 2023-11-28 06:06:49,060 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508000 2023-11-28 06:06:50,044 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.945e+01 9.706e+01 1.033e+02 1.300e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-28 06:06:55,281 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3000, loss[loss=0.06159, simple_loss=0.08377, pruned_loss=0.01104, audio_tagging_loss=0.008669, over 14944.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09016, pruned_loss=0.01261, audio_tagging_loss=0.008718, over 3038634.61 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:06:55,281 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 06:07:16,715 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.9672, 2.9873, 2.7973, 2.7159, 3.4116, 3.3325, 3.1528, 3.6119], device='cuda:2') 2023-11-28 06:07:29,849 INFO [train_asr.py:1267] (2/4) Epoch 43, validation: loss=0.0576, simple_loss=0.05056, pruned_loss=0.005189, audio_tagging_loss=0.02713, over 4681554.00 frames. 
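A quick consistency check on the loss fields in these records: the tot_loss value behaves like a weighted sum of the three logged components, loss ≈ 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. The validation record above gives 0.5 * 0.05056 + 0.005189 + 0.02713 ≈ 0.0576, and the batch 2150 record earlier gives 0.5 * 0.09067 + 0.01263 + 0.008602 ≈ 0.06657. A minimal sketch of that combination follows; the 0.5 and 1.0 scales are inferred from the logged numbers, not read out of train_asr.py, and combine_losses is a hypothetical helper, not a function from the codebase.

import torch

# Hedged sketch: reconstructs the combined "loss" field from the per-component
# fields in these log records. The scales (0.5 for simple_loss, 1.0 for
# audio_tagging_loss, pruned_loss unscaled) are inferred from the logged
# values themselves, not taken from the icefall source.
def combine_losses(simple_loss: torch.Tensor,
                   pruned_loss: torch.Tensor,
                   audio_tagging_loss: torch.Tensor,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> torch.Tensor:
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# The validation record above: 0.5 * 0.05056 + 0.005189 + 0.02713 = 0.0576
val = combine_losses(torch.tensor(0.05056),
                     torch.tensor(0.005189),
                     torch.tensor(0.02713))
print(f"{val.item():.4f}")  # -> 0.0576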
2023-11-28 06:07:29,850 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 06:07:46,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3386740.0, ans=0.2 2023-11-28 06:08:03,420 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.50 vs. limit=22.5 2023-11-28 06:08:24,956 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508050 2023-11-28 06:08:28,281 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3050, loss[loss=0.06092, simple_loss=0.07495, pruned_loss=0.01378, audio_tagging_loss=0.009674, over 14643.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09024, pruned_loss=0.01258, audio_tagging_loss=0.008769, over 3044024.06 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:08:36,543 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=15.0 2023-11-28 06:08:59,596 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=22.5 2023-11-28 06:09:05,118 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:09:22,212 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508100 2023-11-28 06:09:23,656 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 8.950e+01 9.666e+01 1.022e+02 1.393e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 06:09:26,337 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3100, loss[loss=0.05976, simple_loss=0.0868, pruned_loss=0.006869, audio_tagging_loss=0.009491, over 15221.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09082, pruned_loss=0.0126, audio_tagging_loss=0.008816, over 3048862.80 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:09:29,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3387340.0, ans=0.125 2023-11-28 06:09:54,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3387473.3333333335, ans=0.1 2023-11-28 06:10:11,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3387606.6666666665, ans=0.09899494936611666 2023-11-28 06:10:17,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3387606.6666666665, ans=0.0 2023-11-28 06:10:20,459 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508150 2023-11-28 06:10:23,659 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3150, loss[loss=0.06698, simple_loss=0.09401, pruned_loss=0.01193, audio_tagging_loss=0.00805, over 14895.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09091, pruned_loss=0.01259, audio_tagging_loss=0.008809, over 3043849.67 frames. 
], batch size: 55, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:10:27,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3387673.3333333335, ans=0.2 2023-11-28 06:10:42,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3387740.0, ans=0.1 2023-11-28 06:10:56,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3387806.6666666665, ans=0.125 2023-11-28 06:10:56,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3387806.6666666665, ans=0.2 2023-11-28 06:10:59,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3387873.3333333335, ans=0.125 2023-11-28 06:11:15,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3387940.0, ans=0.125 2023-11-28 06:11:16,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3387940.0, ans=0.125 2023-11-28 06:11:17,471 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508200 2023-11-28 06:11:18,542 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.421e+01 8.906e+01 9.448e+01 1.016e+02 1.228e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 06:11:22,231 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3200, loss[loss=0.06507, simple_loss=0.08751, pruned_loss=0.01078, audio_tagging_loss=0.01053, over 14626.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09084, pruned_loss=0.01256, audio_tagging_loss=0.008982, over 3041463.76 frames. ], batch size: 57, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:11:39,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3388073.3333333335, ans=0.2 2023-11-28 06:12:07,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3388273.3333333335, ans=0.2 2023-11-28 06:12:15,223 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508250 2023-11-28 06:12:18,444 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3250, loss[loss=0.06508, simple_loss=0.08803, pruned_loss=0.01225, audio_tagging_loss=0.008819, over 16407.00 frames. ], tot_loss[loss=0.06696, simple_loss=0.09104, pruned_loss=0.01253, audio_tagging_loss=0.008919, over 3044180.12 frames. ], batch size: 62, lr: 1.58e-03, grad_scale: 32.0 2023-11-28 06:12:21,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3388340.0, ans=0.1 2023-11-28 06:12:23,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3388340.0, ans=0.125 2023-11-28 06:12:29,983 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. 
limit=6.0 2023-11-28 06:12:34,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3388406.6666666665, ans=0.125 2023-11-28 06:12:38,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3388406.6666666665, ans=0.1 2023-11-28 06:12:38,369 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=15.0 2023-11-28 06:12:49,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3388473.3333333335, ans=0.125 2023-11-28 06:12:53,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3388540.0, ans=0.125 2023-11-28 06:12:59,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3388540.0, ans=0.0 2023-11-28 06:13:05,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3388606.6666666665, ans=0.125 2023-11-28 06:13:11,397 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0 2023-11-28 06:13:13,072 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508300 2023-11-28 06:13:15,105 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 8.835e+01 9.450e+01 1.028e+02 1.248e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 06:13:16,199 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3300, loss[loss=0.05078, simple_loss=0.06901, pruned_loss=0.008852, audio_tagging_loss=0.007427, over 16061.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09057, pruned_loss=0.01254, audio_tagging_loss=0.009019, over 3053368.98 frames. ], batch size: 61, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 06:13:18,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3388673.3333333335, ans=0.0 2023-11-28 06:13:30,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3388740.0, ans=0.125 2023-11-28 06:13:33,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3388740.0, ans=0.07 2023-11-28 06:13:47,954 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.68 vs. 
limit=12.0 2023-11-28 06:14:00,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3388873.3333333335, ans=0.1 2023-11-28 06:14:06,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3388940.0, ans=0.09899494936611666 2023-11-28 06:14:09,946 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508350 2023-11-28 06:14:10,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3388940.0, ans=0.125 2023-11-28 06:14:13,196 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3350, loss[loss=0.07155, simple_loss=0.09886, pruned_loss=0.01337, audio_tagging_loss=0.00875, over 14980.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09045, pruned_loss=0.01257, audio_tagging_loss=0.009034, over 3049687.96 frames. ], batch size: 56, lr: 1.58e-03, grad_scale: 16.0 2023-11-28 06:14:26,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3389073.3333333335, ans=0.0 2023-11-28 06:15:08,374 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508400 2023-11-28 06:15:10,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.815e+01 9.381e+01 1.020e+02 1.211e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 06:15:11,907 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3400, loss[loss=0.06905, simple_loss=0.09548, pruned_loss=0.01528, audio_tagging_loss=0.006024, over 15433.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09067, pruned_loss=0.01261, audio_tagging_loss=0.008871, over 3046593.99 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:15:13,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3389340.0, ans=0.0 2023-11-28 06:15:18,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3389340.0, ans=0.125 2023-11-28 06:15:24,221 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=22.5 2023-11-28 06:15:25,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3389406.6666666665, ans=0.125 2023-11-28 06:15:52,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3389540.0, ans=0.125 2023-11-28 06:15:53,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3389540.0, ans=0.125 2023-11-28 06:16:06,779 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508450 2023-11-28 06:16:09,955 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3450, loss[loss=0.06424, simple_loss=0.08234, pruned_loss=0.01312, audio_tagging_loss=0.009947, over 14470.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09027, pruned_loss=0.01258, audio_tagging_loss=0.008784, over 3045576.06 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:16:26,663 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.83 vs. 
limit=10.0 2023-11-28 06:16:46,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3389873.3333333335, ans=0.1 2023-11-28 06:16:52,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3389873.3333333335, ans=0.2 2023-11-28 06:17:03,735 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508500 2023-11-28 06:17:06,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3390006.6666666665, ans=0.2 2023-11-28 06:17:06,976 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 8.735e+01 9.510e+01 1.030e+02 1.229e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 06:17:07,002 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3500, loss[loss=0.1093, simple_loss=0.1513, pruned_loss=0.02693, audio_tagging_loss=0.006671, over 14980.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09072, pruned_loss=0.01272, audio_tagging_loss=0.008719, over 3049119.19 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:17:08,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3390006.6666666665, ans=0.125 2023-11-28 06:17:10,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3390006.6666666665, ans=0.1 2023-11-28 06:17:19,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3390073.3333333335, ans=0.0 2023-11-28 06:17:34,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3390140.0, ans=0.125 2023-11-28 06:17:38,573 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.14 vs. limit=12.0 2023-11-28 06:17:41,199 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:17:48,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3390206.6666666665, ans=0.125 2023-11-28 06:17:50,507 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2023-11-28 06:18:01,668 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508550 2023-11-28 06:18:04,914 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3550, loss[loss=0.07096, simple_loss=0.09693, pruned_loss=0.01418, audio_tagging_loss=0.008326, over 16432.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09033, pruned_loss=0.01258, audio_tagging_loss=0.008769, over 3050115.48 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:18:22,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.83 vs. 
limit=15.0 2023-11-28 06:18:25,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3390406.6666666665, ans=15.0 2023-11-28 06:19:00,557 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508600 2023-11-28 06:19:04,050 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 8.807e+01 9.210e+01 1.006e+02 1.301e+02, threshold=1.842e+02, percent-clipped=0.0 2023-11-28 06:19:04,076 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3600, loss[loss=0.06484, simple_loss=0.08374, pruned_loss=0.01307, audio_tagging_loss=0.009902, over 14866.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08993, pruned_loss=0.01246, audio_tagging_loss=0.008741, over 3045154.56 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:19:04,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3390673.3333333335, ans=0.125 2023-11-28 06:19:12,166 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.11 vs. limit=22.5 2023-11-28 06:19:28,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3390806.6666666665, ans=0.125 2023-11-28 06:19:35,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3390806.6666666665, ans=0.125 2023-11-28 06:19:35,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3390806.6666666665, ans=0.125 2023-11-28 06:19:57,665 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508650 2023-11-28 06:20:00,976 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3650, loss[loss=0.04727, simple_loss=0.06616, pruned_loss=0.005764, audio_tagging_loss=0.008427, over 14797.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08942, pruned_loss=0.01237, audio_tagging_loss=0.008713, over 3043418.91 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:20:02,845 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.16 vs. 
limit=15.0 2023-11-28 06:20:03,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3391006.6666666665, ans=0.07 2023-11-28 06:20:11,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3391073.3333333335, ans=0.125 2023-11-28 06:20:20,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3391073.3333333335, ans=0.125 2023-11-28 06:20:22,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3391073.3333333335, ans=0.2 2023-11-28 06:20:23,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3391140.0, ans=0.125 2023-11-28 06:20:40,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3391206.6666666665, ans=0.2 2023-11-28 06:20:51,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3391273.3333333335, ans=0.125 2023-11-28 06:20:54,988 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508700 2023-11-28 06:20:55,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3391273.3333333335, ans=0.125 2023-11-28 06:20:58,169 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.782e+01 9.556e+01 1.009e+02 1.270e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 06:20:58,195 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3700, loss[loss=0.05705, simple_loss=0.07896, pruned_loss=0.007934, audio_tagging_loss=0.009636, over 16107.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09012, pruned_loss=0.01237, audio_tagging_loss=0.008674, over 3048784.44 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:21:19,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3391406.6666666665, ans=0.125 2023-11-28 06:21:53,597 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508750 2023-11-28 06:21:56,257 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.87 vs. limit=10.0 2023-11-28 06:21:56,792 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3750, loss[loss=0.08538, simple_loss=0.1243, pruned_loss=0.01609, audio_tagging_loss=0.007161, over 14709.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09021, pruned_loss=0.01247, audio_tagging_loss=0.008683, over 3059331.57 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:22:11,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3391740.0, ans=0.0 2023-11-28 06:22:16,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3391740.0, ans=0.125 2023-11-28 06:22:24,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3391806.6666666665, ans=0.125 2023-11-28 06:22:25,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3391806.6666666665, ans=0.1 2023-11-28 06:22:40,349 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:22:48,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3391940.0, ans=10.0 2023-11-28 06:22:50,267 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508800 2023-11-28 06:22:53,857 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3800, loss[loss=0.06519, simple_loss=0.08414, pruned_loss=0.01364, audio_tagging_loss=0.009477, over 13865.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.0899, pruned_loss=0.01231, audio_tagging_loss=0.008762, over 3055606.98 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:22:54,974 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.959e+01 9.739e+01 1.027e+02 1.673e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-28 06:22:58,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3392006.6666666665, ans=0.125 2023-11-28 06:23:27,216 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.74 vs. limit=15.0 2023-11-28 06:23:48,345 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508850 2023-11-28 06:23:51,650 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3850, loss[loss=0.06047, simple_loss=0.07999, pruned_loss=0.01171, audio_tagging_loss=0.008767, over 14748.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08999, pruned_loss=0.01224, audio_tagging_loss=0.008815, over 3057186.45 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:24:04,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3392406.6666666665, ans=0.125 2023-11-28 06:24:29,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3392540.0, ans=0.0 2023-11-28 06:24:45,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3392606.6666666665, ans=0.1 2023-11-28 06:24:46,335 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508900 2023-11-28 06:24:50,031 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3900, loss[loss=0.0611, simple_loss=0.088, pruned_loss=0.009122, audio_tagging_loss=0.007984, over 14256.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08984, pruned_loss=0.01215, audio_tagging_loss=0.008817, over 3052393.60 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:24:51,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.618e+01 9.454e+01 1.005e+02 2.661e+02, threshold=1.891e+02, percent-clipped=1.0 2023-11-28 06:24:53,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3392673.3333333335, ans=0.1 2023-11-28 06:25:03,471 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.77 vs. 
limit=15.0 2023-11-28 06:25:23,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3392873.3333333335, ans=0.0 2023-11-28 06:25:25,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3392873.3333333335, ans=0.0 2023-11-28 06:25:25,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3392873.3333333335, ans=0.125 2023-11-28 06:25:29,686 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.95 vs. limit=10.0 2023-11-28 06:25:30,672 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=12.0 2023-11-28 06:25:33,257 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=22.5 2023-11-28 06:25:36,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3392940.0, ans=0.0 2023-11-28 06:25:44,680 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 508950 2023-11-28 06:25:48,018 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 3950, loss[loss=0.06601, simple_loss=0.08323, pruned_loss=0.01534, audio_tagging_loss=0.009051, over 15693.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09027, pruned_loss=0.0122, audio_tagging_loss=0.008856, over 3055596.78 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:25:59,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3393073.3333333335, ans=0.0 2023-11-28 06:26:04,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3393073.3333333335, ans=0.0 2023-11-28 06:26:19,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0 2023-11-28 06:26:36,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3393273.3333333335, ans=0.125 2023-11-28 06:26:38,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3393273.3333333335, ans=0.0 2023-11-28 06:26:42,372 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509000 2023-11-28 06:26:42,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3393273.3333333335, ans=0.0 2023-11-28 06:26:46,284 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4000, loss[loss=0.05243, simple_loss=0.07038, pruned_loss=0.007364, audio_tagging_loss=0.009871, over 15087.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09025, pruned_loss=0.01225, audio_tagging_loss=0.008919, over 3044729.70 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:26:47,325 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.879e+01 9.661e+01 1.044e+02 1.748e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 06:26:54,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3393340.0, ans=0.0 2023-11-28 06:26:57,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3393406.6666666665, ans=0.0 2023-11-28 06:27:01,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3393406.6666666665, ans=0.0 2023-11-28 06:27:02,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3393406.6666666665, ans=0.5 2023-11-28 06:27:13,577 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0 2023-11-28 06:27:39,970 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509050 2023-11-28 06:27:42,183 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2023-11-28 06:27:44,181 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4050, loss[loss=0.07386, simple_loss=0.09907, pruned_loss=0.01596, audio_tagging_loss=0.008365, over 15376.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.09093, pruned_loss=0.0125, audio_tagging_loss=0.008935, over 3038053.59 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:27:47,732 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:27:50,671 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 06:27:55,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3393740.0, ans=0.1 2023-11-28 06:27:59,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3393740.0, ans=0.0 2023-11-28 06:28:05,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3393806.6666666665, ans=0.125 2023-11-28 06:28:06,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3393806.6666666665, ans=0.2 2023-11-28 06:28:09,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3393806.6666666665, ans=0.95 2023-11-28 06:28:27,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3393873.3333333335, ans=0.0 2023-11-28 06:28:37,564 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509100 2023-11-28 06:28:38,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3393940.0, ans=0.1 2023-11-28 06:28:40,826 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4100, loss[loss=0.06889, simple_loss=0.08748, pruned_loss=0.01503, audio_tagging_loss=0.01011, over 15277.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.0907, pruned_loss=0.01243, audio_tagging_loss=0.008968, over 3038375.95 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:28:43,579 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.882e+01 9.020e+01 9.493e+01 1.014e+02 1.731e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 06:28:43,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3394006.6666666665, ans=0.0 2023-11-28 06:29:05,871 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=22.5 2023-11-28 06:29:34,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3394273.3333333335, ans=0.2 2023-11-28 06:29:35,290 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509150 2023-11-28 06:29:38,568 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4150, loss[loss=0.04771, simple_loss=0.06887, pruned_loss=0.005334, audio_tagging_loss=0.007941, over 14697.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.0902, pruned_loss=0.01221, audio_tagging_loss=0.008879, over 3039098.67 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:30:11,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3394473.3333333335, ans=0.2 2023-11-28 06:30:11,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3394473.3333333335, ans=0.125 2023-11-28 06:30:25,690 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:30:33,255 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509200 2023-11-28 06:30:36,806 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4200, loss[loss=0.07344, simple_loss=0.1057, pruned_loss=0.01169, audio_tagging_loss=0.008878, over 15886.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08919, pruned_loss=0.01205, audio_tagging_loss=0.00879, over 3040993.85 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:30:38,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3394673.3333333335, ans=0.0 2023-11-28 06:30:38,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3394673.3333333335, ans=0.125 2023-11-28 06:30:39,978 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.720e+01 9.332e+01 1.017e+02 1.296e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 06:30:53,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3394740.0, ans=0.1 2023-11-28 06:30:54,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3394740.0, ans=0.125 2023-11-28 06:30:57,321 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0 2023-11-28 06:31:01,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3394806.6666666665, ans=0.2 2023-11-28 06:31:01,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3394806.6666666665, ans=0.0 2023-11-28 06:31:32,121 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509250 2023-11-28 06:31:32,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3394940.0, ans=0.0 2023-11-28 06:31:35,363 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4250, loss[loss=0.06393, simple_loss=0.08375, pruned_loss=0.01117, audio_tagging_loss=0.01088, over 15702.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08917, pruned_loss=0.01211, audio_tagging_loss=0.008673, over 3044392.75 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:31:38,138 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. 
limit=15.0 2023-11-28 06:31:43,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3395006.6666666665, ans=0.1 2023-11-28 06:32:11,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3395206.6666666665, ans=0.125 2023-11-28 06:32:22,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3395273.3333333335, ans=0.125 2023-11-28 06:32:23,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3395273.3333333335, ans=10.0 2023-11-28 06:32:27,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3395273.3333333335, ans=0.0 2023-11-28 06:32:29,168 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509300 2023-11-28 06:32:29,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3395273.3333333335, ans=0.0 2023-11-28 06:32:33,201 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4300, loss[loss=0.05948, simple_loss=0.07567, pruned_loss=0.01246, audio_tagging_loss=0.009189, over 14644.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08961, pruned_loss=0.01209, audio_tagging_loss=0.008657, over 3043224.66 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:32:33,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3395340.0, ans=0.125 2023-11-28 06:32:34,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3395340.0, ans=0.125 2023-11-28 06:32:35,402 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.011e+01 8.843e+01 9.507e+01 1.023e+02 2.128e+02, threshold=1.901e+02, percent-clipped=1.0 2023-11-28 06:33:00,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3395473.3333333335, ans=0.0 2023-11-28 06:33:22,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3395606.6666666665, ans=0.2 2023-11-28 06:33:27,981 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509350 2023-11-28 06:33:31,220 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4350, loss[loss=0.06034, simple_loss=0.08744, pruned_loss=0.008919, audio_tagging_loss=0.0077, over 16229.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08974, pruned_loss=0.01204, audio_tagging_loss=0.008626, over 3043316.98 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 06:33:50,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3395740.0, ans=0.125 2023-11-28 06:34:06,930 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.40 vs. 
limit=10.0 2023-11-28 06:34:08,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3395873.3333333335, ans=0.2 2023-11-28 06:34:26,161 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509400 2023-11-28 06:34:29,644 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4400, loss[loss=0.07876, simple_loss=0.1082, pruned_loss=0.01618, audio_tagging_loss=0.00848, over 15031.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09023, pruned_loss=0.01222, audio_tagging_loss=0.00863, over 3048368.48 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:34:31,826 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 8.958e+01 9.354e+01 1.037e+02 1.325e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 06:34:32,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3396006.6666666665, ans=0.125 2023-11-28 06:34:48,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3396073.3333333335, ans=0.0 2023-11-28 06:34:55,762 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2023-11-28 06:34:57,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3396140.0, ans=0.04949747468305833 2023-11-28 06:35:08,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3396206.6666666665, ans=0.125 2023-11-28 06:35:23,650 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509450 2023-11-28 06:35:26,823 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4450, loss[loss=0.04988, simple_loss=0.06564, pruned_loss=0.006695, audio_tagging_loss=0.01036, over 16431.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08937, pruned_loss=0.0121, audio_tagging_loss=0.008589, over 3050437.96 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:35:33,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3396340.0, ans=0.125 2023-11-28 06:36:13,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3396606.6666666665, ans=0.1 2023-11-28 06:36:21,604 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509500 2023-11-28 06:36:22,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3396606.6666666665, ans=0.95 2023-11-28 06:36:24,869 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4500, loss[loss=0.06031, simple_loss=0.08763, pruned_loss=0.009749, audio_tagging_loss=0.006747, over 15027.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08855, pruned_loss=0.012, audio_tagging_loss=0.00861, over 3040102.93 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:36:27,122 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.759e+01 9.220e+01 9.806e+01 1.292e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-28 06:36:45,053 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.06 vs. 
limit=22.5 2023-11-28 06:36:52,551 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:37:05,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3396873.3333333335, ans=0.2 2023-11-28 06:37:17,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3396940.0, ans=0.1 2023-11-28 06:37:18,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3396940.0, ans=0.0 2023-11-28 06:37:19,897 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509550 2023-11-28 06:37:23,168 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4550, loss[loss=0.06747, simple_loss=0.09419, pruned_loss=0.01342, audio_tagging_loss=0.006958, over 14805.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08883, pruned_loss=0.01199, audio_tagging_loss=0.008664, over 3037761.98 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:38:11,286 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:38:14,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3397273.3333333335, ans=0.125 2023-11-28 06:38:16,787 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509600 2023-11-28 06:38:20,304 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4600, loss[loss=0.05153, simple_loss=0.07244, pruned_loss=0.007931, audio_tagging_loss=0.007375, over 15572.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08957, pruned_loss=0.01213, audio_tagging_loss=0.008696, over 3037819.93 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:38:22,436 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.061e+01 8.724e+01 9.423e+01 1.019e+02 1.398e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 06:39:00,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3397540.0, ans=0.0 2023-11-28 06:39:01,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3397540.0, ans=0.125 2023-11-28 06:39:05,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3397606.6666666665, ans=0.0 2023-11-28 06:39:06,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3397606.6666666665, ans=0.0 2023-11-28 06:39:14,822 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509650 2023-11-28 06:39:18,109 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4650, loss[loss=0.07901, simple_loss=0.1042, pruned_loss=0.01667, audio_tagging_loss=0.01022, over 15546.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08953, pruned_loss=0.0122, audio_tagging_loss=0.00872, over 3040427.15 frames. 
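The WARNING just above documents the length filter that guards the transducer loss: AudioSet clips carry only placeholder transcripts, and a 1-second cut (100 feature frames) shrinks to 23 encoder frames under the roughly 4x convolutional subsampling, fewer than its 24 BPE tokens, so no monotonic alignment exists and the cut is skipped. A minimal sketch of such a filter follows; the helper name and the exact subsampling formula are assumptions chosen to reproduce the logged 100 -> 23 arithmetic, not the verbatim training code.

import logging

def keep_cut(num_frames: int, tokens: list) -> bool:
    # Frames after the Conv2d front end; ((n - 7) // 2 + 1) // 2 gives an
    # overall reduction of roughly 4x and maps 100 -> 23 as in the log.
    T = ((num_frames - 7) // 2 + 1) // 2
    if T < len(tokens):
        # Cuts shorter than their token sequence cannot be aligned by the
        # RNN-T loss, hence the "Exclude cut ..." warnings.
        logging.warning(
            f"Exclude cut from training. Number of frames (before "
            f"subsampling): {num_frames}. Number of frames (after "
            f"subsampling): {T}. Number of tokens: {len(tokens)}"
        )
        return False
    return True

# The logged case: ((100 - 7) // 2 + 1) // 2 = 23 encoder frames < 24 tokens.
assert keep_cut(100, ["tok"] * 24) is False

Because these dummy-text cuts come from the unbalanced AudioSet split, which contributes audio-tagging supervision rather than transcripts, the exclusions are expected and harmless to the ASR objective.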
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:39:19,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3397673.3333333335, ans=0.0 2023-11-28 06:39:23,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3397673.3333333335, ans=0.0 2023-11-28 06:39:24,910 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.10 vs. limit=15.0 2023-11-28 06:39:38,329 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:39:43,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3397806.6666666665, ans=0.125 2023-11-28 06:39:49,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3397806.6666666665, ans=0.1 2023-11-28 06:40:13,659 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509700 2023-11-28 06:40:16,810 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4700, loss[loss=0.06938, simple_loss=0.09408, pruned_loss=0.01452, audio_tagging_loss=0.00782, over 15626.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08964, pruned_loss=0.01224, audio_tagging_loss=0.008816, over 3041020.98 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:40:18,959 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.949e+01 8.857e+01 9.480e+01 1.024e+02 1.425e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 06:40:25,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2023-11-28 06:41:00,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3398206.6666666665, ans=0.2 2023-11-28 06:41:05,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=12.0 2023-11-28 06:41:10,396 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509750 2023-11-28 06:41:13,617 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4750, loss[loss=0.05582, simple_loss=0.06318, pruned_loss=0.0115, audio_tagging_loss=0.01273, over 15703.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08922, pruned_loss=0.01214, audio_tagging_loss=0.008849, over 3031622.14 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:41:18,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3398340.0, ans=0.125 2023-11-28 06:41:20,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.82 vs. 
limit=15.0 2023-11-28 06:41:28,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3398406.6666666665, ans=0.0 2023-11-28 06:41:47,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3398540.0, ans=0.125 2023-11-28 06:41:54,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3398540.0, ans=0.0 2023-11-28 06:41:57,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3398540.0, ans=0.125 2023-11-28 06:42:07,291 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509800 2023-11-28 06:42:11,429 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4800, loss[loss=0.06511, simple_loss=0.08982, pruned_loss=0.01259, audio_tagging_loss=0.007612, over 15469.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08928, pruned_loss=0.01213, audio_tagging_loss=0.008902, over 3036419.25 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:42:13,640 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.791e+01 9.387e+01 1.001e+02 1.346e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 06:42:14,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3398673.3333333335, ans=0.125 2023-11-28 06:42:16,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2023-11-28 06:42:43,383 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2023-11-28 06:42:44,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3398806.6666666665, ans=0.125 2023-11-28 06:42:56,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3398940.0, ans=0.2 2023-11-28 06:42:58,835 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:42:58,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3398940.0, ans=0.025 2023-11-28 06:43:04,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3398940.0, ans=0.125 2023-11-28 06:43:05,723 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509850 2023-11-28 06:43:09,515 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4850, loss[loss=0.0708, simple_loss=0.09609, pruned_loss=0.01429, audio_tagging_loss=0.008464, over 15652.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08872, pruned_loss=0.01206, audio_tagging_loss=0.008964, over 3043501.11 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:43:10,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3399006.6666666665, ans=0.0 2023-11-28 06:43:26,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3399073.3333333335, ans=0.1 2023-11-28 06:43:30,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3399140.0, ans=0.035 2023-11-28 06:43:30,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3399140.0, ans=0.2 2023-11-28 06:43:51,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3399206.6666666665, ans=0.125 2023-11-28 06:43:52,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3399206.6666666665, ans=0.125 2023-11-28 06:44:02,647 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509900 2023-11-28 06:44:02,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3399273.3333333335, ans=0.1 2023-11-28 06:44:04,446 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.55 vs. limit=15.0 2023-11-28 06:44:05,794 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4900, loss[loss=0.06973, simple_loss=0.1038, pruned_loss=0.01033, audio_tagging_loss=0.007486, over 14861.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08951, pruned_loss=0.01214, audio_tagging_loss=0.008951, over 3043839.66 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:44:07,974 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.789e+01 9.268e+01 1.027e+02 1.406e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-28 06:44:13,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3399340.0, ans=0.2 2023-11-28 06:44:28,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3399473.3333333335, ans=0.025 2023-11-28 06:44:38,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3399473.3333333335, ans=0.5 2023-11-28 06:44:53,392 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0 2023-11-28 06:44:59,552 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 509950 2023-11-28 06:45:03,495 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 4950, loss[loss=0.06237, simple_loss=0.09104, pruned_loss=0.008678, audio_tagging_loss=0.008168, over 14900.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.0894, pruned_loss=0.01218, audio_tagging_loss=0.008785, over 3037890.80 frames. 
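The [optim.py:476] entries report the distribution of recent gradient norms as five quantiles (min, 25%, median, 75%, max) plus the active clipping threshold, and the numbers are consistent with the threshold being Clipping_scale times the median: in the entry above, 2.0 x 9.268e+01 = 1.854e+02, and percent-clipped stays at 0.0 because even the largest norm (1.406e+02) is under that bound. A sketch of this bookkeeping under that assumption; the function and its signature are illustrative, not the actual ScaledAdam internals.

import torch

def clip_by_median(params, norm_history, clipping_scale=2.0):
    # Quantiles of recently observed gradient norms, as in the log lines.
    norms = torch.stack(norm_history)
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # assumed: scale times the median
    total_norm = torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)
    print(f"grad-norm quartiles {q[0]:.3e} {q[1]:.3e} {q[2]:.3e} {q[3]:.3e} "
          f"{q[4]:.3e}, threshold={threshold:.3e}")
    return total_norm > threshold  # True when this step was clipped

# Norms mirroring the entry above: threshold = 2.0 * 92.68 = 185.36.
p = torch.nn.Parameter(torch.randn(4)); p.grad = torch.randn(4)
hist = [torch.tensor(v) for v in (70.69, 87.89, 92.68, 102.7, 140.6)]
clip_by_median([p], hist)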
], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:45:17,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3399740.0, ans=0.1 2023-11-28 06:45:20,389 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2023-11-28 06:45:25,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3399740.0, ans=0.0 2023-11-28 06:45:26,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3399806.6666666665, ans=0.0 2023-11-28 06:45:28,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3399806.6666666665, ans=0.125 2023-11-28 06:45:37,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3399873.3333333335, ans=0.125 2023-11-28 06:45:43,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3399873.3333333335, ans=0.125 2023-11-28 06:45:57,760 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510000 2023-11-28 06:46:00,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3400006.6666666665, ans=0.2 2023-11-28 06:46:01,574 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5000, loss[loss=0.05247, simple_loss=0.07786, pruned_loss=0.007583, audio_tagging_loss=0.005962, over 13837.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08925, pruned_loss=0.01219, audio_tagging_loss=0.008656, over 3043842.68 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:46:05,257 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 8.682e+01 9.362e+01 1.003e+02 1.327e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 06:46:06,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3400006.6666666665, ans=0.0 2023-11-28 06:46:15,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3400073.3333333335, ans=0.125 2023-11-28 06:46:17,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3400073.3333333335, ans=0.2 2023-11-28 06:46:29,617 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.74 vs. limit=22.5 2023-11-28 06:46:32,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3400140.0, ans=0.2 2023-11-28 06:46:34,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3400140.0, ans=0.0 2023-11-28 06:46:36,984 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. 
limit=12.0 2023-11-28 06:46:56,488 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510050 2023-11-28 06:46:59,731 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5050, loss[loss=0.07033, simple_loss=0.09653, pruned_loss=0.01482, audio_tagging_loss=0.007237, over 14830.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08884, pruned_loss=0.01214, audio_tagging_loss=0.008616, over 3038441.16 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:47:21,255 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:47:23,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3400473.3333333335, ans=0.125 2023-11-28 06:47:30,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3400473.3333333335, ans=0.0 2023-11-28 06:47:34,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3400540.0, ans=0.0 2023-11-28 06:47:53,300 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510100 2023-11-28 06:47:56,491 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5100, loss[loss=0.05091, simple_loss=0.06412, pruned_loss=0.009788, audio_tagging_loss=0.009057, over 16284.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08781, pruned_loss=0.01201, audio_tagging_loss=0.008661, over 3039578.78 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:48:00,389 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 8.708e+01 9.390e+01 1.019e+02 1.146e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 06:48:02,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3400673.3333333335, ans=0.125 2023-11-28 06:48:12,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3400740.0, ans=0.1 2023-11-28 06:48:13,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3400740.0, ans=0.125 2023-11-28 06:48:50,997 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510150 2023-11-28 06:48:54,192 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5150, loss[loss=0.08021, simple_loss=0.1186, pruned_loss=0.0157, audio_tagging_loss=0.005237, over 15506.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08859, pruned_loss=0.01208, audio_tagging_loss=0.008627, over 3038249.22 frames. 
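Across this stretch of the log the reported totals decompose consistently as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; for the batch 5150 averages just above, 0.5 * 0.08859 + 0.01208 + 0.008627 = 0.06500, matching tot_loss=0.065. A small check of that relationship, with the 0.5 weight inferred from the logged numbers rather than read out of the training script:

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5):
    # Weighted sum matching the per-batch "loss[...]" entries; the 0.5
    # weight on simple_loss is inferred from the logged values.
    return simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss

# tot_loss at epoch 43, batch 5150 (values from the entry above):
assert abs(combined_loss(0.08859, 0.01208, 0.008627) - 0.065) < 5e-5
# ...and at batch 5100:
assert abs(combined_loss(0.08781, 0.01201, 0.008661) - 0.06458) < 5e-5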
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:48:56,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3401006.6666666665, ans=0.0 2023-11-28 06:49:14,697 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:49:39,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3401273.3333333335, ans=0.2 2023-11-28 06:49:48,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3401273.3333333335, ans=0.2 2023-11-28 06:49:49,098 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510200 2023-11-28 06:49:52,706 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5200, loss[loss=0.07174, simple_loss=0.1059, pruned_loss=0.01163, audio_tagging_loss=0.007185, over 15048.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08976, pruned_loss=0.01219, audio_tagging_loss=0.008586, over 3043606.08 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:49:56,610 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 8.684e+01 9.283e+01 1.002e+02 1.274e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-28 06:50:01,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3401340.0, ans=0.2 2023-11-28 06:50:43,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3401606.6666666665, ans=0.0 2023-11-28 06:50:47,135 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510250 2023-11-28 06:50:50,398 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5250, loss[loss=0.07885, simple_loss=0.1071, pruned_loss=0.01619, audio_tagging_loss=0.009091, over 14852.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08987, pruned_loss=0.01221, audio_tagging_loss=0.008533, over 3034698.70 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:51:05,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3401740.0, ans=0.0 2023-11-28 06:51:07,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3401740.0, ans=0.125 2023-11-28 06:51:07,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3401740.0, ans=0.125 2023-11-28 06:51:09,534 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.62 vs. 
limit=15.0 2023-11-28 06:51:11,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3401740.0, ans=0.07 2023-11-28 06:51:17,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3401806.6666666665, ans=0.125 2023-11-28 06:51:27,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3401873.3333333335, ans=0.1 2023-11-28 06:51:29,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3401873.3333333335, ans=0.125 2023-11-28 06:51:44,515 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510300 2023-11-28 06:51:47,662 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5300, loss[loss=0.05941, simple_loss=0.07661, pruned_loss=0.01064, audio_tagging_loss=0.01046, over 14862.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09012, pruned_loss=0.0122, audio_tagging_loss=0.008595, over 3035094.51 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:51:48,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3402006.6666666665, ans=0.1 2023-11-28 06:51:50,942 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.521e+01 8.883e+01 9.472e+01 1.016e+02 1.198e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 06:52:09,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3402140.0, ans=0.125 2023-11-28 06:52:11,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3402140.0, ans=0.125 2023-11-28 06:52:14,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3402140.0, ans=0.1 2023-11-28 06:52:39,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3402273.3333333335, ans=0.5 2023-11-28 06:52:42,466 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510350 2023-11-28 06:52:45,621 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5350, loss[loss=0.08431, simple_loss=0.1136, pruned_loss=0.01942, audio_tagging_loss=0.008098, over 14916.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09028, pruned_loss=0.01218, audio_tagging_loss=0.008535, over 3036895.69 frames. 
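The ubiquitous [scaling.py:213] ScheduledFloat lines trace regularization hyperparameters (dropout probabilities, skip rates, balancer probabilities, bypass scale floors) that are scheduled as piecewise-linear functions of the global batch count. By batch_count ~ 3.4e6 every schedule shown here is far past its last knot, which is why each name keeps logging the same constant ans. A minimal sketch of such a schedule; the (batch, value) knots below are illustrative, not the model's actual settings.

class PiecewiseLinear:
    # Piecewise-linear value over batch_count, constant outside the knots,
    # in the spirit of the ScheduledFloat readings logged above.
    def __init__(self, *knots):
        self.knots = sorted(knots)  # (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        (b0, v0) = self.knots[0]
        if batch_count <= b0:
            return v0
        for (b1, v1) in self.knots[1:]:
            if batch_count <= b1:
                return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
            (b0, v0) = (b1, v1)
        return v0  # constant after the final knot

dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(3401873.0) == 0.1  # long past the last knot: ans=0.1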
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:52:58,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3402406.6666666665, ans=0.0 2023-11-28 06:52:59,522 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:53:01,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3402406.6666666665, ans=0.0 2023-11-28 06:53:22,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3402540.0, ans=0.125 2023-11-28 06:53:34,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3402606.6666666665, ans=0.07 2023-11-28 06:53:39,052 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510400 2023-11-28 06:53:40,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3402606.6666666665, ans=0.125 2023-11-28 06:53:42,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3402673.3333333335, ans=0.0 2023-11-28 06:53:43,129 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5400, loss[loss=0.07426, simple_loss=0.1041, pruned_loss=0.01302, audio_tagging_loss=0.00921, over 14797.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09058, pruned_loss=0.01236, audio_tagging_loss=0.008613, over 3031808.03 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:53:43,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3402673.3333333335, ans=0.0 2023-11-28 06:53:47,394 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.415e+01 8.616e+01 9.187e+01 1.017e+02 1.243e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-28 06:53:55,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3402740.0, ans=0.125 2023-11-28 06:54:24,395 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2023-11-28 06:54:30,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3402940.0, ans=0.125 2023-11-28 06:54:37,155 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510450 2023-11-28 06:54:40,385 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5450, loss[loss=0.07883, simple_loss=0.1142, pruned_loss=0.01604, audio_tagging_loss=0.005667, over 15811.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09053, pruned_loss=0.01251, audio_tagging_loss=0.008645, over 3032646.51 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:54:54,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3403073.3333333335, ans=0.125 2023-11-28 06:55:05,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3403140.0, ans=0.1 2023-11-28 06:55:30,596 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.73 vs. 
limit=15.0 2023-11-28 06:55:34,789 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510500 2023-11-28 06:55:38,009 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5500, loss[loss=0.0729, simple_loss=0.0979, pruned_loss=0.01432, audio_tagging_loss=0.009634, over 14395.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09053, pruned_loss=0.01253, audio_tagging_loss=0.008646, over 3043022.31 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:55:41,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3403340.0, ans=0.0 2023-11-28 06:55:42,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.928e+01 9.472e+01 1.024e+02 1.464e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 06:55:43,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3403340.0, ans=0.125 2023-11-28 06:55:54,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3403406.6666666665, ans=0.125 2023-11-28 06:56:18,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3403540.0, ans=0.1 2023-11-28 06:56:26,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3403606.6666666665, ans=0.95 2023-11-28 06:56:31,240 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.78 vs. limit=15.0 2023-11-28 06:56:31,785 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510550 2023-11-28 06:56:33,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3403673.3333333335, ans=0.125 2023-11-28 06:56:34,968 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5550, loss[loss=0.05305, simple_loss=0.06365, pruned_loss=0.008788, audio_tagging_loss=0.01244, over 16154.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08996, pruned_loss=0.0124, audio_tagging_loss=0.008789, over 3038559.72 frames. 
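The [scaling.py:1022] Whitening lines compare a measured decorrelation statistic of a module's activations against that module's limit (e.g. metric=6.78 vs. limit=15.0 above). The metric is 1.0 for perfectly white features, grows as variance concentrates in a few directions, and only triggers a corrective gradient penalty once it exceeds the limit, which most entries here stay under. A rough single-group sketch of such a metric; the exact formula in scaling.py may differ, and this one is only an assumption chosen so that identity-covariance features score 1.0.

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations. For a covariance C with
    # eigenvalues l_i this returns n * sum(l_i^2) / (sum(l_i))^2, which is
    # 1.0 when C is proportional to the identity (fully white features).
    n = x.shape[-1]
    cov = x.t() @ x / x.shape[0]
    return n * (cov ** 2).sum() / cov.trace() ** 2

white = torch.randn(10000, 192)       # near-identity covariance
assert whitening_metric(white) < 1.2  # close to the ideal value 1.0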
], batch size: 64, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:56:39,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3403673.3333333335, ans=0.0 2023-11-28 06:56:54,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3403740.0, ans=0.0 2023-11-28 06:57:02,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3403806.6666666665, ans=0.0 2023-11-28 06:57:07,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3403806.6666666665, ans=0.07 2023-11-28 06:57:12,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3403873.3333333335, ans=0.0 2023-11-28 06:57:13,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3403873.3333333335, ans=0.125 2023-11-28 06:57:13,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3403873.3333333335, ans=0.125 2023-11-28 06:57:16,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3403873.3333333335, ans=0.125 2023-11-28 06:57:26,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3403940.0, ans=0.0 2023-11-28 06:57:29,866 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510600 2023-11-28 06:57:33,360 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5600, loss[loss=0.06158, simple_loss=0.08294, pruned_loss=0.009848, audio_tagging_loss=0.01026, over 15806.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09005, pruned_loss=0.01235, audio_tagging_loss=0.008878, over 3047508.03 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 06:57:33,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0 2023-11-28 06:57:37,616 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 9.053e+01 9.702e+01 1.068e+02 1.336e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-28 06:57:41,124 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 06:57:58,061 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-28 06:58:17,464 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 06:58:18,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.82 vs. 
limit=10.0 2023-11-28 06:58:27,119 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510650 2023-11-28 06:58:27,690 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=15.0 2023-11-28 06:58:30,284 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5650, loss[loss=0.05637, simple_loss=0.07409, pruned_loss=0.007449, audio_tagging_loss=0.01187, over 17045.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08964, pruned_loss=0.01232, audio_tagging_loss=0.009036, over 3046732.95 frames. ], batch size: 65, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:58:34,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3404340.0, ans=0.125 2023-11-28 06:58:47,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3404406.6666666665, ans=0.125 2023-11-28 06:58:49,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3404406.6666666665, ans=0.1 2023-11-28 06:58:54,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.50 vs. limit=15.0 2023-11-28 06:58:59,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3404473.3333333335, ans=0.0 2023-11-28 06:59:08,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3404540.0, ans=0.125 2023-11-28 06:59:15,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3404606.6666666665, ans=0.1 2023-11-28 06:59:16,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3404606.6666666665, ans=0.0 2023-11-28 06:59:16,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3404606.6666666665, ans=0.1 2023-11-28 06:59:24,101 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510700 2023-11-28 06:59:27,442 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5700, loss[loss=0.07186, simple_loss=0.08585, pruned_loss=0.01758, audio_tagging_loss=0.01135, over 15067.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09019, pruned_loss=0.01239, audio_tagging_loss=0.008927, over 3048505.60 frames. 
], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 06:59:32,862 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 8.734e+01 9.325e+01 1.007e+02 1.153e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 06:59:48,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3404740.0, ans=0.0 2023-11-28 06:59:54,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3404806.6666666665, ans=0.05 2023-11-28 06:59:54,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3404806.6666666665, ans=0.125 2023-11-28 07:00:03,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3404873.3333333335, ans=0.125 2023-11-28 07:00:09,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3404873.3333333335, ans=0.1 2023-11-28 07:00:21,515 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510750 2023-11-28 07:00:24,727 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5750, loss[loss=0.05787, simple_loss=0.06715, pruned_loss=0.009775, audio_tagging_loss=0.01452, over 14720.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09006, pruned_loss=0.01243, audio_tagging_loss=0.008879, over 3049688.16 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:00:33,423 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=22.5 2023-11-28 07:00:41,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3405073.3333333335, ans=0.125 2023-11-28 07:00:43,439 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.91 vs. limit=12.0 2023-11-28 07:01:06,837 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0 2023-11-28 07:01:12,838 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.34 vs. limit=22.5 2023-11-28 07:01:19,149 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510800 2023-11-28 07:01:22,732 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5800, loss[loss=0.0549, simple_loss=0.08549, pruned_loss=0.004203, audio_tagging_loss=0.007951, over 14777.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08963, pruned_loss=0.0123, audio_tagging_loss=0.00877, over 3044259.44 frames. 
], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:01:26,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=3405340.0, ans=0.2 2023-11-28 07:01:26,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3405340.0, ans=0.04949747468305833 2023-11-28 07:01:28,130 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.777e+01 9.348e+01 1.032e+02 1.624e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 07:01:43,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3405473.3333333335, ans=0.125 2023-11-28 07:01:53,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3405473.3333333335, ans=0.125 2023-11-28 07:01:56,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3405540.0, ans=0.125 2023-11-28 07:02:15,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3405606.6666666665, ans=0.2 2023-11-28 07:02:16,398 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510850 2023-11-28 07:02:18,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3405673.3333333335, ans=0.125 2023-11-28 07:02:19,677 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5850, loss[loss=0.08182, simple_loss=0.1194, pruned_loss=0.01505, audio_tagging_loss=0.007074, over 16439.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.0896, pruned_loss=0.0123, audio_tagging_loss=0.00872, over 3052870.88 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:03:13,178 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510900 2023-11-28 07:03:16,957 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5900, loss[loss=0.05954, simple_loss=0.07373, pruned_loss=0.01284, audio_tagging_loss=0.009843, over 15190.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09016, pruned_loss=0.01241, audio_tagging_loss=0.008673, over 3049866.87 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:03:22,381 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.808e+01 9.419e+01 9.961e+01 1.259e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 07:03:23,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3406006.6666666665, ans=0.0 2023-11-28 07:03:30,764 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2023-11-28 07:03:47,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3406140.0, ans=0.2 2023-11-28 07:03:58,268 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:04:09,093 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. 
limit=6.0 2023-11-28 07:04:11,260 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 510950 2023-11-28 07:04:11,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3406273.3333333335, ans=22.5 2023-11-28 07:04:14,894 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 5950, loss[loss=0.05461, simple_loss=0.07045, pruned_loss=0.009292, audio_tagging_loss=0.01009, over 15875.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09045, pruned_loss=0.01237, audio_tagging_loss=0.00862, over 3050616.07 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:04:15,295 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=12.0 2023-11-28 07:04:17,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3406340.0, ans=0.125 2023-11-28 07:04:44,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3406473.3333333335, ans=0.0 2023-11-28 07:04:45,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3406473.3333333335, ans=0.1 2023-11-28 07:05:05,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=12.0 2023-11-28 07:05:09,076 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511000 2023-11-28 07:05:09,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3406606.6666666665, ans=0.2 2023-11-28 07:05:12,610 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6000, loss[loss=0.08262, simple_loss=0.1211, pruned_loss=0.01599, audio_tagging_loss=0.006068, over 16025.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.0905, pruned_loss=0.01251, audio_tagging_loss=0.008561, over 3050921.55 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:05:12,611 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 07:05:47,616 INFO [train_asr.py:1267] (2/4) Epoch 43, validation: loss=0.0577, simple_loss=0.05058, pruned_loss=0.005244, audio_tagging_loss=0.02717, over 4681554.00 frames. 2023-11-28 07:05:47,617 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 07:05:53,014 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 8.750e+01 9.275e+01 1.001e+02 1.273e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-28 07:05:54,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3406673.3333333335, ans=0.125 2023-11-28 07:06:06,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3406740.0, ans=0.125 2023-11-28 07:06:22,421 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.27 vs. limit=22.5 2023-11-28 07:06:32,431 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 07:06:40,591 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0 2023-11-28 07:06:41,901 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511050 2023-11-28 07:06:45,675 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6050, loss[loss=0.08025, simple_loss=0.1082, pruned_loss=0.01696, audio_tagging_loss=0.009207, over 15559.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08971, pruned_loss=0.01249, audio_tagging_loss=0.008494, over 3047214.31 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:06:49,470 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.18 vs. limit=22.5 2023-11-28 07:06:51,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3407006.6666666665, ans=0.1 2023-11-28 07:07:05,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3407073.3333333335, ans=0.015 2023-11-28 07:07:13,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3407140.0, ans=0.1 2023-11-28 07:07:25,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3407206.6666666665, ans=0.125 2023-11-28 07:07:39,128 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511100 2023-11-28 07:07:42,394 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6100, loss[loss=0.04558, simple_loss=0.06651, pruned_loss=0.005266, audio_tagging_loss=0.007057, over 14849.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08921, pruned_loss=0.01245, audio_tagging_loss=0.008554, over 3040303.69 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:07:47,817 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.837e+01 9.364e+01 1.005e+02 1.238e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 07:08:04,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3407473.3333333335, ans=0.125 2023-11-28 07:08:28,181 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0 2023-11-28 07:08:30,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3407606.6666666665, ans=0.1 2023-11-28 07:08:35,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3407606.6666666665, ans=0.0 2023-11-28 07:08:36,558 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511150 2023-11-28 07:08:39,903 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6150, loss[loss=0.07189, simple_loss=0.09533, pruned_loss=0.01332, audio_tagging_loss=0.0109, over 15451.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08982, pruned_loss=0.01247, audio_tagging_loss=0.008517, over 3041222.06 frames. 
], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:08:46,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3407673.3333333335, ans=0.95 2023-11-28 07:08:49,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.39 vs. limit=15.0 2023-11-28 07:09:13,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3407873.3333333335, ans=0.1 2023-11-28 07:09:13,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3407873.3333333335, ans=0.125 2023-11-28 07:09:33,515 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511200 2023-11-28 07:09:37,605 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6200, loss[loss=0.04408, simple_loss=0.05159, pruned_loss=0.008302, audio_tagging_loss=0.009983, over 15134.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08869, pruned_loss=0.01213, audio_tagging_loss=0.008619, over 3041006.03 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:09:43,628 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.472e+01 8.632e+01 9.387e+01 1.018e+02 1.235e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 07:09:51,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3408073.3333333335, ans=0.05 2023-11-28 07:10:01,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3408140.0, ans=0.0 2023-11-28 07:10:11,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3408206.6666666665, ans=0.1 2023-11-28 07:10:14,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3408206.6666666665, ans=0.125 2023-11-28 07:10:19,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3408206.6666666665, ans=0.1 2023-11-28 07:10:27,955 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.90 vs. limit=12.0 2023-11-28 07:10:31,624 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511250 2023-11-28 07:10:34,779 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6250, loss[loss=0.06609, simple_loss=0.08661, pruned_loss=0.01201, audio_tagging_loss=0.01077, over 14114.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08925, pruned_loss=0.01224, audio_tagging_loss=0.008682, over 3035207.74 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:10:37,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3408340.0, ans=0.125 2023-11-28 07:11:03,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.42 vs. limit=22.5 2023-11-28 07:11:06,735 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.21 vs. 
limit=15.0 2023-11-28 07:11:28,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3408606.6666666665, ans=0.0 2023-11-28 07:11:28,899 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511300 2023-11-28 07:11:31,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3408673.3333333335, ans=0.0 2023-11-28 07:11:32,052 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6300, loss[loss=0.07591, simple_loss=0.1056, pruned_loss=0.01634, audio_tagging_loss=0.006775, over 15112.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09002, pruned_loss=0.01241, audio_tagging_loss=0.008689, over 3037796.43 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:11:38,159 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.880e+01 9.504e+01 1.024e+02 1.327e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 07:11:41,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3408673.3333333335, ans=0.0 2023-11-28 07:12:09,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3408873.3333333335, ans=0.125 2023-11-28 07:12:14,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3408873.3333333335, ans=0.125 2023-11-28 07:12:15,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2023-11-28 07:12:26,503 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511350 2023-11-28 07:12:29,720 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6350, loss[loss=0.04543, simple_loss=0.06096, pruned_loss=0.006039, audio_tagging_loss=0.00891, over 14786.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08928, pruned_loss=0.01229, audio_tagging_loss=0.00883, over 3037955.43 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:12:56,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3409140.0, ans=0.09899494936611666 2023-11-28 07:13:24,507 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511400 2023-11-28 07:13:28,600 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6400, loss[loss=0.06467, simple_loss=0.08683, pruned_loss=0.0129, audio_tagging_loss=0.008358, over 15227.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.0894, pruned_loss=0.01236, audio_tagging_loss=0.008935, over 3039561.15 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:13:35,201 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.831e+01 9.327e+01 9.903e+01 1.480e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 07:13:46,769 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2023-11-28 07:13:51,631 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.05 vs. 
limit=15.0 2023-11-28 07:13:52,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3409473.3333333335, ans=0.2 2023-11-28 07:14:04,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3409540.0, ans=0.1 2023-11-28 07:14:21,599 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511450 2023-11-28 07:14:24,818 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6450, loss[loss=0.06386, simple_loss=0.08478, pruned_loss=0.01258, audio_tagging_loss=0.008889, over 16598.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08892, pruned_loss=0.01223, audio_tagging_loss=0.008994, over 3039311.15 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:15:01,026 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0 2023-11-28 07:15:18,582 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511500 2023-11-28 07:15:21,726 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6500, loss[loss=0.07235, simple_loss=0.09774, pruned_loss=0.0154, audio_tagging_loss=0.008076, over 14787.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08837, pruned_loss=0.01203, audio_tagging_loss=0.008966, over 3035963.58 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:15:28,776 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.593e+01 8.791e+01 9.611e+01 1.014e+02 1.471e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 07:15:32,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3410073.3333333335, ans=0.1 2023-11-28 07:15:35,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=3410073.3333333335, ans=0.2 2023-11-28 07:15:42,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3410073.3333333335, ans=0.1 2023-11-28 07:15:57,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3410206.6666666665, ans=0.125 2023-11-28 07:15:59,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3410206.6666666665, ans=0.125 2023-11-28 07:16:13,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3410273.3333333335, ans=0.0 2023-11-28 07:16:14,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3410273.3333333335, ans=0.125 2023-11-28 07:16:15,964 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511550 2023-11-28 07:16:19,207 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6550, loss[loss=0.06381, simple_loss=0.1009, pruned_loss=0.007897, audio_tagging_loss=0.005449, over 14169.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08884, pruned_loss=0.01213, audio_tagging_loss=0.008799, over 3036432.44 frames. 
], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:16:31,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3410406.6666666665, ans=0.07 2023-11-28 07:16:34,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3410406.6666666665, ans=0.0 2023-11-28 07:16:41,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3410473.3333333335, ans=0.0 2023-11-28 07:16:45,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3410473.3333333335, ans=0.125 2023-11-28 07:16:46,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3410473.3333333335, ans=0.125 2023-11-28 07:16:48,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3410473.3333333335, ans=0.1 2023-11-28 07:16:48,579 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2023-11-28 07:16:56,653 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.15 vs. limit=15.0 2023-11-28 07:17:06,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3410606.6666666665, ans=0.125 2023-11-28 07:17:08,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3410606.6666666665, ans=0.2 2023-11-28 07:17:12,740 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511600 2023-11-28 07:17:16,224 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6600, loss[loss=0.05891, simple_loss=0.08081, pruned_loss=0.008854, audio_tagging_loss=0.009645, over 14492.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08886, pruned_loss=0.0121, audio_tagging_loss=0.008826, over 3034561.53 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:17:16,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3410673.3333333335, ans=0.125 2023-11-28 07:17:24,287 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.209e+01 8.683e+01 9.479e+01 1.018e+02 1.462e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 07:17:38,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3410806.6666666665, ans=0.125 2023-11-28 07:18:09,942 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511650 2023-11-28 07:18:13,101 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6650, loss[loss=0.05254, simple_loss=0.07395, pruned_loss=0.008948, audio_tagging_loss=0.006616, over 14663.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08917, pruned_loss=0.01228, audio_tagging_loss=0.008714, over 3038375.61 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:18:14,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. 
limit=15.0 2023-11-28 07:18:16,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3411006.6666666665, ans=0.0 2023-11-28 07:18:16,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3411006.6666666665, ans=0.0 2023-11-28 07:18:36,132 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:18:51,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3411206.6666666665, ans=0.07 2023-11-28 07:18:51,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3411206.6666666665, ans=0.125 2023-11-28 07:18:55,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3411206.6666666665, ans=0.2 2023-11-28 07:19:07,114 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511700 2023-11-28 07:19:07,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3411273.3333333335, ans=0.0 2023-11-28 07:19:10,394 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6700, loss[loss=0.0695, simple_loss=0.09749, pruned_loss=0.0151, audio_tagging_loss=0.005658, over 14336.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08977, pruned_loss=0.01246, audio_tagging_loss=0.008589, over 3043742.37 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:19:17,921 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 9.075e+01 9.531e+01 1.012e+02 1.694e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 07:19:18,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3411340.0, ans=0.125 2023-11-28 07:19:24,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3411406.6666666665, ans=0.125 2023-11-28 07:19:40,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3411473.3333333335, ans=0.2 2023-11-28 07:19:49,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3411540.0, ans=0.125 2023-11-28 07:19:52,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3411540.0, ans=0.0 2023-11-28 07:19:55,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3411606.6666666665, ans=0.0 2023-11-28 07:19:56,668 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=12.0 2023-11-28 07:20:03,881 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511750 2023-11-28 07:20:04,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3411606.6666666665, ans=0.2 2023-11-28 07:20:07,117 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6750, loss[loss=0.07126, simple_loss=0.09796, pruned_loss=0.0141, audio_tagging_loss=0.008176, over 15604.00 frames. 
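The optim.py lines above print five grad-norm quantiles (min, 25%, median, 75%, max) plus a clipping threshold. Across these entries the threshold tracks twice the printed median, matching Clipping_scale=2.0 (for example, 2 * 9.531e+01 is about the logged 1.906e+02). A hedged sketch of that bookkeeping, with illustrative names; this is an assumed reconstruction, not code from optim.py:

```python
import torch

def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Sketch of the quartile/threshold bookkeeping, assuming the
    threshold is clipping_scale times the median of recent grad norms
    (consistent with the logged values)."""
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                    # 2 x median
    clipped = (grad_norms > threshold).float().mean() * 100.0
    quart = " ".join(f"{v:.3e}" for v in q.tolist())
    print(f"Clipping_scale={clipping_scale}, grad-norm quartiles {quart}, "
          f"threshold={threshold:.3e}, percent-clipped={clipped:.1f}")
    return threshold

# e.g. clipping_report(torch.tensor([70.7, 90.8, 95.3, 101.2, 169.4]))
```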
], tot_loss[loss=0.06642, simple_loss=0.0906, pruned_loss=0.01259, audio_tagging_loss=0.008528, over 3044168.34 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:20:09,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3411673.3333333335, ans=0.2 2023-11-28 07:20:21,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3411740.0, ans=0.125 2023-11-28 07:20:35,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3411806.6666666665, ans=0.2 2023-11-28 07:20:45,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3411873.3333333335, ans=0.07 2023-11-28 07:20:57,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3411940.0, ans=0.09899494936611666 2023-11-28 07:21:00,796 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511800 2023-11-28 07:21:04,736 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6800, loss[loss=0.05456, simple_loss=0.06566, pruned_loss=0.009243, audio_tagging_loss=0.01249, over 13815.00 frames. ], tot_loss[loss=0.06704, simple_loss=0.09145, pruned_loss=0.01285, audio_tagging_loss=0.008463, over 3047577.56 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:21:11,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3412006.6666666665, ans=0.0 2023-11-28 07:21:12,432 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 8.904e+01 9.309e+01 9.890e+01 1.281e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-28 07:21:12,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3412006.6666666665, ans=0.0 2023-11-28 07:21:20,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3412073.3333333335, ans=0.0 2023-11-28 07:21:28,289 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.27 vs. limit=15.0 2023-11-28 07:21:28,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3412140.0, ans=0.125 2023-11-28 07:21:55,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3412273.3333333335, ans=0.1 2023-11-28 07:21:58,880 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511850 2023-11-28 07:22:02,564 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6850, loss[loss=0.0709, simple_loss=0.1065, pruned_loss=0.01186, audio_tagging_loss=0.005815, over 14978.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.09076, pruned_loss=0.01268, audio_tagging_loss=0.008496, over 3043474.75 frames. 
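The scaling.py "Whitening" lines compare a per-module metric against a limit. One plausible reading, offered as an assumption rather than the repository's exact formula: for a feature covariance over C channels, metric = C * tr(cov^2) / tr(cov)^2 equals 1 when the covariance is isotropic ("white") and approaches C when the energy collapses onto a single direction, so metrics well under the limit indicate adequately decorrelated activations. A sketch:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Sketch (assumed formula, not copied from scaling.py): for features
    x of shape (num_frames, num_channels), return
    C * tr(cov^2) / tr(cov)^2, which is 1.0 for an isotropic covariance
    and num_channels for a rank-1 one."""
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]         # (C, C) covariance
    num = (cov * cov).sum()                # tr(cov @ cov); cov is symmetric
    den = torch.diagonal(cov).sum() ** 2   # tr(cov)^2
    return (x.shape[1] * num / den).item()

# White noise should score near 1.0, far below e.g. limit=15.0:
print(whitening_metric(torch.randn(10000, 256)))
```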
], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:22:02,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3412340.0, ans=0.125 2023-11-28 07:22:10,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3412340.0, ans=0.125 2023-11-28 07:22:47,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3412606.6666666665, ans=0.0 2023-11-28 07:22:50,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3412606.6666666665, ans=0.125 2023-11-28 07:22:56,259 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511900 2023-11-28 07:22:57,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3412606.6666666665, ans=0.125 2023-11-28 07:22:59,472 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6900, loss[loss=0.05498, simple_loss=0.07699, pruned_loss=0.005683, audio_tagging_loss=0.0108, over 14415.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.0905, pruned_loss=0.01255, audio_tagging_loss=0.008471, over 3037166.75 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:23:07,193 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.647e+01 8.771e+01 9.385e+01 1.023e+02 1.493e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 07:23:48,836 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 07:23:53,299 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 511950 2023-11-28 07:23:55,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3413006.6666666665, ans=0.125 2023-11-28 07:23:57,047 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 6950, loss[loss=0.04833, simple_loss=0.06542, pruned_loss=0.007369, audio_tagging_loss=0.008256, over 14699.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.0906, pruned_loss=0.01233, audio_tagging_loss=0.008513, over 3041197.05 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 07:24:08,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3413073.3333333335, ans=0.1 2023-11-28 07:24:18,567 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.59 vs. limit=15.0 2023-11-28 07:24:25,282 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.36 vs. 
limit=15.0 2023-11-28 07:24:39,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3413206.6666666665, ans=0.1 2023-11-28 07:24:50,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3413273.3333333335, ans=0.0 2023-11-28 07:24:51,242 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512000 2023-11-28 07:24:57,469 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7000, loss[loss=0.0596, simple_loss=0.07961, pruned_loss=0.01142, audio_tagging_loss=0.008373, over 16392.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09007, pruned_loss=0.0124, audio_tagging_loss=0.008555, over 3042596.40 frames. ], batch size: 63, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:25:06,154 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.579e+01 9.421e+01 1.029e+02 1.258e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 07:25:25,526 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.72 vs. limit=15.0 2023-11-28 07:25:26,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3413473.3333333335, ans=0.2 2023-11-28 07:25:27,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3413473.3333333335, ans=0.0 2023-11-28 07:25:50,603 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512050 2023-11-28 07:25:53,833 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7050, loss[loss=0.06971, simple_loss=0.09864, pruned_loss=0.01066, audio_tagging_loss=0.009729, over 16012.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08987, pruned_loss=0.01235, audio_tagging_loss=0.008617, over 3053697.53 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:26:01,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3413673.3333333335, ans=0.1 2023-11-28 07:26:21,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3413806.6666666665, ans=0.2 2023-11-28 07:26:21,845 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.34 vs. limit=22.5 2023-11-28 07:26:34,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3413873.3333333335, ans=0.125 2023-11-28 07:26:35,073 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:26:36,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3413873.3333333335, ans=0.125 2023-11-28 07:26:37,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3413873.3333333335, ans=0.125 2023-11-28 07:26:46,967 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512100 2023-11-28 07:26:50,153 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7100, loss[loss=0.06824, simple_loss=0.08599, pruned_loss=0.01579, audio_tagging_loss=0.009455, over 15578.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08931, pruned_loss=0.01231, audio_tagging_loss=0.008758, over 3045791.10 frames. 
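The grad_scale value in these records drifts among 32.0, 16.0, 8.0 and, a little further on, 4.0. That pattern is characteristic of dynamic loss scaling under fp16: the scale is halved whenever a step produces inf/nan gradients and grown again after a run of clean steps. A generic sketch using torch.cuda.amp.GradScaler (standard PyTorch usage, not code lifted from train_asr.py; init_scale and growth_interval here are illustrative):

```python
import torch

# Generic AMP loop; halving on overflow plus periodic doubling is what
# produces grad_scale sequences like 32 -> 16 -> 8 -> 4 -> 8 -> 16 ...
model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

for _ in range(3):
    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if grads contain inf/nan
    scaler.update()          # halves the scale on overflow, grows it later
    print("grad_scale:", scaler.get_scale())
```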
], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:26:57,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3414006.6666666665, ans=0.125 2023-11-28 07:27:01,321 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 9.094e+01 9.538e+01 1.011e+02 1.389e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 07:27:09,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=3414073.3333333335, ans=0.02 2023-11-28 07:27:34,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3414273.3333333335, ans=0.125 2023-11-28 07:27:41,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3414273.3333333335, ans=0.125 2023-11-28 07:27:44,910 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512150 2023-11-28 07:27:48,118 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7150, loss[loss=0.04797, simple_loss=0.06808, pruned_loss=0.007047, audio_tagging_loss=0.006886, over 15232.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08923, pruned_loss=0.0123, audio_tagging_loss=0.008796, over 3046769.79 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 4.0 2023-11-28 07:28:13,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3414473.3333333335, ans=0.125 2023-11-28 07:28:15,395 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=15.0 2023-11-28 07:28:24,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3414540.0, ans=10.0 2023-11-28 07:28:30,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3414540.0, ans=0.2 2023-11-28 07:28:35,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3414606.6666666665, ans=0.125 2023-11-28 07:28:41,963 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512200 2023-11-28 07:28:44,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3414673.3333333335, ans=0.125 2023-11-28 07:28:45,578 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7200, loss[loss=0.06767, simple_loss=0.08344, pruned_loss=0.01546, audio_tagging_loss=0.0105, over 16192.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08869, pruned_loss=0.01222, audio_tagging_loss=0.008982, over 3047864.98 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:28:50,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3414673.3333333335, ans=0.0 2023-11-28 07:28:52,491 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.49 vs. 
limit=22.5 2023-11-28 07:28:53,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3414673.3333333335, ans=0.5 2023-11-28 07:28:56,392 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.923e+01 8.861e+01 9.668e+01 1.042e+02 2.032e+02, threshold=1.934e+02, percent-clipped=1.0 2023-11-28 07:28:57,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3414740.0, ans=0.2 2023-11-28 07:29:00,523 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.44 vs. limit=10.0 2023-11-28 07:29:07,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3414806.6666666665, ans=0.125 2023-11-28 07:29:15,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3414806.6666666665, ans=0.125 2023-11-28 07:29:18,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3414806.6666666665, ans=0.125 2023-11-28 07:29:26,058 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.40 vs. limit=15.0 2023-11-28 07:29:31,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.76 vs. limit=15.0 2023-11-28 07:29:34,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3414940.0, ans=0.0 2023-11-28 07:29:38,903 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512250 2023-11-28 07:29:42,141 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7250, loss[loss=0.04366, simple_loss=0.0537, pruned_loss=0.006493, audio_tagging_loss=0.01032, over 14887.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08859, pruned_loss=0.01215, audio_tagging_loss=0.009051, over 3046986.78 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:29:45,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3415006.6666666665, ans=0.0 2023-11-28 07:29:58,064 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.12 vs. 
limit=22.5 2023-11-28 07:30:04,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3415140.0, ans=10.0 2023-11-28 07:30:12,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3415140.0, ans=0.0 2023-11-28 07:30:14,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3415140.0, ans=0.125 2023-11-28 07:30:18,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3415206.6666666665, ans=0.0 2023-11-28 07:30:26,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3415273.3333333335, ans=0.0 2023-11-28 07:30:32,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3415273.3333333335, ans=0.2 2023-11-28 07:30:36,102 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512300 2023-11-28 07:30:39,336 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7300, loss[loss=0.05463, simple_loss=0.06933, pruned_loss=0.009683, audio_tagging_loss=0.01027, over 15894.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08821, pruned_loss=0.01207, audio_tagging_loss=0.009038, over 3048582.48 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:30:51,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 8.861e+01 9.411e+01 1.033e+02 1.259e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 07:30:53,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3415406.6666666665, ans=0.125 2023-11-28 07:31:14,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3415540.0, ans=0.1 2023-11-28 07:31:18,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3415540.0, ans=0.125 2023-11-28 07:31:27,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3415606.6666666665, ans=0.0 2023-11-28 07:31:32,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3415606.6666666665, ans=0.5 2023-11-28 07:31:33,715 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512350 2023-11-28 07:31:37,001 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7350, loss[loss=0.06177, simple_loss=0.09032, pruned_loss=0.009516, audio_tagging_loss=0.007096, over 15163.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08852, pruned_loss=0.01203, audio_tagging_loss=0.008825, over 3051689.28 frames. 
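Most ScheduledFloat entries above tie a hyperparameter (dropout_p, skip rates, scale_min, and so on) to batch_count. A reasonable mental model, sketched under the assumption that the schedule is piecewise-linear between (batch_count, value) breakpoints and constant outside them; the class name and breakpoints are illustrative:

```python
import bisect

class PiecewiseFloat:
    """Sketch of a batch-count-indexed float schedule: linear between
    breakpoints, clamped to the end values outside them (an assumption
    about how values like dropout_p=0.1 at batch_count ~ 3.4e6 arise)."""

    def __init__(self, *points: tuple):
        self.points = sorted(points)  # [(batch_count, value), ...]

    def __call__(self, batch_count: float) -> float:
        xs = [p[0] for p in self.points]
        i = bisect.bisect_right(xs, batch_count)
        if i == 0:
            return self.points[0][1]
        if i == len(self.points):
            return self.points[-1][1]
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = PiecewiseFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(0), dropout_p(10000), dropout_p(3407006.7))  # 0.3 0.2 0.1
```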
], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:32:04,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3415806.6666666665, ans=0.0 2023-11-28 07:32:25,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3415940.0, ans=0.0 2023-11-28 07:32:29,965 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512400 2023-11-28 07:32:30,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3415940.0, ans=0.125 2023-11-28 07:32:33,363 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7400, loss[loss=0.04987, simple_loss=0.06676, pruned_loss=0.006607, audio_tagging_loss=0.009881, over 15045.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08884, pruned_loss=0.01219, audio_tagging_loss=0.008689, over 3047584.99 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:32:35,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3416006.6666666665, ans=0.125 2023-11-28 07:32:44,953 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.822e+01 9.327e+01 1.016e+02 1.231e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 07:33:27,683 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512450 2023-11-28 07:33:30,879 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7450, loss[loss=0.06064, simple_loss=0.08294, pruned_loss=0.01037, audio_tagging_loss=0.008792, over 16275.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08903, pruned_loss=0.01215, audio_tagging_loss=0.008537, over 3040797.29 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:33:31,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3416340.0, ans=0.0 2023-11-28 07:33:52,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3416406.6666666665, ans=0.0 2023-11-28 07:34:16,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3416606.6666666665, ans=0.025 2023-11-28 07:34:26,002 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512500 2023-11-28 07:34:28,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3416673.3333333335, ans=0.125 2023-11-28 07:34:28,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3416673.3333333335, ans=0.125 2023-11-28 07:34:29,281 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7500, loss[loss=0.05544, simple_loss=0.07899, pruned_loss=0.00955, audio_tagging_loss=0.006393, over 14645.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.0896, pruned_loss=0.01231, audio_tagging_loss=0.008527, over 3045891.53 frames. 
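In each train_asr.py record, loss[...] is the current batch while tot_loss[...] aggregates recent batches, each reported "over N frames"; the natural aggregation is a frame-weighted average of per-frame losses. A hedged sketch of that bookkeeping (assumed reconstruction, illustrative names; the second update below is a hypothetical batch, not taken from the log):

```python
class FrameWeightedAverage:
    """Sketch: accumulate loss weighted by frame count, so tot_loss reads
    as 'average loss over the last N frames' (an assumed reconstruction
    of the logged tot_loss[..., over N frames] fields)."""

    def __init__(self):
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float):
        self.loss_sum += batch_loss * batch_frames
        self.frames += batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

avg = FrameWeightedAverage()
avg.update(0.05544, 14645.0)  # the "batch 7500" loss entry above
avg.update(0.06564, 15000.0)  # hypothetical second batch
print(f"tot_loss[loss={avg.value:.5f}, over {avg.frames:.2f} frames.]")
```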
], batch size: 54, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:34:29,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3416673.3333333335, ans=10.0 2023-11-28 07:34:40,252 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.775e+01 9.275e+01 9.988e+01 1.436e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-28 07:35:12,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3416873.3333333335, ans=0.1 2023-11-28 07:35:22,810 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512550 2023-11-28 07:35:26,332 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7550, loss[loss=0.05292, simple_loss=0.06878, pruned_loss=0.00885, audio_tagging_loss=0.009677, over 14550.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08913, pruned_loss=0.01228, audio_tagging_loss=0.008469, over 3042209.91 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:35:26,944 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0 2023-11-28 07:35:44,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3417073.3333333335, ans=0.2 2023-11-28 07:36:20,891 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512600 2023-11-28 07:36:25,103 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7600, loss[loss=0.05889, simple_loss=0.0824, pruned_loss=0.007751, audio_tagging_loss=0.009936, over 16005.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08922, pruned_loss=0.01226, audio_tagging_loss=0.008506, over 3043763.28 frames. ], batch size: 62, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:36:36,985 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.736e+01 9.227e+01 9.964e+01 1.331e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 07:36:38,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3417406.6666666665, ans=0.125 2023-11-28 07:36:54,763 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2023-11-28 07:37:13,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3417606.6666666665, ans=0.125 2023-11-28 07:37:20,211 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512650 2023-11-28 07:37:22,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3417673.3333333335, ans=0.125 2023-11-28 07:37:23,480 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7650, loss[loss=0.06101, simple_loss=0.08086, pruned_loss=0.009391, audio_tagging_loss=0.01119, over 14313.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08869, pruned_loss=0.01211, audio_tagging_loss=0.008586, over 3040653.46 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:37:32,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. 
limit=6.0 2023-11-28 07:37:39,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3417740.0, ans=0.125 2023-11-28 07:37:49,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3417806.6666666665, ans=0.09899494936611666 2023-11-28 07:37:56,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3417806.6666666665, ans=0.1 2023-11-28 07:37:58,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3417873.3333333335, ans=0.1 2023-11-28 07:38:01,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3417873.3333333335, ans=0.125 2023-11-28 07:38:09,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3417940.0, ans=0.125 2023-11-28 07:38:18,877 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512700 2023-11-28 07:38:21,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3418006.6666666665, ans=0.125 2023-11-28 07:38:22,124 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7700, loss[loss=0.07573, simple_loss=0.1038, pruned_loss=0.01328, audio_tagging_loss=0.01056, over 14501.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08841, pruned_loss=0.01206, audio_tagging_loss=0.008678, over 3038245.32 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:38:34,285 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 8.901e+01 9.400e+01 1.006e+02 1.251e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 07:38:36,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3418073.3333333335, ans=0.0 2023-11-28 07:38:48,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2023-11-28 07:39:48,600 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2023-11-28 07:39:51,794 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512750 2023-11-28 07:40:09,402 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7750, loss[loss=0.06385, simple_loss=0.08572, pruned_loss=0.01292, audio_tagging_loss=0.008068, over 15127.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08883, pruned_loss=0.01204, audio_tagging_loss=0.008701, over 3043898.19 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:40:09,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3418340.0, ans=0.5 2023-11-28 07:42:13,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3418540.0, ans=0.125 2023-11-28 07:42:36,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3418540.0, ans=0.05 2023-11-28 07:42:40,732 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2023-11-28 07:42:46,522 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.54 vs. limit=15.0 2023-11-28 07:43:25,501 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0 2023-11-28 07:43:32,271 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512800 2023-11-28 07:43:46,036 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7800, loss[loss=0.06314, simple_loss=0.07756, pruned_loss=0.01423, audio_tagging_loss=0.01013, over 15276.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08994, pruned_loss=0.01228, audio_tagging_loss=0.008649, over 3048207.96 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:44:02,428 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 07:44:31,024 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 8.859e+01 9.420e+01 1.032e+02 1.560e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 07:45:29,281 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0 2023-11-28 07:46:48,954 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512850 2023-11-28 07:46:55,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3418940.0, ans=0.5 2023-11-28 07:47:05,868 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7850, loss[loss=0.07012, simple_loss=0.1014, pruned_loss=0.008496, audio_tagging_loss=0.01094, over 15143.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09006, pruned_loss=0.01225, audio_tagging_loss=0.008713, over 3053563.07 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:49:55,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.16 vs. limit=22.5 2023-11-28 07:50:04,610 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.58 vs. limit=10.0 2023-11-28 07:50:19,083 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. 
limit=6.0 2023-11-28 07:50:20,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3419273.3333333335, ans=0.1 2023-11-28 07:50:22,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3419273.3333333335, ans=0.0 2023-11-28 07:50:32,227 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512900 2023-11-28 07:50:32,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3419273.3333333335, ans=0.125 2023-11-28 07:50:32,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3419273.3333333335, ans=0.0 2023-11-28 07:50:44,772 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7900, loss[loss=0.06531, simple_loss=0.09192, pruned_loss=0.01154, audio_tagging_loss=0.007805, over 16299.00 frames. ], tot_loss[loss=0.06651, simple_loss=0.09052, pruned_loss=0.01245, audio_tagging_loss=0.008805, over 3054088.14 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:51:25,724 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.969e+01 8.861e+01 9.655e+01 1.039e+02 1.530e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-28 07:51:35,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3419406.6666666665, ans=0.125 2023-11-28 07:52:30,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3419473.3333333335, ans=0.0 2023-11-28 07:52:38,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3419540.0, ans=0.0 2023-11-28 07:53:32,089 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.77 vs. limit=22.5 2023-11-28 07:53:42,624 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 512950 2023-11-28 07:53:52,966 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 7950, loss[loss=0.07645, simple_loss=0.1059, pruned_loss=0.01354, audio_tagging_loss=0.009959, over 16080.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08932, pruned_loss=0.0123, audio_tagging_loss=0.008945, over 3053795.35 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 07:54:54,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3419740.0, ans=0.125 2023-11-28 07:54:58,343 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 07:55:20,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3419806.6666666665, ans=0.2 2023-11-28 07:56:12,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3419873.3333333335, ans=0.125 2023-11-28 07:57:24,723 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513000 2023-11-28 07:57:41,007 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8000, loss[loss=0.05318, simple_loss=0.0711, pruned_loss=0.008659, audio_tagging_loss=0.008968, over 15171.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08901, pruned_loss=0.01238, audio_tagging_loss=0.009039, over 3044538.58 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 07:58:27,795 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.711e+01 9.409e+01 1.028e+02 1.220e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 07:58:58,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2023-11-28 08:01:07,418 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513050 2023-11-28 08:01:07,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3420273.3333333335, ans=0.2 2023-11-28 08:01:19,603 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8050, loss[loss=0.0512, simple_loss=0.06663, pruned_loss=0.00829, audio_tagging_loss=0.009596, over 15288.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08875, pruned_loss=0.01228, audio_tagging_loss=0.009054, over 3036148.23 frames. ], batch size: 61, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:01:54,861 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.33 vs. limit=15.0 2023-11-28 08:02:12,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3420406.6666666665, ans=0.1 2023-11-28 08:02:24,007 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=12.0 2023-11-28 08:02:29,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3420473.3333333335, ans=0.125 2023-11-28 08:02:40,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3420473.3333333335, ans=0.125 2023-11-28 08:02:51,606 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.44 vs. limit=15.0 2023-11-28 08:04:01,791 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513100 2023-11-28 08:04:02,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3420606.6666666665, ans=0.0 2023-11-28 08:04:11,732 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8100, loss[loss=0.04329, simple_loss=0.05487, pruned_loss=0.007624, audio_tagging_loss=0.008228, over 14025.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08852, pruned_loss=0.01217, audio_tagging_loss=0.009029, over 3038135.83 frames. 
], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:04:26,908 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=22.5 2023-11-28 08:04:26,930 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.15 vs. limit=10.0 2023-11-28 08:04:29,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3420673.3333333335, ans=0.2 2023-11-28 08:04:43,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3420740.0, ans=0.0 2023-11-28 08:04:51,143 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.964e+01 9.574e+01 1.024e+02 1.325e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 08:04:55,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3420740.0, ans=0.2 2023-11-28 08:06:36,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3420873.3333333335, ans=0.0 2023-11-28 08:06:49,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3420940.0, ans=15.0 2023-11-28 08:07:02,419 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513150 2023-11-28 08:07:11,766 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8150, loss[loss=0.05722, simple_loss=0.07328, pruned_loss=0.009543, audio_tagging_loss=0.01103, over 15112.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08867, pruned_loss=0.01228, audio_tagging_loss=0.008756, over 3035767.89 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:07:45,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3421073.3333333335, ans=0.0 2023-11-28 08:07:56,580 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:09:11,617 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0 2023-11-28 08:09:39,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3421273.3333333335, ans=0.0 2023-11-28 08:10:00,208 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513200 2023-11-28 08:10:00,658 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.41 vs. limit=15.0 2023-11-28 08:10:09,171 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8200, loss[loss=0.06586, simple_loss=0.09033, pruned_loss=0.009965, audio_tagging_loss=0.01073, over 14400.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08889, pruned_loss=0.01212, audio_tagging_loss=0.008712, over 3040556.70 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:10:12,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3421340.0, ans=0.125 2023-11-28 08:10:23,250 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 08:10:32,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3421340.0, ans=0.0 2023-11-28 08:10:48,242 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.547e+01 8.671e+01 9.315e+01 1.033e+02 1.596e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-28 08:11:34,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3421473.3333333335, ans=0.1 2023-11-28 08:12:11,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3421540.0, ans=0.125 2023-11-28 08:12:13,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3421540.0, ans=0.02 2023-11-28 08:12:53,027 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513250 2023-11-28 08:12:58,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3421606.6666666665, ans=0.1 2023-11-28 08:13:04,403 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8250, loss[loss=0.06252, simple_loss=0.09113, pruned_loss=0.01262, audio_tagging_loss=0.004324, over 14222.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08879, pruned_loss=0.01213, audio_tagging_loss=0.008591, over 3039757.91 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:14:09,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3421806.6666666665, ans=0.2 2023-11-28 08:14:11,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3421806.6666666665, ans=0.125 2023-11-28 08:14:23,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3421806.6666666665, ans=0.125 2023-11-28 08:14:32,058 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=10.0 2023-11-28 08:14:38,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3421873.3333333335, ans=0.125 2023-11-28 08:14:38,875 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0 2023-11-28 08:15:44,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=12.0 2023-11-28 08:15:54,369 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513300 2023-11-28 08:16:08,451 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8300, loss[loss=0.04696, simple_loss=0.06735, pruned_loss=0.006114, audio_tagging_loss=0.007167, over 13380.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08875, pruned_loss=0.01212, audio_tagging_loss=0.00857, over 3038161.88 frames. 
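The recurring WARNING lines exclude these AudioSet placeholder cuts because, after the encoder's convolutional subsampling, 100 input frames shrink to 23 encoder frames, fewer than the 24 BPE tokens, leaving too few frames to cover the token sequence in the transducer loss. The arithmetic below reproduces the logged 100 -> 23; the exact subsampling formula is an assumption chosen to be consistent with those numbers:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed Conv2dSubsampling arithmetic; reproduces the logged 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, tokens: list) -> bool:
    """Sketch of the filter behind the 'Exclude cut' warnings: drop a cut
    whenever the encoder output is shorter than its token sequence."""
    return frames_after_subsampling(num_frames) >= len(tokens)

tokens = 24 * ["tok"]                 # the dummy text encodes to 24 tokens
print(frames_after_subsampling(100))  # 23
print(keep_cut(100, tokens))          # False -> excluded from training
```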
], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:16:49,374 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.431e+01 8.832e+01 9.492e+01 1.019e+02 1.242e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 08:17:30,889 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-28 08:17:34,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3422140.0, ans=0.0 2023-11-28 08:18:15,583 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.50 vs. limit=22.5 2023-11-28 08:18:23,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3422206.6666666665, ans=0.0 2023-11-28 08:18:57,044 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.54 vs. limit=22.5 2023-11-28 08:19:10,140 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513350 2023-11-28 08:19:14,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3422273.3333333335, ans=0.1 2023-11-28 08:19:20,922 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8350, loss[loss=0.03807, simple_loss=0.05223, pruned_loss=0.003946, audio_tagging_loss=0.008012, over 13848.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08743, pruned_loss=0.01191, audio_tagging_loss=0.008659, over 3037035.47 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:19:21,656 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=15.0 2023-11-28 08:19:39,094 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.13 vs. limit=15.0 2023-11-28 08:19:41,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3422340.0, ans=0.125 2023-11-28 08:19:58,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3422406.6666666665, ans=0.125 2023-11-28 08:19:58,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3422406.6666666665, ans=0.0 2023-11-28 08:21:43,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3422606.6666666665, ans=0.125 2023-11-28 08:21:56,003 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513400 2023-11-28 08:22:05,626 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8400, loss[loss=0.0411, simple_loss=0.0525, pruned_loss=0.00573, audio_tagging_loss=0.009119, over 16692.00 frames. ], tot_loss[loss=0.06385, simple_loss=0.08682, pruned_loss=0.0118, audio_tagging_loss=0.008635, over 3044611.24 frames. 
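The optim.py Clipping lines encode their own rule: the five numbers are the min/25%/median/75%/max of recently seen gradient norms, and the printed threshold is clipping_scale times the median (here 2.0 x 9.492e+01 = 1.898e+02), with percent-clipped reporting how often a batch exceeded it. A sketch of that scheme with a hypothetical class and window size (icefall's ScaledAdam handles this internally):

```python
from collections import deque
import torch

class QuartileClipper:
    # Illustrative stand-in: track a window of gradient norms, clip at
    # clipping_scale * median, and expose the quartiles optim.py prints.
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.scale, self.norms = clipping_scale, deque(maxlen=window)

    def clip_(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = float(torch.norm(torch.stack([g.norm() for g in grads])))
        self.norms.append(norm)
        qs = torch.quantile(torch.tensor(list(self.norms), dtype=torch.float32),
                            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * float(qs[2])  # e.g. 2.0 * 9.492e+01 = 1.898e+02
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)
        return threshold

p = torch.nn.Parameter(torch.randn(10)); p.grad = torch.randn(10)
print(QuartileClipper().clip_([p]))
```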
], batch size: 65, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:22:29,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3422673.3333333335, ans=0.125 2023-11-28 08:22:35,389 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.870e+01 9.331e+01 1.011e+02 1.281e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-28 08:23:00,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3422806.6666666665, ans=22.5 2023-11-28 08:24:19,096 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513450 2023-11-28 08:24:27,023 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8450, loss[loss=0.06611, simple_loss=0.08385, pruned_loss=0.01487, audio_tagging_loss=0.009306, over 15007.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.0877, pruned_loss=0.012, audio_tagging_loss=0.008593, over 3044885.91 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:24:39,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3423006.6666666665, ans=0.1 2023-11-28 08:24:40,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3423006.6666666665, ans=0.1 2023-11-28 08:24:42,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3423006.6666666665, ans=0.125 2023-11-28 08:25:08,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3423073.3333333335, ans=0.2 2023-11-28 08:25:27,211 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.47 vs. limit=12.0 2023-11-28 08:26:05,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3423206.6666666665, ans=0.0 2023-11-28 08:26:27,689 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.97 vs. limit=15.0 2023-11-28 08:26:29,664 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513500 2023-11-28 08:26:35,506 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8500, loss[loss=0.06464, simple_loss=0.09178, pruned_loss=0.009468, audio_tagging_loss=0.009286, over 14647.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08883, pruned_loss=0.01217, audio_tagging_loss=0.008561, over 3050085.26 frames. 
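The per-batch loss fields above are internally consistent: the displayed total is half the simple (non-pruned) transducer loss, plus the pruned loss, plus the audio-tagging distillation loss. For batch 8450, 0.5 x 0.0877 + 0.012 + 0.008593 ≈ 0.06444, exactly the printed tot_loss. A one-line reconstruction, with the scale factors inferred from the printed numbers:

```python
def displayed_loss(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float) -> float:
    # Weights recovered by fitting the logged totals, not taken from the
    # training script itself.
    return 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss

# Batch 8450 above: the three components reproduce tot_loss=0.06444.
assert abs(displayed_loss(0.0877, 0.012, 0.008593) - 0.06444) < 1e-4
```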
], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:26:58,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3423340.0, ans=0.1 2023-11-28 08:27:00,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3423406.6666666665, ans=0.125 2023-11-28 08:27:06,467 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.889e+01 8.890e+01 9.437e+01 1.019e+02 2.913e+02, threshold=1.887e+02, percent-clipped=1.0 2023-11-28 08:27:15,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3423406.6666666665, ans=0.0 2023-11-28 08:28:13,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3423540.0, ans=0.07 2023-11-28 08:28:22,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3423606.6666666665, ans=0.04949747468305833 2023-11-28 08:28:36,232 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513550 2023-11-28 08:28:38,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3423606.6666666665, ans=0.2 2023-11-28 08:28:39,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3423606.6666666665, ans=0.2 2023-11-28 08:28:44,641 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8550, loss[loss=0.07906, simple_loss=0.1131, pruned_loss=0.01605, audio_tagging_loss=0.00644, over 14914.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08946, pruned_loss=0.01242, audio_tagging_loss=0.008596, over 3052103.64 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:29:16,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3423740.0, ans=0.1 2023-11-28 08:29:20,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3423740.0, ans=0.1 2023-11-28 08:29:50,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3423806.6666666665, ans=0.125 2023-11-28 08:29:56,996 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.25 vs. limit=22.5 2023-11-28 08:30:19,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3423940.0, ans=0.1 2023-11-28 08:30:30,593 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513600 2023-11-28 08:30:37,838 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8600, loss[loss=0.0693, simple_loss=0.09168, pruned_loss=0.0132, audio_tagging_loss=0.01026, over 14635.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.0893, pruned_loss=0.01232, audio_tagging_loss=0.008746, over 3046945.17 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:30:55,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3424073.3333333335, ans=0.035 2023-11-28 08:30:57,227 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.892e+01 9.588e+01 1.028e+02 1.351e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 08:31:39,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3424206.6666666665, ans=0.125 2023-11-28 08:32:09,530 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513650 2023-11-28 08:32:14,248 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8650, loss[loss=0.05988, simple_loss=0.08203, pruned_loss=0.008608, audio_tagging_loss=0.01026, over 14395.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08956, pruned_loss=0.01242, audio_tagging_loss=0.008804, over 3053205.84 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:32:42,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3424406.6666666665, ans=0.125 2023-11-28 08:33:15,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3424540.0, ans=0.1 2023-11-28 08:33:37,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3424606.6666666665, ans=0.125 2023-11-28 08:33:45,405 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513700 2023-11-28 08:33:50,715 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8700, loss[loss=0.06298, simple_loss=0.07891, pruned_loss=0.01105, audio_tagging_loss=0.01248, over 14426.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.08997, pruned_loss=0.01246, audio_tagging_loss=0.008868, over 3056317.62 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:34:04,524 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.15 vs. limit=22.5 2023-11-28 08:34:13,067 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.836e+01 9.429e+01 1.013e+02 1.223e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 08:34:38,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3424806.6666666665, ans=0.125 2023-11-28 08:34:55,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3424873.3333333335, ans=0.2 2023-11-28 08:35:11,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3424940.0, ans=0.1 2023-11-28 08:35:15,337 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513750 2023-11-28 08:35:15,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3424940.0, ans=0.2 2023-11-28 08:35:20,355 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8750, loss[loss=0.06195, simple_loss=0.08213, pruned_loss=0.01044, audio_tagging_loss=0.01045, over 14882.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.09039, pruned_loss=0.01246, audio_tagging_loss=0.00892, over 3056586.10 frames. 
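grad_scale is the fp16 loss-scale, and its drift between 8.0 and 32.0 over these batches is standard dynamic loss scaling: the scaler halves the scale (and skips the step) whenever backward produces inf/NaN gradients, then doubles it again after a long enough run of clean steps. A self-contained toy of the same mechanism using PyTorch's stock GradScaler; icefall wraps its own variant, so details differ:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(10, 1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, enabled=(device == "cuda"))

for _ in range(5):
    opt.zero_grad()
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = model(torch.randn(4, 10, device=device)).pow(2).mean()
    scaler.scale(loss).backward()
    scaler.step(opt)   # skipped, and the scale halved, on inf/NaN gradients
    scaler.update()    # doubled after `growth_interval` consecutive clean steps
print(scaler.get_scale())  # the counterpart of the logged grad_scale
```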
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:35:27,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3425006.6666666665, ans=0.2 2023-11-28 08:35:33,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3425006.6666666665, ans=0.2 2023-11-28 08:36:04,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3425140.0, ans=0.09899494936611666 2023-11-28 08:36:34,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3425273.3333333335, ans=0.1 2023-11-28 08:36:48,714 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513800 2023-11-28 08:36:54,345 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8800, loss[loss=0.05207, simple_loss=0.06394, pruned_loss=0.01152, audio_tagging_loss=0.008584, over 15696.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09123, pruned_loss=0.01243, audio_tagging_loss=0.008955, over 3056459.63 frames. ], batch size: 59, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:37:04,776 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.97 vs. limit=15.0 2023-11-28 08:37:13,365 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.671e+01 8.831e+01 9.235e+01 9.998e+01 1.254e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-28 08:37:21,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3425406.6666666665, ans=0.125 2023-11-28 08:37:26,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.95 vs. limit=10.0 2023-11-28 08:37:40,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3425473.3333333335, ans=0.125 2023-11-28 08:37:41,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3425540.0, ans=0.125 2023-11-28 08:38:10,618 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513850 2023-11-28 08:38:14,994 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8850, loss[loss=0.08115, simple_loss=0.1079, pruned_loss=0.01861, audio_tagging_loss=0.008573, over 14829.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09067, pruned_loss=0.01243, audio_tagging_loss=0.008986, over 3044557.60 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:38:26,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3425673.3333333335, ans=0.1 2023-11-28 08:38:28,909 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2023-11-28 08:38:36,560 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 08:38:41,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3425740.0, ans=0.125 2023-11-28 08:38:43,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3425740.0, ans=0.0 2023-11-28 08:39:13,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.99 vs. limit=10.0 2023-11-28 08:39:26,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3425940.0, ans=0.1 2023-11-28 08:39:30,134 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2023-11-28 08:39:31,310 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513900 2023-11-28 08:39:36,072 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8900, loss[loss=0.05532, simple_loss=0.07399, pruned_loss=0.008299, audio_tagging_loss=0.01002, over 14982.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09039, pruned_loss=0.01229, audio_tagging_loss=0.008861, over 3044874.76 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:39:57,373 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.695e+01 8.722e+01 9.445e+01 1.012e+02 1.187e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 08:39:59,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.84 vs. limit=22.5 2023-11-28 08:40:12,609 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2023-11-28 08:40:28,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3426206.6666666665, ans=0.1 2023-11-28 08:40:46,722 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 513950 2023-11-28 08:40:48,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3426273.3333333335, ans=0.0 2023-11-28 08:40:48,596 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2023-11-28 08:40:50,559 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 8950, loss[loss=0.05748, simple_loss=0.08484, pruned_loss=0.007383, audio_tagging_loss=0.007673, over 14744.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09022, pruned_loss=0.01229, audio_tagging_loss=0.008719, over 3046887.39 frames. 
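Most of the remaining volume in this log is ScheduledFloat lines: regularization knobs (dropout probabilities, skip rates, bypass scale floors) whose value `ans` is not a constant but is interpolated from the global batch_count, so regularization relaxes as training progresses. A small sketch of such a schedule, with made-up breakpoints:

```python
def scheduled_float(batch_count: float,
                    schedule=((0.0, 0.3), (20000.0, 0.1))) -> float:
    # Piecewise-linear in batch_count and clamped at both ends; the
    # breakpoints here are illustrative, not this run's actual schedule.
    pts = sorted(schedule)
    if batch_count <= pts[0][0]:
        return pts[0][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return pts[-1][1]

print(scheduled_float(3426340.0))  # far past the last breakpoint -> 0.1
```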
], batch size: 55, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:40:54,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3426340.0, ans=0.1 2023-11-28 08:41:16,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3426473.3333333335, ans=0.125 2023-11-28 08:41:18,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3426473.3333333335, ans=0.125 2023-11-28 08:41:24,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3426473.3333333335, ans=0.125 2023-11-28 08:41:26,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3426473.3333333335, ans=0.125 2023-11-28 08:41:37,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3426540.0, ans=0.0 2023-11-28 08:41:52,885 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514000 2023-11-28 08:41:57,094 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9000, loss[loss=0.0809, simple_loss=0.1217, pruned_loss=0.01364, audio_tagging_loss=0.006388, over 16107.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09084, pruned_loss=0.01245, audio_tagging_loss=0.008594, over 3058967.25 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:41:57,095 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 08:42:18,673 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1530, 4.0356, 3.7382, 3.2762], device='cuda:2') 2023-11-28 08:42:26,961 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9907, 5.8522, 5.6713, 5.5428], device='cuda:2') 2023-11-28 08:42:35,459 INFO [train_asr.py:1267] (2/4) Epoch 43, validation: loss=0.05867, simple_loss=0.05056, pruned_loss=0.005241, audio_tagging_loss=0.02815, over 4681554.00 frames. 2023-11-28 08:42:35,460 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 08:42:53,921 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 8.891e+01 9.730e+01 1.046e+02 2.169e+02, threshold=1.946e+02, percent-clipped=1.0 2023-11-28 08:43:15,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3426873.3333333335, ans=0.2 2023-11-28 08:43:20,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3426873.3333333335, ans=0.0 2023-11-28 08:43:28,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3426940.0, ans=0.125 2023-11-28 08:43:36,112 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514050 2023-11-28 08:43:37,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3426940.0, ans=0.0 2023-11-28 08:43:40,592 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9050, loss[loss=0.05579, simple_loss=0.06823, pruned_loss=0.007185, audio_tagging_loss=0.01449, over 15477.00 frames. 
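Alongside the batch-9000 validation loss, the trainer dumps per-head attention-entropy diagnostics (low entropy means a head concentrates on few positions; near-uniform attention over 50 keys would give about ln(50) ≈ 3.9, the ballpark of the logged values) and the peak CUDA memory from torch.cuda.max_memory_allocated(). A hedged reconstruction of the entropy statistic; the tensor layout and reduction in zipformer.py are assumptions here:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # Shannon entropy of each attention distribution, averaged over all
    # dims except the first (assumed head dimension -- an assumption, not
    # zipformer.py's documented layout).
    eps = 1.0e-20
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (head, batch, query)
    return ent.mean(dim=(-2, -1))                   # one value per head

attn = torch.softmax(torch.randn(4, 8, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # 4 values, like the logged tensors
```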
], tot_loss[loss=0.06672, simple_loss=0.09123, pruned_loss=0.01249, audio_tagging_loss=0.00861, over 3053958.32 frames. ], batch size: 63, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:44:12,802 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2023-11-28 08:44:19,336 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0 2023-11-28 08:44:30,420 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-11-28 08:44:39,494 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514100 2023-11-28 08:44:40,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3427273.3333333335, ans=0.0 2023-11-28 08:44:43,134 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9100, loss[loss=0.07691, simple_loss=0.1041, pruned_loss=0.01495, audio_tagging_loss=0.009891, over 15046.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09092, pruned_loss=0.01253, audio_tagging_loss=0.008553, over 3061003.31 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:44:53,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3427340.0, ans=0.1 2023-11-28 08:44:57,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3427406.6666666665, ans=0.125 2023-11-28 08:45:00,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3427406.6666666665, ans=0.1 2023-11-28 08:45:01,263 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 9.021e+01 9.381e+01 1.003e+02 1.228e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 08:45:11,442 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.72 vs. limit=10.0 2023-11-28 08:45:15,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3427473.3333333335, ans=0.95 2023-11-28 08:45:40,565 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514150 2023-11-28 08:45:44,477 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9150, loss[loss=0.07112, simple_loss=0.1008, pruned_loss=0.01352, audio_tagging_loss=0.00722, over 15674.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09038, pruned_loss=0.01246, audio_tagging_loss=0.00852, over 3057394.16 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 8.0 2023-11-28 08:45:55,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3427740.0, ans=0.2 2023-11-28 08:46:16,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3427806.6666666665, ans=0.07 2023-11-28 08:46:39,297 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514200 2023-11-28 08:46:42,864 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9200, loss[loss=0.06997, simple_loss=0.08834, pruned_loss=0.01605, audio_tagging_loss=0.009748, over 16143.00 frames. 
], tot_loss[loss=0.06582, simple_loss=0.08969, pruned_loss=0.01235, audio_tagging_loss=0.008626, over 3049637.73 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:46:52,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3428006.6666666665, ans=0.125 2023-11-28 08:46:58,826 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.605e+01 9.339e+01 9.879e+01 1.258e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 08:47:13,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3428140.0, ans=0.125 2023-11-28 08:47:31,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3428273.3333333335, ans=0.125 2023-11-28 08:47:37,016 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514250 2023-11-28 08:47:38,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=3428273.3333333335, ans=15.0 2023-11-28 08:47:40,304 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9250, loss[loss=0.05805, simple_loss=0.08309, pruned_loss=0.008239, audio_tagging_loss=0.008268, over 15354.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09006, pruned_loss=0.01238, audio_tagging_loss=0.00857, over 3047512.75 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:47:52,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3428406.6666666665, ans=0.125 2023-11-28 08:47:59,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3428406.6666666665, ans=0.125 2023-11-28 08:48:34,868 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514300 2023-11-28 08:48:38,060 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9300, loss[loss=0.07533, simple_loss=0.1067, pruned_loss=0.01246, audio_tagging_loss=0.009529, over 14773.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.0895, pruned_loss=0.01222, audio_tagging_loss=0.00864, over 3046420.38 frames. ], batch size: 56, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:48:46,090 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.10 vs. limit=22.5 2023-11-28 08:48:54,143 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.614e+01 9.246e+01 9.788e+01 1.593e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-28 08:49:08,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3428806.6666666665, ans=0.0 2023-11-28 08:49:24,335 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:49:27,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3428940.0, ans=0.2 2023-11-28 08:49:32,167 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514350 2023-11-28 08:49:35,351 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9350, loss[loss=0.05989, simple_loss=0.07346, pruned_loss=0.01389, audio_tagging_loss=0.009267, over 16257.00 frames. 
], tot_loss[loss=0.06566, simple_loss=0.08955, pruned_loss=0.01227, audio_tagging_loss=0.008616, over 3054380.43 frames. ], batch size: 60, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:49:49,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3429073.3333333335, ans=0.0 2023-11-28 08:49:49,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3429073.3333333335, ans=0.125 2023-11-28 08:49:53,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3429073.3333333335, ans=0.125 2023-11-28 08:50:09,611 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:50:17,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3429206.6666666665, ans=0.05 2023-11-28 08:50:18,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3429206.6666666665, ans=0.5 2023-11-28 08:50:28,896 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514400 2023-11-28 08:50:32,405 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9400, loss[loss=0.07292, simple_loss=0.1014, pruned_loss=0.01516, audio_tagging_loss=0.007066, over 15094.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.08936, pruned_loss=0.01235, audio_tagging_loss=0.008745, over 3053091.76 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:50:36,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=12.0 2023-11-28 08:50:38,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3429340.0, ans=0.2 2023-11-28 08:50:45,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3429406.6666666665, ans=0.125 2023-11-28 08:50:48,719 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.921e+01 9.569e+01 1.013e+02 1.910e+02, threshold=1.914e+02, percent-clipped=1.0 2023-11-28 08:50:53,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3429406.6666666665, ans=0.125 2023-11-28 08:51:09,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3429540.0, ans=0.125 2023-11-28 08:51:27,024 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514450 2023-11-28 08:51:27,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3429606.6666666665, ans=0.0 2023-11-28 08:51:30,081 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9450, loss[loss=0.08216, simple_loss=0.1135, pruned_loss=0.01644, audio_tagging_loss=0.008992, over 15154.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09024, pruned_loss=0.01251, audio_tagging_loss=0.00878, over 3053590.53 frames. ], batch size: 54, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:51:32,947 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 08:51:44,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3429740.0, ans=0.125 2023-11-28 08:51:47,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3429740.0, ans=0.1 2023-11-28 08:52:01,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3429806.6666666665, ans=0.2 2023-11-28 08:52:18,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3429940.0, ans=0.2 2023-11-28 08:52:23,967 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514500 2023-11-28 08:52:27,124 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9500, loss[loss=0.06772, simple_loss=0.09312, pruned_loss=0.0119, audio_tagging_loss=0.009262, over 15249.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09077, pruned_loss=0.01257, audio_tagging_loss=0.008758, over 3058290.02 frames. ], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:52:42,224 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 9.109e+01 9.581e+01 1.028e+02 2.016e+02, threshold=1.916e+02, percent-clipped=1.0 2023-11-28 08:52:55,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3430140.0, ans=0.0 2023-11-28 08:53:04,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3430206.6666666665, ans=0.125 2023-11-28 08:53:19,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3430273.3333333335, ans=0.0 2023-11-28 08:53:20,681 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514550 2023-11-28 08:53:23,771 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9550, loss[loss=0.06805, simple_loss=0.09764, pruned_loss=0.01194, audio_tagging_loss=0.007293, over 14419.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09078, pruned_loss=0.01254, audio_tagging_loss=0.008883, over 3050587.25 frames. 
], batch size: 53, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:53:23,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3430340.0, ans=0.09899494936611666 2023-11-28 08:53:26,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3430340.0, ans=0.125 2023-11-28 08:53:41,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3430406.6666666665, ans=0.0 2023-11-28 08:53:47,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3430473.3333333335, ans=0.95 2023-11-28 08:53:54,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3430473.3333333335, ans=0.05 2023-11-28 08:53:54,781 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0 2023-11-28 08:54:15,044 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 08:54:17,653 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514600 2023-11-28 08:54:21,545 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9600, loss[loss=0.07264, simple_loss=0.09606, pruned_loss=0.0168, audio_tagging_loss=0.00781, over 15029.00 frames. ], tot_loss[loss=0.06647, simple_loss=0.09013, pruned_loss=0.01246, audio_tagging_loss=0.008949, over 3048133.64 frames. ], batch size: 53, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:54:37,741 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.954e+01 8.927e+01 9.333e+01 1.014e+02 1.212e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-28 08:54:55,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3430873.3333333335, ans=0.125 2023-11-28 08:55:11,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3430940.0, ans=0.0 2023-11-28 08:55:15,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3430940.0, ans=15.0 2023-11-28 08:55:16,153 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514650 2023-11-28 08:55:16,843 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.83 vs. limit=10.0 2023-11-28 08:55:19,454 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9650, loss[loss=0.07723, simple_loss=0.1083, pruned_loss=0.01659, audio_tagging_loss=0.006482, over 15008.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09072, pruned_loss=0.01261, audio_tagging_loss=0.008871, over 3048676.08 frames. ], batch size: 55, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:55:21,133 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.60 vs. 
limit=15.0 2023-11-28 08:55:28,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3431006.6666666665, ans=0.2 2023-11-28 08:55:33,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3431073.3333333335, ans=0.0 2023-11-28 08:56:13,092 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514700 2023-11-28 08:56:16,243 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9700, loss[loss=0.09005, simple_loss=0.1242, pruned_loss=0.02073, audio_tagging_loss=0.00721, over 15828.00 frames. ], tot_loss[loss=0.06682, simple_loss=0.09103, pruned_loss=0.01269, audio_tagging_loss=0.008624, over 3053965.22 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:56:30,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3431406.6666666665, ans=0.0 2023-11-28 08:56:32,353 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.192e+01 8.860e+01 9.541e+01 1.023e+02 1.192e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 08:56:41,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3431473.3333333335, ans=0.2 2023-11-28 08:56:51,573 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=15.0 2023-11-28 08:56:53,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3431540.0, ans=0.2 2023-11-28 08:56:57,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3431540.0, ans=0.035 2023-11-28 08:57:09,649 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514750 2023-11-28 08:57:11,218 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.39 vs. limit=22.5 2023-11-28 08:57:13,583 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9750, loss[loss=0.0424, simple_loss=0.05494, pruned_loss=0.005899, audio_tagging_loss=0.009028, over 14696.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09005, pruned_loss=0.01239, audio_tagging_loss=0.008564, over 3051781.65 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 32.0 2023-11-28 08:57:13,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3431673.3333333335, ans=0.1 2023-11-28 08:57:14,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3431673.3333333335, ans=0.2 2023-11-28 08:57:23,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3431673.3333333335, ans=0.07 2023-11-28 08:57:45,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3431806.6666666665, ans=0.0 2023-11-28 08:58:07,898 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514800 2023-11-28 08:58:08,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3431940.0, ans=0.0 2023-11-28 08:58:11,283 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9800, loss[loss=0.05783, simple_loss=0.0725, pruned_loss=0.01157, audio_tagging_loss=0.01001, over 15670.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09051, pruned_loss=0.0124, audio_tagging_loss=0.008566, over 3052251.21 frames. ], batch size: 65, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:58:13,144 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.46 vs. limit=15.0 2023-11-28 08:58:16,046 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=15.0 2023-11-28 08:58:22,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3432073.3333333335, ans=0.125 2023-11-28 08:58:27,622 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 8.900e+01 9.501e+01 1.026e+02 1.176e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 08:58:57,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3432273.3333333335, ans=0.1 2023-11-28 08:59:05,106 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514850 2023-11-28 08:59:06,140 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 08:59:08,271 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9850, loss[loss=0.09231, simple_loss=0.13, pruned_loss=0.01994, audio_tagging_loss=0.007351, over 15661.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09125, pruned_loss=0.01257, audio_tagging_loss=0.008512, over 3054085.45 frames. 
], batch size: 57, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 08:59:19,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3432406.6666666665, ans=0.125 2023-11-28 08:59:23,631 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.73 vs. limit=10.0 2023-11-28 08:59:27,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3432406.6666666665, ans=0.2 2023-11-28 08:59:28,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3432406.6666666665, ans=0.125 2023-11-28 08:59:37,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3432473.3333333335, ans=0.0 2023-11-28 08:59:44,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3432540.0, ans=0.125 2023-11-28 08:59:44,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3432540.0, ans=0.05 2023-11-28 08:59:49,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3432540.0, ans=0.0 2023-11-28 09:00:00,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3432606.6666666665, ans=0.125 2023-11-28 09:00:01,451 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514900 2023-11-28 09:00:04,608 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9900, loss[loss=0.0569, simple_loss=0.07424, pruned_loss=0.008438, audio_tagging_loss=0.01134, over 15128.00 frames. ], tot_loss[loss=0.0668, simple_loss=0.09129, pruned_loss=0.01259, audio_tagging_loss=0.008561, over 3049772.74 frames. ], batch size: 58, lr: 1.57e-03, grad_scale: 16.0 2023-11-28 09:00:12,318 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.62 vs. limit=12.0 2023-11-28 09:00:23,101 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 8.866e+01 9.531e+01 1.026e+02 1.362e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 09:00:35,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3432806.6666666665, ans=0.1 2023-11-28 09:00:59,728 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 514950 2023-11-28 09:00:59,966 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:01:03,287 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 9950, loss[loss=0.07308, simple_loss=0.09699, pruned_loss=0.01544, audio_tagging_loss=0.009141, over 15496.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09118, pruned_loss=0.01256, audio_tagging_loss=0.008469, over 3047334.36 frames. 
], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:01:06,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3433006.6666666665, ans=0.125 2023-11-28 09:01:20,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3433073.3333333335, ans=0.125 2023-11-28 09:01:27,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3433140.0, ans=0.05 2023-11-28 09:01:27,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3433140.0, ans=0.0 2023-11-28 09:01:57,345 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515000 2023-11-28 09:02:00,832 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10000, loss[loss=0.06885, simple_loss=0.1013, pruned_loss=0.01236, audio_tagging_loss=0.005843, over 15568.00 frames. ], tot_loss[loss=0.06618, simple_loss=0.09039, pruned_loss=0.01248, audio_tagging_loss=0.008512, over 3051335.80 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:02:13,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3433406.6666666665, ans=0.0 2023-11-28 09:02:15,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3433406.6666666665, ans=0.125 2023-11-28 09:02:18,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.546e+01 8.838e+01 9.507e+01 1.055e+02 1.169e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 09:02:43,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3433540.0, ans=0.0 2023-11-28 09:02:52,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3433606.6666666665, ans=10.0 2023-11-28 09:02:54,302 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515050 2023-11-28 09:02:57,630 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10050, loss[loss=0.06399, simple_loss=0.08404, pruned_loss=0.01435, audio_tagging_loss=0.007623, over 15147.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08986, pruned_loss=0.01233, audio_tagging_loss=0.008507, over 3049867.03 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:03:32,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3433873.3333333335, ans=0.07 2023-11-28 09:03:37,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3433873.3333333335, ans=0.2 2023-11-28 09:03:37,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3433873.3333333335, ans=0.2 2023-11-28 09:03:46,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3433940.0, ans=0.125 2023-11-28 09:03:51,614 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515100 2023-11-28 09:03:55,267 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10100, loss[loss=0.04669, simple_loss=0.06006, pruned_loss=0.006655, audio_tagging_loss=0.01, over 15603.00 frames. 
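The learning rate also ticks down from 1.57e-03 to 1.56e-03 within this stretch: the per-batch lr follows a schedule that decays smoothly in both batch count and epoch, so late in training thousands of batches barely move it. A sketch in the spirit of icefall's Eden scheduler (warm-up omitted, constants hypothetical):

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Decays roughly as batch^-0.5 * epoch^-0.5 once past the knees at
    # lr_batches and lr_epochs; all constants here are illustrative.
    return (base_lr
            * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
            * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

print(eden_lr(0.045, 515000, 43.5))  # ~1.5e-03, the ballpark of the logged lr
```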
], tot_loss[loss=0.06591, simple_loss=0.08998, pruned_loss=0.01234, audio_tagging_loss=0.008581, over 3049656.12 frames. ], batch size: 62, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:04:05,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3434073.3333333335, ans=0.125 2023-11-28 09:04:13,636 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.780e+01 9.411e+01 9.939e+01 1.267e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 09:04:17,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3434140.0, ans=0.125 2023-11-28 09:04:20,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3434140.0, ans=0.2 2023-11-28 09:04:26,567 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.52 vs. limit=22.5 2023-11-28 09:04:28,911 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.06 vs. limit=15.0 2023-11-28 09:04:43,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3434273.3333333335, ans=0.0 2023-11-28 09:04:45,962 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:04:46,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3434273.3333333335, ans=0.125 2023-11-28 09:04:47,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2023-11-28 09:04:49,313 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515150 2023-11-28 09:04:53,045 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10150, loss[loss=0.06638, simple_loss=0.09334, pruned_loss=0.01292, audio_tagging_loss=0.006791, over 15278.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08981, pruned_loss=0.01242, audio_tagging_loss=0.008572, over 3046310.42 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:04:59,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3434340.0, ans=0.125 2023-11-28 09:05:23,668 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 09:05:37,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3434606.6666666665, ans=0.0 2023-11-28 09:05:45,876 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515200 2023-11-28 09:05:48,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3434673.3333333335, ans=0.0 2023-11-28 09:05:49,224 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10200, loss[loss=0.06993, simple_loss=0.09297, pruned_loss=0.01714, audio_tagging_loss=0.006303, over 15521.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.0902, pruned_loss=0.01249, audio_tagging_loss=0.008575, over 3051605.34 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:06:03,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3434740.0, ans=10.0 2023-11-28 09:06:08,200 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.883e+01 9.493e+01 1.013e+02 1.248e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 09:06:14,799 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:06:14,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3434806.6666666665, ans=0.0 2023-11-28 09:06:15,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3434806.6666666665, ans=0.0 2023-11-28 09:06:28,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3434873.3333333335, ans=0.05 2023-11-28 09:06:37,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3434940.0, ans=0.04949747468305833 2023-11-28 09:06:39,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3434940.0, ans=0.05 2023-11-28 09:06:41,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3434940.0, ans=0.05 2023-11-28 09:06:43,215 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515250 2023-11-28 09:06:46,373 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10250, loss[loss=0.06629, simple_loss=0.08184, pruned_loss=0.01179, audio_tagging_loss=0.01358, over 14483.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09053, pruned_loss=0.01267, audio_tagging_loss=0.008686, over 3052633.80 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:06:52,295 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.57 vs. 
limit=6.0 2023-11-28 09:06:56,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3435006.6666666665, ans=10.0 2023-11-28 09:07:03,457 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.46 vs. limit=15.0 2023-11-28 09:07:14,099 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.64 vs. limit=6.0 2023-11-28 09:07:40,807 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515300 2023-11-28 09:07:44,063 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10300, loss[loss=0.05168, simple_loss=0.06784, pruned_loss=0.007844, audio_tagging_loss=0.009914, over 14835.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09015, pruned_loss=0.0125, audio_tagging_loss=0.008735, over 3052336.45 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:07:48,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3435340.0, ans=0.035 2023-11-28 09:07:48,982 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5 2023-11-28 09:08:01,879 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.744e+01 9.048e+01 9.599e+01 1.061e+02 1.681e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 09:08:02,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3435406.6666666665, ans=0.1 2023-11-28 09:08:17,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3435540.0, ans=12.0 2023-11-28 09:08:36,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3435606.6666666665, ans=0.09899494936611666 2023-11-28 09:08:37,145 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515350 2023-11-28 09:08:40,331 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10350, loss[loss=0.05607, simple_loss=0.07848, pruned_loss=0.008558, audio_tagging_loss=0.00827, over 15326.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08936, pruned_loss=0.01229, audio_tagging_loss=0.008917, over 3050342.87 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:09:04,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3435806.6666666665, ans=0.1 2023-11-28 09:09:14,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3435873.3333333335, ans=0.0 2023-11-28 09:09:21,713 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0 2023-11-28 09:09:25,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3435940.0, ans=0.125 2023-11-28 09:09:33,524 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515400 2023-11-28 09:09:34,979 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.20 vs. 
limit=15.0 2023-11-28 09:09:36,953 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10400, loss[loss=0.07439, simple_loss=0.106, pruned_loss=0.01249, audio_tagging_loss=0.00888, over 15437.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08838, pruned_loss=0.01221, audio_tagging_loss=0.00907, over 3050413.36 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:09:40,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=22.5 2023-11-28 09:09:54,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.993e+01 9.634e+01 1.025e+02 1.288e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 09:10:00,673 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2023-11-28 09:10:10,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3436206.6666666665, ans=0.125 2023-11-28 09:10:12,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3436206.6666666665, ans=0.1 2023-11-28 09:10:23,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3436273.3333333335, ans=0.2 2023-11-28 09:10:24,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3436273.3333333335, ans=0.125 2023-11-28 09:10:30,071 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515450 2023-11-28 09:10:33,226 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10450, loss[loss=0.07058, simple_loss=0.09568, pruned_loss=0.01398, audio_tagging_loss=0.008762, over 14583.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08804, pruned_loss=0.01221, audio_tagging_loss=0.009086, over 3046479.47 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:10:47,486 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=22.5 2023-11-28 09:10:53,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3436406.6666666665, ans=0.125 2023-11-28 09:11:15,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3436540.0, ans=0.1 2023-11-28 09:11:26,976 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515500 2023-11-28 09:11:30,124 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10500, loss[loss=0.04706, simple_loss=0.06139, pruned_loss=0.008333, audio_tagging_loss=0.008029, over 15221.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08905, pruned_loss=0.01222, audio_tagging_loss=0.008872, over 3047504.75 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:11:36,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3436673.3333333335, ans=0.2 2023-11-28 09:11:40,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3436673.3333333335, ans=0.2 2023-11-28 09:11:40,766 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.00 vs. 
limit=15.0 2023-11-28 09:11:46,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3436740.0, ans=0.1 2023-11-28 09:11:48,949 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.855e+01 9.374e+01 1.019e+02 1.300e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-28 09:11:55,755 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=15.0 2023-11-28 09:11:56,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3436806.6666666665, ans=0.125 2023-11-28 09:12:25,158 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515550 2023-11-28 09:12:28,320 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10550, loss[loss=0.05117, simple_loss=0.07471, pruned_loss=0.007386, audio_tagging_loss=0.006432, over 15263.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08917, pruned_loss=0.01219, audio_tagging_loss=0.008764, over 3044465.63 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:13:17,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3437273.3333333335, ans=0.1 2023-11-28 09:13:21,720 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515600 2023-11-28 09:13:25,232 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10600, loss[loss=0.06847, simple_loss=0.09717, pruned_loss=0.01427, audio_tagging_loss=0.005614, over 14665.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09025, pruned_loss=0.01233, audio_tagging_loss=0.00865, over 3041490.62 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:13:35,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3437406.6666666665, ans=0.125 2023-11-28 09:13:42,820 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.765e+01 9.109e+01 9.906e+01 1.072e+02 1.462e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-28 09:14:03,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2023-11-28 09:14:08,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3437540.0, ans=0.1 2023-11-28 09:14:09,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3437606.6666666665, ans=0.125 2023-11-28 09:14:17,929 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515650 2023-11-28 09:14:21,260 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10650, loss[loss=0.07847, simple_loss=0.1098, pruned_loss=0.01568, audio_tagging_loss=0.007913, over 16109.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09036, pruned_loss=0.0125, audio_tagging_loss=0.008577, over 3045317.26 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:14:24,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3437673.3333333335, ans=0.025 2023-11-28 09:14:29,167 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.18 vs. 
limit=15.0
2023-11-28 09:14:31,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3437740.0, ans=0.2
2023-11-28 09:14:36,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3437740.0, ans=0.0
2023-11-28 09:14:56,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3437873.3333333335, ans=0.125
2023-11-28 09:15:02,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3437873.3333333335, ans=0.125
2023-11-28 09:15:12,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3437940.0, ans=0.0
2023-11-28 09:15:13,642 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515700
2023-11-28 09:15:14,247 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=15.0
2023-11-28 09:15:17,396 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10700, loss[loss=0.06269, simple_loss=0.08923, pruned_loss=0.01126, audio_tagging_loss=0.006814, over 15572.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08916, pruned_loss=0.01221, audio_tagging_loss=0.008648, over 3041222.91 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0
2023-11-28 09:15:26,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3438006.6666666665, ans=0.0
2023-11-28 09:15:31,590 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-28 09:15:36,689 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.621e+01 8.910e+01 9.467e+01 1.013e+02 1.295e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-28 09:15:45,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.83 vs. limit=15.0
2023-11-28 09:16:10,830 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515750
2023-11-28 09:16:13,976 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10750, loss[loss=0.05816, simple_loss=0.07969, pruned_loss=0.00865, audio_tagging_loss=0.009667, over 14752.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0897, pruned_loss=0.0123, audio_tagging_loss=0.008597, over 3044297.59 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 16.0
2023-11-28 09:16:16,716 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.45 vs. limit=22.5
2023-11-28 09:16:27,565 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0
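The ScheduledFloat entries that dominate this log record hyperparameters (dropout rates, balancer probabilities, skip rates) that are annealed as a function of the global batch count; `ans` is the value in effect when the entry is printed. A minimal sketch of the mechanism, with made-up knot values, not the exact scaling.py implementation:

```python
class ScheduledFloat:
    """A float hyperparameter interpolated piecewise-linearly in batch count.

    Illustrative sketch only; the knot values below are made up.
    """

    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value_at(self, batch_count: float) -> float:
        x0, y0 = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)  # linear interpolation between knots
            x0, y0 = x1, y1
        return y0  # past the last knot: hold the final value

# A skip rate that decays 0.2 -> 0.0 over the first 4000 batches reads 0.0
# by batch_count=3437940.0, which is why so many entries here log ans=0.0.
skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.0))
print(skip_rate.value_at(3437940.0))  # -> 0.0
```

2023-11-28 09:16:31,907 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.43 vs.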
limit=10.0 2023-11-28 09:16:55,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3438540.0, ans=0.0 2023-11-28 09:17:06,594 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515800 2023-11-28 09:17:06,997 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=22.5 2023-11-28 09:17:10,051 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10800, loss[loss=0.08353, simple_loss=0.1181, pruned_loss=0.01584, audio_tagging_loss=0.008629, over 16760.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09034, pruned_loss=0.01234, audio_tagging_loss=0.00848, over 3052177.91 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:17:15,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3438673.3333333335, ans=0.0 2023-11-28 09:17:19,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3438673.3333333335, ans=0.0 2023-11-28 09:17:21,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3438740.0, ans=0.125 2023-11-28 09:17:22,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3438740.0, ans=15.0 2023-11-28 09:17:29,073 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.283e+01 8.659e+01 9.192e+01 9.823e+01 1.353e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-28 09:18:02,524 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515850 2023-11-28 09:18:03,115 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=15.0 2023-11-28 09:18:06,532 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10850, loss[loss=0.05517, simple_loss=0.07312, pruned_loss=0.008939, audio_tagging_loss=0.009673, over 14461.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08923, pruned_loss=0.0121, audio_tagging_loss=0.008617, over 3046749.02 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:18:21,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3439073.3333333335, ans=0.2 2023-11-28 09:18:27,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3439140.0, ans=0.2 2023-11-28 09:18:31,389 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.13 vs. limit=22.5 2023-11-28 09:18:59,461 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515900 2023-11-28 09:19:03,242 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10900, loss[loss=0.07662, simple_loss=0.1057, pruned_loss=0.01605, audio_tagging_loss=0.007728, over 15511.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08883, pruned_loss=0.01213, audio_tagging_loss=0.008711, over 3043827.79 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:19:03,255 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 09:19:07,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3439340.0, ans=0.2 2023-11-28 09:19:21,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.849e+01 9.090e+01 9.658e+01 1.040e+02 1.317e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 09:19:28,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3439473.3333333335, ans=0.95 2023-11-28 09:19:42,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3439540.0, ans=0.0 2023-11-28 09:19:52,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3439606.6666666665, ans=0.0 2023-11-28 09:19:56,343 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 515950 2023-11-28 09:19:59,471 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 10950, loss[loss=0.08786, simple_loss=0.1103, pruned_loss=0.02399, audio_tagging_loss=0.008741, over 15850.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08956, pruned_loss=0.01227, audio_tagging_loss=0.008778, over 3047831.68 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:20:01,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3439673.3333333335, ans=0.1 2023-11-28 09:20:19,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3439740.0, ans=0.05 2023-11-28 09:20:34,978 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=12.0 2023-11-28 09:20:40,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3439873.3333333335, ans=0.125 2023-11-28 09:20:40,545 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=15.0 2023-11-28 09:20:52,063 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516000 2023-11-28 09:20:57,618 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11000, loss[loss=0.07708, simple_loss=0.119, pruned_loss=0.01252, audio_tagging_loss=0.005035, over 15102.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08928, pruned_loss=0.01204, audio_tagging_loss=0.008715, over 3049587.08 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:21:10,786 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 09:21:11,056 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:21:17,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.160e+01 8.606e+01 9.397e+01 9.983e+01 1.237e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 09:21:44,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3440273.3333333335, ans=0.0 2023-11-28 09:21:51,205 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516050 2023-11-28 09:21:54,933 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11050, loss[loss=0.05938, simple_loss=0.07996, pruned_loss=0.007585, audio_tagging_loss=0.01182, over 15086.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09007, pruned_loss=0.0123, audio_tagging_loss=0.008769, over 3047500.96 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:21:58,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3440340.0, ans=0.125 2023-11-28 09:22:11,002 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.36 vs. limit=22.5 2023-11-28 09:22:14,028 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:22:21,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3440473.3333333335, ans=0.0 2023-11-28 09:22:42,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3440606.6666666665, ans=0.1 2023-11-28 09:22:48,686 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516100 2023-11-28 09:22:52,004 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11100, loss[loss=0.05048, simple_loss=0.06677, pruned_loss=0.004223, audio_tagging_loss=0.01287, over 14892.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09015, pruned_loss=0.01227, audio_tagging_loss=0.008853, over 3048165.48 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:22:54,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3440673.3333333335, ans=0.09899494936611666 2023-11-28 09:22:55,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3440673.3333333335, ans=0.2 2023-11-28 09:23:00,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3440673.3333333335, ans=0.2 2023-11-28 09:23:07,123 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.65 vs. 
limit=15.0
2023-11-28 09:23:12,317 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 8.852e+01 9.435e+01 1.052e+02 1.493e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-28 09:23:22,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3440806.6666666665, ans=0.125
2023-11-28 09:23:45,865 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516150
2023-11-28 09:23:49,021 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11150, loss[loss=0.06631, simple_loss=0.09538, pruned_loss=0.01077, audio_tagging_loss=0.007845, over 16307.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08986, pruned_loss=0.01225, audio_tagging_loss=0.008924, over 3053470.45 frames. ], batch size: 59, lr: 1.56e-03, grad_scale: 16.0
2023-11-28 09:23:49,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3441006.6666666665, ans=0.0
2023-11-28 09:23:51,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3441006.6666666665, ans=0.125
2023-11-28 09:24:12,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3441140.0, ans=0.0
2023-11-28 09:24:12,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3441140.0, ans=0.07
2023-11-28 09:24:24,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3441206.6666666665, ans=0.125
2023-11-28 09:24:32,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3441206.6666666665, ans=0.125
2023-11-28 09:24:39,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3441273.3333333335, ans=0.05
2023-11-28 09:24:43,301 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516200
2023-11-28 09:24:47,383 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11200, loss[loss=0.06604, simple_loss=0.08779, pruned_loss=0.01378, audio_tagging_loss=0.008365, over 13432.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08874, pruned_loss=0.01202, audio_tagging_loss=0.009029, over 3052288.08 frames. ], batch size: 50, lr: 1.56e-03, grad_scale: 32.0
2023-11-28 09:24:50,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.60 vs. limit=22.5
2023-11-28 09:25:07,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.180e+01 8.684e+01 9.493e+01 1.049e+02 1.376e+02, threshold=1.899e+02, percent-clipped=0.0
2023-11-28 09:25:12,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3441473.3333333335, ans=15.0
2023-11-28 09:25:27,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3441540.0, ans=0.125
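The Whitening entries compare a per-module whiteness metric against a limit (e.g. "metric=18.60 vs. limit=22.5" above); a corrective penalty is applied only when the metric exceeds the limit. A sketch of one such metric, normalized so that a channel covariance proportional to the identity scores exactly 1.0; this illustrates the idea and is not necessarily the exact formula behind the scaling.py log line:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Whiteness of activations x with shape (num_frames, num_channels).

    Returns 1.0 when the channel covariance is a multiple of the identity
    and grows as channels become correlated or unbalanced.
    """
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]      # (C, C) channel covariance
    c = cov.shape[0]
    mean_diag = cov.diagonal().mean()   # average channel variance
    mean_sq = (cov ** 2).sum() / c      # includes off-diagonal energy
    return float(mean_sq / (mean_diag ** 2 + 1e-20))

# White noise scores ~1.0, far below a limit like 15.0, so no penalty fires;
# fully correlated channels score ~C, which would trip the limit.
x = torch.randn(1000, 192)
print(whitening_metric(x))                        # ~1.0
print(whitening_metric(x[:, :1].repeat(1, 192)))  # ~192.0
```

2023-11-28 09:25:29,110 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.57 vs.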
limit=15.0 2023-11-28 09:25:35,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3441606.6666666665, ans=0.0 2023-11-28 09:25:37,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3441606.6666666665, ans=0.04949747468305833 2023-11-28 09:25:41,294 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516250 2023-11-28 09:25:44,997 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11250, loss[loss=0.05421, simple_loss=0.07541, pruned_loss=0.009682, audio_tagging_loss=0.006827, over 17376.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08835, pruned_loss=0.01192, audio_tagging_loss=0.009061, over 3058063.12 frames. ], batch size: 65, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:25:45,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3441673.3333333335, ans=0.0 2023-11-28 09:25:48,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3441673.3333333335, ans=0.125 2023-11-28 09:25:50,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3441673.3333333335, ans=0.0 2023-11-28 09:26:10,384 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-28 09:26:15,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3441806.6666666665, ans=0.125 2023-11-28 09:26:15,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3441806.6666666665, ans=0.0 2023-11-28 09:26:38,709 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516300 2023-11-28 09:26:41,910 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11300, loss[loss=0.04586, simple_loss=0.05833, pruned_loss=0.006701, audio_tagging_loss=0.009989, over 15779.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08891, pruned_loss=0.01214, audio_tagging_loss=0.008893, over 3056286.64 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:27:01,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3442073.3333333335, ans=0.125 2023-11-28 09:27:02,754 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.901e+01 9.622e+01 1.003e+02 2.071e+02, threshold=1.924e+02, percent-clipped=1.0 2023-11-28 09:27:08,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3442140.0, ans=0.1 2023-11-28 09:27:26,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3442273.3333333335, ans=0.0 2023-11-28 09:27:35,484 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516350 2023-11-28 09:27:36,963 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.46 vs. limit=12.0 2023-11-28 09:27:38,733 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11350, loss[loss=0.0512, simple_loss=0.06703, pruned_loss=0.006935, audio_tagging_loss=0.01075, over 14408.00 frames. 
], tot_loss[loss=0.06655, simple_loss=0.09044, pruned_loss=0.01256, audio_tagging_loss=0.008772, over 3053646.00 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 32.0
2023-11-28 09:27:44,535 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.05 vs. limit=12.0
2023-11-28 09:27:49,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3442406.6666666665, ans=0.125
2023-11-28 09:28:32,955 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516400
2023-11-28 09:28:36,502 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11400, loss[loss=0.07178, simple_loss=0.1119, pruned_loss=0.009071, audio_tagging_loss=0.006754, over 16828.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09013, pruned_loss=0.01235, audio_tagging_loss=0.008737, over 3048754.90 frames. ], batch size: 63, lr: 1.56e-03, grad_scale: 32.0
2023-11-28 09:28:38,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3442673.3333333335, ans=0.125
2023-11-28 09:28:56,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.771e+01 9.196e+01 9.896e+01 1.286e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-28 09:28:56,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3442740.0, ans=0.0
2023-11-28 09:29:10,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3442873.3333333335, ans=0.125
2023-11-28 09:29:12,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3442873.3333333335, ans=0.125
2023-11-28 09:29:15,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3442873.3333333335, ans=0.1
2023-11-28 09:29:17,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3442873.3333333335, ans=10.0
2023-11-28 09:29:20,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.48 vs. limit=15.0
2023-11-28 09:29:30,204 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516450
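The optim.py entries summarize recently observed gradient norms as five quantiles (read here as min/25%/50%/75%/max) plus a clipping threshold. The logged numbers are consistent with threshold = Clipping_scale × median: in the entry above, 2.0 × 9.196e+01 = 1.839e+02. A sketch under that assumption, not the exact optimizer code:

```python
import torch

def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Summarize recent per-batch gradient norms the way the log lines do."""
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                       # scale * median
    pct = 100.0 * (recent_norms > threshold).float().mean() # percent-clipped
    return q, threshold, pct

# Reproducing the entry above: a median near 92 gives threshold ~1.84e+02,
# and since no recent norm exceeds it, percent-clipped stays at 0.0.
norms = torch.tensor([70.4, 87.7, 92.0, 99.0, 128.6])
q, thr, pct = clipping_stats(norms)
print(q, thr.item(), pct.item())  # threshold ~184.0, pct 0.0
```

2023-11-28 09:29:33,425 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11450, loss[loss=0.06013, simple_loss=0.0885, pruned_loss=0.01058, audio_tagging_loss=0.005303, over 14576.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08993, pruned_loss=0.01231, audio_tagging_loss=0.008657, over 3050648.54 frames.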
], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:29:54,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3443073.3333333335, ans=0.125 2023-11-28 09:30:00,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3443140.0, ans=0.125 2023-11-28 09:30:20,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3443273.3333333335, ans=0.125 2023-11-28 09:30:27,800 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516500 2023-11-28 09:30:30,961 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11500, loss[loss=0.06382, simple_loss=0.09161, pruned_loss=0.01068, audio_tagging_loss=0.007331, over 15179.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08923, pruned_loss=0.01213, audio_tagging_loss=0.008726, over 3051447.84 frames. ], batch size: 57, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:30:31,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3443340.0, ans=0.125 2023-11-28 09:30:37,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3443340.0, ans=0.0 2023-11-28 09:30:39,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3443340.0, ans=0.125 2023-11-28 09:30:41,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3443406.6666666665, ans=0.2 2023-11-28 09:30:52,629 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.962e+01 8.608e+01 9.367e+01 9.940e+01 1.192e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 09:31:02,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3443473.3333333335, ans=0.125 2023-11-28 09:31:08,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3443540.0, ans=0.2 2023-11-28 09:31:16,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3443606.6666666665, ans=0.2 2023-11-28 09:31:22,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3443606.6666666665, ans=0.0 2023-11-28 09:31:25,522 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516550 2023-11-28 09:31:27,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3443673.3333333335, ans=0.1 2023-11-28 09:31:28,718 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11550, loss[loss=0.05067, simple_loss=0.06933, pruned_loss=0.00755, audio_tagging_loss=0.00846, over 15051.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08911, pruned_loss=0.01204, audio_tagging_loss=0.008654, over 3047418.03 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:31:56,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3443806.6666666665, ans=0.0 2023-11-28 09:32:06,697 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 09:32:13,504 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=15.0
2023-11-28 09:32:20,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3443940.0, ans=0.125
2023-11-28 09:32:21,767 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516600
2023-11-28 09:32:25,235 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11600, loss[loss=0.05121, simple_loss=0.06254, pruned_loss=0.008688, audio_tagging_loss=0.01125, over 13499.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.0886, pruned_loss=0.01186, audio_tagging_loss=0.008651, over 3044025.68 frames. ], batch size: 52, lr: 1.56e-03, grad_scale: 32.0
2023-11-28 09:32:32,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3444006.6666666665, ans=0.5
2023-11-28 09:32:47,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.769e+01 9.333e+01 1.033e+02 1.788e+02, threshold=1.867e+02, percent-clipped=0.0
2023-11-28 09:32:50,187 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 09:33:07,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3444206.6666666665, ans=0.1
2023-11-28 09:33:09,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3444273.3333333335, ans=0.125
2023-11-28 09:33:18,720 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516650
2023-11-28 09:33:22,537 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11650, loss[loss=0.06126, simple_loss=0.08568, pruned_loss=0.01212, audio_tagging_loss=0.006294, over 15769.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08908, pruned_loss=0.01201, audio_tagging_loss=0.008625, over 3043198.23 frames. ], batch size: 60, lr: 1.56e-03, grad_scale: 32.0
2023-11-28 09:33:41,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3444406.6666666665, ans=0.0
2023-11-28 09:34:00,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3444540.0, ans=0.125
2023-11-28 09:34:06,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3444540.0, ans=0.0
2023-11-28 09:34:17,091 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516700
2023-11-28 09:34:18,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3444606.6666666665, ans=0.1
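The recurring "Exclude cut" warnings, like the one completed at the top of this stretch, drop 1-second AudioSet clips whose placeholder transcript is longer than the subsampled feature sequence: 100 input frames survive as only 23 encoder frames after the roughly 4x convolutional subsampling, too few for 24 BPE tokens. A sketch of the implied filter; the exact expression and margin in train_asr.py may differ, but this one reproduces the logged 100 -> 23:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # A conv front end that subsamples by ~4x; this particular expression
    # reproduces the logged "100 frames -> 23 frames after subsampling".
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least as many encoder frames as output tokens,
    # so cuts that come up short are excluded with the WARNING shown above.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # -> 23
print(keep_cut(100, 24))              # -> False: the cut is excluded
```

2023-11-28 09:34:20,379 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11700, loss[loss=0.05811, simple_loss=0.0755, pruned_loss=0.01016, audio_tagging_loss=0.01019, over 15770.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08904, pruned_loss=0.0121, audio_tagging_loss=0.008599, over 3047620.74 frames.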
], batch size: 60, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:34:21,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3444673.3333333335, ans=0.0 2023-11-28 09:34:22,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3444673.3333333335, ans=0.07 2023-11-28 09:34:42,355 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.351e+01 8.763e+01 9.224e+01 1.034e+02 1.340e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 09:34:46,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3444806.6666666665, ans=0.0 2023-11-28 09:35:05,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3444940.0, ans=0.0 2023-11-28 09:35:14,268 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516750 2023-11-28 09:35:17,412 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11750, loss[loss=0.06163, simple_loss=0.08791, pruned_loss=0.009438, audio_tagging_loss=0.008239, over 14678.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.0891, pruned_loss=0.01219, audio_tagging_loss=0.008595, over 3049031.23 frames. ], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:35:19,171 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2023-11-28 09:35:33,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3445073.3333333335, ans=0.125 2023-11-28 09:35:35,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3445073.3333333335, ans=0.125 2023-11-28 09:35:52,206 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=22.5 2023-11-28 09:36:10,196 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516800 2023-11-28 09:36:14,212 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11800, loss[loss=0.09087, simple_loss=0.119, pruned_loss=0.02273, audio_tagging_loss=0.008642, over 15363.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08849, pruned_loss=0.01219, audio_tagging_loss=0.008706, over 3051917.89 frames. 
], batch size: 55, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:36:23,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3445340.0, ans=0.125 2023-11-28 09:36:25,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3445406.6666666665, ans=0.1 2023-11-28 09:36:37,266 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.864e+01 9.665e+01 1.018e+02 1.283e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 09:36:40,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3445473.3333333335, ans=0.0 2023-11-28 09:36:59,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3445606.6666666665, ans=0.0 2023-11-28 09:37:04,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3445606.6666666665, ans=0.0 2023-11-28 09:37:09,161 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516850 2023-11-28 09:37:12,340 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11850, loss[loss=0.06422, simple_loss=0.08133, pruned_loss=0.01333, audio_tagging_loss=0.01022, over 15740.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08883, pruned_loss=0.01215, audio_tagging_loss=0.008799, over 3057132.47 frames. ], batch size: 61, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:37:31,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3445740.0, ans=0.125 2023-11-28 09:37:43,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3445806.6666666665, ans=0.125 2023-11-28 09:38:06,157 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516900 2023-11-28 09:38:06,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0 2023-11-28 09:38:09,351 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11900, loss[loss=0.05232, simple_loss=0.06138, pruned_loss=0.01033, audio_tagging_loss=0.0113, over 14111.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08891, pruned_loss=0.01213, audio_tagging_loss=0.008886, over 3051936.36 frames. ], batch size: 54, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:38:16,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3446006.6666666665, ans=0.0 2023-11-28 09:38:32,383 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.705e+01 9.389e+01 1.010e+02 1.284e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 09:38:39,065 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:38:41,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3446140.0, ans=0.0 2023-11-28 09:38:52,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3446206.6666666665, ans=0.0 2023-11-28 09:38:52,452 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.37 vs. 
limit=15.0 2023-11-28 09:38:55,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3446273.3333333335, ans=0.2 2023-11-28 09:38:56,109 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.81 vs. limit=15.0 2023-11-28 09:39:00,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=3446273.3333333335, ans=0.2 2023-11-28 09:39:02,991 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 516950 2023-11-28 09:39:06,129 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 11950, loss[loss=0.06713, simple_loss=0.0934, pruned_loss=0.01219, audio_tagging_loss=0.008245, over 15538.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08805, pruned_loss=0.01195, audio_tagging_loss=0.009019, over 3056761.05 frames. ], batch size: 58, lr: 1.56e-03, grad_scale: 16.0 2023-11-28 09:39:06,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3446340.0, ans=0.1 2023-11-28 09:39:09,525 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.71 vs. limit=15.0 2023-11-28 09:39:20,663 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2023-11-28 09:39:30,507 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.59 vs. limit=10.0 2023-11-28 09:39:31,547 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=22.5 2023-11-28 09:39:35,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3446473.3333333335, ans=0.0 2023-11-28 09:39:44,301 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:39:51,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3446606.6666666665, ans=0.125 2023-11-28 09:39:58,675 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517000 2023-11-28 09:40:01,988 INFO [train_asr.py:1235] (2/4) Epoch 43, batch 12000, loss[loss=0.07438, simple_loss=0.1021, pruned_loss=0.01426, audio_tagging_loss=0.009076, over 15195.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08864, pruned_loss=0.01216, audio_tagging_loss=0.009032, over 3056490.61 frames. ], batch size: 56, lr: 1.56e-03, grad_scale: 32.0 2023-11-28 09:40:01,989 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 09:40:36,963 INFO [train_asr.py:1267] (2/4) Epoch 43, validation: loss=0.05826, simple_loss=0.05053, pruned_loss=0.005231, audio_tagging_loss=0.02777, over 4681554.00 frames. 
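Throughout these entries the loss fields satisfy a fixed linear identity: loss = 0.5 × simple_loss + pruned_loss + audio_tagging_loss. For the validation line just above, 0.5 × 0.05053 + 0.005231 + 0.02777 ≈ 0.05826, matching the logged value to within rounding. A sketch of that combination, with coefficients inferred from the logged numbers rather than read from the code:

```python
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float) -> float:
    # Coefficients (0.5, 1.0, 1.0) inferred from the log: every loss/tot_loss
    # entry in this stretch of training satisfies the same identity.
    return 0.5 * simple_loss + pruned_loss + audio_tagging_loss

# The Epoch 43 validation entry above:
print(combined_loss(0.05053, 0.005231, 0.02777))  # 0.058266 ~ logged 0.05826
```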
2023-11-28 09:40:36,964 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB
2023-11-28 09:40:57,624 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.981e+01 9.596e+01 1.044e+02 1.233e+02, threshold=1.919e+02, percent-clipped=0.0
2023-11-28 09:41:18,033 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 0, loss[loss=0.08361, simple_loss=0.1062, pruned_loss=0.0128, audio_tagging_loss=0.01773, over 15479.00 frames. ], tot_loss[loss=0.08361, simple_loss=0.1062, pruned_loss=0.0128, audio_tagging_loss=0.01773, over 15479.00 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0
2023-11-28 09:41:18,034 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-28 09:41:44,121 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3080, 4.2519, 4.4637, 4.4728], device='cuda:2')
2023-11-28 09:41:52,344 INFO [train_asr.py:1267] (2/4) Epoch 44, validation: loss=0.05791, simple_loss=0.05054, pruned_loss=0.00521, audio_tagging_loss=0.02743, over 4681554.00 frames.
2023-11-28 09:41:52,345 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB
2023-11-28 09:41:58,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3446840.0, ans=0.1
2023-11-28 09:42:12,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3446906.6666666665, ans=0.125
2023-11-28 09:42:18,944 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517050
2023-11-28 09:42:19,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3446973.3333333335, ans=0.1
2023-11-28 09:42:25,958 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-28 09:42:33,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3447040.0, ans=0.0
2023-11-28 09:42:39,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3447106.6666666665, ans=0.1
2023-11-28 09:42:43,519 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.45 vs. limit=15.0
2023-11-28 09:42:44,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3447106.6666666665, ans=0.125
2023-11-28 09:42:45,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3447106.6666666665, ans=0.125
2023-11-28 09:42:50,858 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 50, loss[loss=0.07848, simple_loss=0.09621, pruned_loss=0.01432, audio_tagging_loss=0.01606, over 16386.00 frames. ], tot_loss[loss=0.0742, simple_loss=0.0901, pruned_loss=0.01243, audio_tagging_loss=0.01672, over 687564.80 frames. ], batch size: 63, lr: 1.54e-03, grad_scale: 32.0
2023-11-28 09:42:51,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3447173.3333333335, ans=0.125
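During validation, the zipformer.py diagnostic above prints the average entropy of each attention head's weight distribution: values near log(sequence length) mean nearly uniform attention, values near 0 mean sharply peaked attention. A sketch of such a diagnostic, with illustrative tensor shapes and not the exact zipformer.py code:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """Average entropy per attention head.

    attn: (num_heads, batch, tgt_len, src_len) with softmax-normalized rows.
    Returns one value per head, like the 4-element tensor logged above for a
    4-head self-attention module.
    """
    p = attn.clamp(min=1e-20)
    row_entropy = -(p * p.log()).sum(dim=-1)  # entropy of each attention row
    return row_entropy.mean(dim=(1, 2))       # average over batch, positions

# Uniform attention over 80 source positions has entropy log(80) ~ 4.38,
# the same ballpark as the logged values of ~4.3-4.5.
uniform = torch.full((4, 2, 10, 80), 1.0 / 80)
print(attn_weights_entropy(uniform))  # four values, each ~4.3820
```

2023-11-28 09:42:51,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.92 vs.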
limit=15.0 2023-11-28 09:43:02,293 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.11 vs. limit=22.5 2023-11-28 09:43:15,006 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.36 vs. limit=10.0 2023-11-28 09:43:15,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3447306.6666666665, ans=0.0 2023-11-28 09:43:16,882 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517100 2023-11-28 09:43:21,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3447306.6666666665, ans=0.0 2023-11-28 09:43:27,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3447373.3333333335, ans=0.125 2023-11-28 09:43:35,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3447373.3333333335, ans=0.125 2023-11-28 09:43:44,303 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.529e+01 9.824e+01 1.052e+02 1.128e+02 1.642e+02, threshold=2.105e+02, percent-clipped=0.0 2023-11-28 09:43:48,355 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2023-11-28 09:43:50,422 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 100, loss[loss=0.06304, simple_loss=0.07805, pruned_loss=0.008378, audio_tagging_loss=0.01564, over 14449.00 frames. ], tot_loss[loss=0.07248, simple_loss=0.08872, pruned_loss=0.01205, audio_tagging_loss=0.01608, over 1208666.88 frames. 
], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:43:52,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3447506.6666666665, ans=0.1 2023-11-28 09:44:00,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3447573.3333333335, ans=0.2 2023-11-28 09:44:10,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3447573.3333333335, ans=0.125 2023-11-28 09:44:12,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3447640.0, ans=0.04949747468305833 2023-11-28 09:44:15,367 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517150 2023-11-28 09:44:22,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3447640.0, ans=0.125 2023-11-28 09:44:24,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3447706.6666666665, ans=0.0 2023-11-28 09:44:29,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3447706.6666666665, ans=0.125 2023-11-28 09:44:30,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3447706.6666666665, ans=0.125 2023-11-28 09:44:47,932 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 150, loss[loss=0.07247, simple_loss=0.09191, pruned_loss=0.01353, audio_tagging_loss=0.01298, over 15688.00 frames. ], tot_loss[loss=0.07122, simple_loss=0.08963, pruned_loss=0.01208, audio_tagging_loss=0.01432, over 1617131.72 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:44:52,868 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2023-11-28 09:45:00,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3447906.6666666665, ans=0.125 2023-11-28 09:45:01,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3447906.6666666665, ans=0.125 2023-11-28 09:45:14,026 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517200 2023-11-28 09:45:16,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3447973.3333333335, ans=0.125 2023-11-28 09:45:34,239 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=22.5 2023-11-28 09:45:38,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3448106.6666666665, ans=0.0 2023-11-28 09:45:41,866 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 9.000e+01 9.478e+01 1.042e+02 1.328e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 09:45:46,277 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 200, loss[loss=0.08083, simple_loss=0.1183, pruned_loss=0.01361, audio_tagging_loss=0.008059, over 15700.00 frames. 
], tot_loss[loss=0.07016, simple_loss=0.09055, pruned_loss=0.01215, audio_tagging_loss=0.01273, over 1927987.80 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:45:51,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3448173.3333333335, ans=0.125 2023-11-28 09:46:05,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3448240.0, ans=0.1 2023-11-28 09:46:11,898 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517250 2023-11-28 09:46:22,900 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2023-11-28 09:46:29,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3448373.3333333335, ans=0.2 2023-11-28 09:46:34,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3448440.0, ans=0.125 2023-11-28 09:46:43,888 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 250, loss[loss=0.06558, simple_loss=0.08664, pruned_loss=0.01346, audio_tagging_loss=0.008808, over 14978.00 frames. ], tot_loss[loss=0.06886, simple_loss=0.09047, pruned_loss=0.01216, audio_tagging_loss=0.01146, over 2175704.00 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:46:47,828 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.94 vs. limit=15.0 2023-11-28 09:46:56,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3448573.3333333335, ans=0.025 2023-11-28 09:47:07,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3448640.0, ans=0.1 2023-11-28 09:47:09,210 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517300 2023-11-28 09:47:12,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3448640.0, ans=0.05 2023-11-28 09:47:35,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3448773.3333333335, ans=0.125 2023-11-28 09:47:36,517 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.779e+01 9.287e+01 9.816e+01 1.058e+02 1.436e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-28 09:47:41,509 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 300, loss[loss=0.07546, simple_loss=0.1105, pruned_loss=0.01319, audio_tagging_loss=0.007007, over 14774.00 frames. ], tot_loss[loss=0.06833, simple_loss=0.09092, pruned_loss=0.01226, audio_tagging_loss=0.01061, over 2367690.54 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:47:56,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3448906.6666666665, ans=0.0 2023-11-28 09:48:07,249 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517350 2023-11-28 09:48:08,523 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:48:09,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3448973.3333333335, ans=0.125 2023-11-28 09:48:26,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3449106.6666666665, ans=0.125 2023-11-28 09:48:39,229 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 350, loss[loss=0.04639, simple_loss=0.05514, pruned_loss=0.00795, audio_tagging_loss=0.01087, over 16804.00 frames. ], tot_loss[loss=0.06733, simple_loss=0.09026, pruned_loss=0.01212, audio_tagging_loss=0.01008, over 2515818.88 frames. ], batch size: 65, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:48:39,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3449173.3333333335, ans=0.125 2023-11-28 09:48:52,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3449240.0, ans=0.125 2023-11-28 09:48:59,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3449240.0, ans=0.0 2023-11-28 09:49:04,251 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517400 2023-11-28 09:49:15,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3449373.3333333335, ans=0.125 2023-11-28 09:49:25,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3449440.0, ans=0.07 2023-11-28 09:49:32,631 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 9.082e+01 9.709e+01 1.033e+02 1.269e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 09:49:37,631 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 400, loss[loss=0.07738, simple_loss=0.1021, pruned_loss=0.0141, audio_tagging_loss=0.01223, over 15865.00 frames. ], tot_loss[loss=0.06683, simple_loss=0.08977, pruned_loss=0.01216, audio_tagging_loss=0.009789, over 2635747.56 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:49:50,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3449573.3333333335, ans=0.125 2023-11-28 09:49:59,934 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.61 vs. 
limit=6.0 2023-11-28 09:50:03,342 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517450 2023-11-28 09:50:14,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3449706.6666666665, ans=0.125 2023-11-28 09:50:22,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3449773.3333333335, ans=0.07 2023-11-28 09:50:25,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3449773.3333333335, ans=0.07 2023-11-28 09:50:27,678 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.05 vs. limit=22.5 2023-11-28 09:50:33,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3449840.0, ans=0.125 2023-11-28 09:50:34,877 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 450, loss[loss=0.05844, simple_loss=0.08146, pruned_loss=0.009568, audio_tagging_loss=0.008146, over 15129.00 frames. ], tot_loss[loss=0.06657, simple_loss=0.08948, pruned_loss=0.01227, audio_tagging_loss=0.009553, over 2721964.56 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:50:57,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3449973.3333333335, ans=0.2 2023-11-28 09:51:00,741 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517500 2023-11-28 09:51:05,006 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.79 vs. limit=10.0 2023-11-28 09:51:06,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3449973.3333333335, ans=0.125 2023-11-28 09:51:28,852 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 8.576e+01 9.362e+01 1.011e+02 1.317e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 09:51:31,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3450173.3333333335, ans=0.1 2023-11-28 09:51:32,723 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 500, loss[loss=0.08568, simple_loss=0.1179, pruned_loss=0.01911, audio_tagging_loss=0.007612, over 14710.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.08929, pruned_loss=0.01228, audio_tagging_loss=0.009329, over 2796515.11 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:51:35,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3450173.3333333335, ans=0.1 2023-11-28 09:51:57,726 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.72 vs. 
limit=15.0 2023-11-28 09:51:58,186 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517550 2023-11-28 09:52:02,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3450306.6666666665, ans=0.0 2023-11-28 09:52:08,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3450373.3333333335, ans=0.2 2023-11-28 09:52:09,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3450373.3333333335, ans=0.125 2023-11-28 09:52:10,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3450373.3333333335, ans=0.05 2023-11-28 09:52:30,028 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 550, loss[loss=0.05415, simple_loss=0.06538, pruned_loss=0.01286, audio_tagging_loss=0.0086, over 14619.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09028, pruned_loss=0.01239, audio_tagging_loss=0.009094, over 2854584.04 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:52:40,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3450573.3333333335, ans=0.125 2023-11-28 09:52:55,422 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517600 2023-11-28 09:53:13,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3450706.6666666665, ans=0.0 2023-11-28 09:53:24,171 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.868e+01 9.461e+01 1.003e+02 1.214e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-28 09:53:27,497 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 600, loss[loss=0.05095, simple_loss=0.06276, pruned_loss=0.01, audio_tagging_loss=0.009567, over 15146.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.08955, pruned_loss=0.01235, audio_tagging_loss=0.009105, over 2894316.24 frames. 
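
NOTE: the scaling.py:213 lines dump module hyper-parameters (dropout probabilities, skip rates, balancer probabilities) whose value is a schedule over batch_count rather than a constant; by batch_count ~3.45M most schedules have long since settled, which is why the same ans= values keep repeating. A piecewise-linear sketch of the idea (breakpoints invented for illustration; the real ScheduledFloat lives in icefall's scaling.py):

    def scheduled_float(batch_count, points=((0.0, 0.1), (20000.0, 0.025))):
        # points: (batch_count, value) breakpoints, sorted by batch_count
        if batch_count <= points[0][0]:
            return points[0][1]
        if batch_count >= points[-1][0]:
            return points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

    # 3.45M is far past the last breakpoint, so the schedule is flat:
    print(scheduled_float(3451106.6666666665))  # 0.025
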
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:53:29,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3450840.0, ans=0.125 2023-11-28 09:53:39,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3450906.6666666665, ans=0.1 2023-11-28 09:53:41,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3450906.6666666665, ans=0.0 2023-11-28 09:53:53,175 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517650 2023-11-28 09:53:58,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3450973.3333333335, ans=0.125 2023-11-28 09:54:01,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3450973.3333333335, ans=0.125 2023-11-28 09:54:18,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3451106.6666666665, ans=15.0 2023-11-28 09:54:19,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3451106.6666666665, ans=0.125 2023-11-28 09:54:25,030 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 650, loss[loss=0.07016, simple_loss=0.09227, pruned_loss=0.01631, audio_tagging_loss=0.007714, over 14819.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09021, pruned_loss=0.01241, audio_tagging_loss=0.009076, over 2930403.62 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:54:40,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3451240.0, ans=15.0 2023-11-28 09:54:40,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3451240.0, ans=0.0 2023-11-28 09:54:50,117 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517700 2023-11-28 09:54:50,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3451306.6666666665, ans=0.2 2023-11-28 09:54:53,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3451306.6666666665, ans=0.1 2023-11-28 09:55:16,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3451440.0, ans=0.125 2023-11-28 09:55:18,038 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.000e+01 9.495e+01 1.012e+02 1.235e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 09:55:21,740 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 700, loss[loss=0.07482, simple_loss=0.09895, pruned_loss=0.01732, audio_tagging_loss=0.008025, over 15757.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09032, pruned_loss=0.01239, audio_tagging_loss=0.00896, over 2957972.02 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:55:24,408 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.15 vs. 
limit=15.0 2023-11-28 09:55:31,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3451506.6666666665, ans=0.1 2023-11-28 09:55:46,366 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517750 2023-11-28 09:55:50,274 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:55:50,493 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=15.0 2023-11-28 09:56:17,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3451840.0, ans=0.0 2023-11-28 09:56:18,697 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 750, loss[loss=0.08138, simple_loss=0.1059, pruned_loss=0.01947, audio_tagging_loss=0.008936, over 15210.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09044, pruned_loss=0.01245, audio_tagging_loss=0.008966, over 2984553.83 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:56:20,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3451840.0, ans=0.1 2023-11-28 09:56:42,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3451973.3333333335, ans=0.0 2023-11-28 09:56:44,398 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517800 2023-11-28 09:56:50,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3451973.3333333335, ans=0.125 2023-11-28 09:57:12,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3452106.6666666665, ans=0.0 2023-11-28 09:57:13,223 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 8.892e+01 9.576e+01 1.074e+02 1.448e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 09:57:14,989 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.25 vs. limit=10.0 2023-11-28 09:57:16,371 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 800, loss[loss=0.08191, simple_loss=0.11, pruned_loss=0.0181, audio_tagging_loss=0.008791, over 15915.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09075, pruned_loss=0.01253, audio_tagging_loss=0.008942, over 2995906.43 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 09:57:31,271 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2023-11-28 09:57:36,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3452240.0, ans=0.125 2023-11-28 09:57:42,657 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517850 2023-11-28 09:58:09,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3452440.0, ans=0.1 2023-11-28 09:58:11,982 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.45 vs. 
limit=22.5 2023-11-28 09:58:12,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3452440.0, ans=0.2 2023-11-28 09:58:12,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3452440.0, ans=0.0 2023-11-28 09:58:14,588 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 850, loss[loss=0.05043, simple_loss=0.07218, pruned_loss=0.005522, audio_tagging_loss=0.008823, over 14883.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.0897, pruned_loss=0.01232, audio_tagging_loss=0.008955, over 3006677.63 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:58:16,490 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 09:58:25,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3452573.3333333335, ans=0.0 2023-11-28 09:58:26,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3452573.3333333335, ans=0.1 2023-11-28 09:58:39,969 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517900 2023-11-28 09:58:43,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3452640.0, ans=0.2 2023-11-28 09:58:49,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3452706.6666666665, ans=0.0 2023-11-28 09:59:05,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3452773.3333333335, ans=0.125 2023-11-28 09:59:10,924 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.934e+01 9.404e+01 1.018e+02 1.329e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 09:59:12,464 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.97 vs. limit=15.0 2023-11-28 09:59:13,130 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 900, loss[loss=0.06112, simple_loss=0.08939, pruned_loss=0.0077, audio_tagging_loss=0.008724, over 15254.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08916, pruned_loss=0.01225, audio_tagging_loss=0.009016, over 3014285.27 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 09:59:34,834 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-28 09:59:37,873 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 517950 2023-11-28 09:59:49,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3453040.0, ans=0.125 2023-11-28 09:59:53,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3453040.0, ans=10.0 2023-11-28 10:00:09,550 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 950, loss[loss=0.06198, simple_loss=0.07816, pruned_loss=0.01287, audio_tagging_loss=0.01003, over 15350.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.08963, pruned_loss=0.01231, audio_tagging_loss=0.009038, over 3027523.92 frames. 
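
NOTE: grad_scale in the loss records is the fp16 loss scale (use_fp16=True). Over this stretch it steps down 32 -> 16 -> 8 (see the records around batch 200 and batch 950) and later grows back to 32, the usual halve-on-overflow / double-after-a-quiet-stretch behaviour of dynamic loss scaling. A sketch with PyTorch's own GradScaler (model, optimizer and data elided; parameter values illustrative):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                       backoff_factor=0.5,   # halve on overflow
                                       growth_factor=2.0,    # double when stable
                                       growth_interval=2000)
    # per training step:
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)   # skipped if any grad overflowed
    #   scaler.update()          # adjusts the scale; the new value is what
    #                            # the log prints as grad_scale
    print(scaler.get_scale())    # 32.0 on a CUDA machine
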
], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:00:13,757 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=12.0 2023-11-28 10:00:20,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3453240.0, ans=0.1 2023-11-28 10:00:22,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3453240.0, ans=0.125 2023-11-28 10:00:35,428 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518000 2023-11-28 10:00:42,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.47 vs. limit=10.0 2023-11-28 10:00:43,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3453373.3333333335, ans=0.0 2023-11-28 10:00:44,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3453373.3333333335, ans=0.1 2023-11-28 10:01:01,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3453440.0, ans=0.2 2023-11-28 10:01:05,936 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.727e+01 8.698e+01 9.447e+01 1.001e+02 1.435e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 10:01:07,028 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1000, loss[loss=0.07658, simple_loss=0.1002, pruned_loss=0.01574, audio_tagging_loss=0.01077, over 14997.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09007, pruned_loss=0.01242, audio_tagging_loss=0.008871, over 3036254.23 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:01:28,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3453573.3333333335, ans=0.0 2023-11-28 10:01:32,567 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518050 2023-11-28 10:01:33,677 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:01:35,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3453640.0, ans=0.125 2023-11-28 10:01:48,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3453706.6666666665, ans=0.0 2023-11-28 10:01:49,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3453706.6666666665, ans=0.125 2023-11-28 10:01:57,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3453773.3333333335, ans=0.125 2023-11-28 10:02:05,509 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1050, loss[loss=0.0772, simple_loss=0.1065, pruned_loss=0.0168, audio_tagging_loss=0.007141, over 15222.00 frames. 
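
NOTE: the WARNING above drops an AudioSet clip because the transducer loss needs at least as many encoder frames as target tokens: 100 input frames shrink to 23 after the front end, fewer than the 24 BPE pieces of the dummy transcript. The 23 is consistent with the usual icefall convolutional subsampling, roughly ((T - 7) // 2 + 1) // 2 (assumed here; the exact expression lives in the encoder_embed module):

    def frames_after_subsampling(t: int) -> int:
        # assumed zipformer front-end reduction; see encoder_embed for the truth
        return ((t - 7) // 2 + 1) // 2

    num_frames, num_tokens = 100, 24   # values from the warning above
    t_out = frames_after_subsampling(num_frames)
    print(t_out)                # 23
    print(t_out < num_tokens)   # True: too short, so the cut is excluded
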
], tot_loss[loss=0.06586, simple_loss=0.08968, pruned_loss=0.01233, audio_tagging_loss=0.00869, over 3034828.05 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:02:15,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3453906.6666666665, ans=0.0 2023-11-28 10:02:21,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3453906.6666666665, ans=0.125 2023-11-28 10:02:21,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3453906.6666666665, ans=0.125 2023-11-28 10:02:30,875 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518100 2023-11-28 10:02:48,369 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:02:48,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3454040.0, ans=0.0 2023-11-28 10:02:57,703 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.39 vs. limit=12.0 2023-11-28 10:03:01,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.979e+01 9.409e+01 9.986e+01 1.298e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 10:03:02,653 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1100, loss[loss=0.05186, simple_loss=0.07866, pruned_loss=0.004118, audio_tagging_loss=0.00841, over 15274.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08923, pruned_loss=0.01228, audio_tagging_loss=0.008667, over 3038139.39 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:03:07,514 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.48 vs. limit=22.5 2023-11-28 10:03:08,622 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 10:03:11,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3454173.3333333335, ans=0.0 2023-11-28 10:03:13,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3454240.0, ans=0.125 2023-11-28 10:03:19,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3454240.0, ans=0.125 2023-11-28 10:03:28,439 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518150 2023-11-28 10:03:32,858 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:03:41,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3454373.3333333335, ans=0.125 2023-11-28 10:03:50,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3454440.0, ans=0.125 2023-11-28 10:03:52,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3454440.0, ans=0.07 2023-11-28 10:03:59,614 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1150, loss[loss=0.0723, simple_loss=0.09279, pruned_loss=0.01569, audio_tagging_loss=0.01022, over 15333.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.0888, pruned_loss=0.01225, audio_tagging_loss=0.008679, over 3038159.45 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:04:23,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2023-11-28 10:04:24,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3454640.0, ans=0.0 2023-11-28 10:04:24,959 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518200 2023-11-28 10:04:35,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3454706.6666666665, ans=0.0 2023-11-28 10:04:40,245 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:04:53,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3454773.3333333335, ans=0.0 2023-11-28 10:04:57,108 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.839e+01 9.353e+01 1.036e+02 1.275e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-28 10:04:58,225 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1200, loss[loss=0.06032, simple_loss=0.08923, pruned_loss=0.006928, audio_tagging_loss=0.008774, over 16064.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08917, pruned_loss=0.01227, audio_tagging_loss=0.008628, over 3033387.37 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:05:18,419 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.82 vs. 
limit=15.0 2023-11-28 10:05:21,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3454973.3333333335, ans=0.125 2023-11-28 10:05:22,755 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518250 2023-11-28 10:05:29,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3454973.3333333335, ans=0.125 2023-11-28 10:05:30,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3455040.0, ans=0.1 2023-11-28 10:05:33,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3455040.0, ans=0.2 2023-11-28 10:05:53,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3455173.3333333335, ans=0.1 2023-11-28 10:05:54,741 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1250, loss[loss=0.05961, simple_loss=0.09475, pruned_loss=0.005325, audio_tagging_loss=0.006911, over 15941.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.0891, pruned_loss=0.01198, audio_tagging_loss=0.008611, over 3035162.25 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:06:20,699 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518300 2023-11-28 10:06:37,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3455373.3333333335, ans=0.125 2023-11-28 10:06:38,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3455373.3333333335, ans=0.95 2023-11-28 10:06:38,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3455373.3333333335, ans=0.2 2023-11-28 10:06:48,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3455440.0, ans=0.0 2023-11-28 10:06:50,828 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.624e+01 8.649e+01 9.225e+01 9.865e+01 1.174e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 10:06:51,955 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1300, loss[loss=0.03896, simple_loss=0.04126, pruned_loss=0.00604, audio_tagging_loss=0.01229, over 14322.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08825, pruned_loss=0.01178, audio_tagging_loss=0.008596, over 3034892.56 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:07:03,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3455573.3333333335, ans=0.1 2023-11-28 10:07:16,464 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.66 vs. 
limit=15.0 2023-11-28 10:07:17,147 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518350 2023-11-28 10:07:27,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3455706.6666666665, ans=0.125 2023-11-28 10:07:29,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3455706.6666666665, ans=0.1 2023-11-28 10:07:49,282 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1350, loss[loss=0.05818, simple_loss=0.07437, pruned_loss=0.01095, audio_tagging_loss=0.01005, over 15903.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08786, pruned_loss=0.01174, audio_tagging_loss=0.008609, over 3034991.41 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:07:53,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3455840.0, ans=0.125 2023-11-28 10:08:14,042 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518400 2023-11-28 10:08:20,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3455973.3333333335, ans=0.0 2023-11-28 10:08:33,610 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:08:43,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3456106.6666666665, ans=0.0 2023-11-28 10:08:45,074 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.326e+01 8.591e+01 9.504e+01 1.020e+02 1.211e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 10:08:46,251 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1400, loss[loss=0.06712, simple_loss=0.09348, pruned_loss=0.009888, audio_tagging_loss=0.0105, over 14884.00 frames. ], tot_loss[loss=0.06404, simple_loss=0.08768, pruned_loss=0.01159, audio_tagging_loss=0.008604, over 3035295.75 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:08:51,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3456173.3333333335, ans=0.125 2023-11-28 10:09:11,808 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518450 2023-11-28 10:09:31,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3456440.0, ans=0.125 2023-11-28 10:09:43,530 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1450, loss[loss=0.05502, simple_loss=0.0717, pruned_loss=0.01174, audio_tagging_loss=0.007427, over 14763.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08789, pruned_loss=0.01174, audio_tagging_loss=0.008663, over 3035646.12 frames. 
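
NOTE: the scaling.py:1022 Whitening lines compare a measure of how non-isotropic ("non-white") a module's activations are against a limit; a corrective gradient is applied only when the metric exceeds the limit, so most readings here are benign, and the feed_forward3.out_whiten reading just above (15.66 vs. 15.0) is one of the few that actually crosses. One common formulation of such a metric, assumed here rather than copied from icefall, is num_channels * sum(eig^2) / sum(eig)^2 over the covariance eigenvalues: 1.0 for perfectly white features, approaching num_channels when one direction dominates:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)   # covariance eigenvalues
        return float(x.shape[1] * (eigs ** 2).sum() / eigs.sum() ** 2)

    x = torch.randn(1000, 384)              # near-white features
    print(whitening_metric(x))              # ~1.4 (sampling noise), far below 15.0
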
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:09:43,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3456506.6666666665, ans=0.125 2023-11-28 10:09:53,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3456573.3333333335, ans=0.0 2023-11-28 10:10:00,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3456573.3333333335, ans=0.125 2023-11-28 10:10:08,630 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518500 2023-11-28 10:10:28,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0 2023-11-28 10:10:34,126 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:10:34,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3456773.3333333335, ans=0.125 2023-11-28 10:10:35,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3456773.3333333335, ans=0.125 2023-11-28 10:10:39,657 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 8.920e+01 9.408e+01 1.027e+02 1.400e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 10:10:40,343 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2023-11-28 10:10:41,240 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1500, loss[loss=0.05753, simple_loss=0.07774, pruned_loss=0.01094, audio_tagging_loss=0.007716, over 14645.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08937, pruned_loss=0.0121, audio_tagging_loss=0.008674, over 3038466.56 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:10:45,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3456840.0, ans=0.0 2023-11-28 10:10:46,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3456840.0, ans=0.0 2023-11-28 10:10:51,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3456906.6666666665, ans=0.125 2023-11-28 10:10:54,795 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-28 10:10:58,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3456906.6666666665, ans=0.0 2023-11-28 10:11:06,398 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518550 2023-11-28 10:11:14,477 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. 
limit=15.0 2023-11-28 10:11:26,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3457106.6666666665, ans=0.125 2023-11-28 10:11:37,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3457173.3333333335, ans=0.2 2023-11-28 10:11:37,955 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1550, loss[loss=0.09088, simple_loss=0.1241, pruned_loss=0.02009, audio_tagging_loss=0.008735, over 14680.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08962, pruned_loss=0.01227, audio_tagging_loss=0.008691, over 3037989.25 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:11:45,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3457173.3333333335, ans=0.1 2023-11-28 10:11:45,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3457173.3333333335, ans=0.0 2023-11-28 10:11:46,588 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0 2023-11-28 10:12:03,086 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518600 2023-11-28 10:12:27,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3457440.0, ans=0.125 2023-11-28 10:12:35,067 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 8.956e+01 9.382e+01 1.022e+02 1.472e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 10:12:36,218 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1600, loss[loss=0.06211, simple_loss=0.08898, pruned_loss=0.009732, audio_tagging_loss=0.007888, over 15507.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08953, pruned_loss=0.01239, audio_tagging_loss=0.008858, over 3035050.27 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:12:38,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3457506.6666666665, ans=0.125 2023-11-28 10:12:47,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3457573.3333333335, ans=0.125 2023-11-28 10:13:01,284 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518650 2023-11-28 10:13:19,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3457706.6666666665, ans=0.0 2023-11-28 10:13:23,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3457773.3333333335, ans=0.125 2023-11-28 10:13:24,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3457773.3333333335, ans=0.035 2023-11-28 10:13:30,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3457773.3333333335, ans=0.07 2023-11-28 10:13:33,622 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1650, loss[loss=0.06987, simple_loss=0.09411, pruned_loss=0.01236, audio_tagging_loss=0.01046, over 15749.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08949, pruned_loss=0.01233, audio_tagging_loss=0.00892, over 3038811.07 frames. 
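
NOTE: the fractional batch_count values (always multiples of 6.666...) line up with the global step count rescaled into "reference batches": world_size * max_duration / ref_duration = 4 * 1000 / 600 per step. This is inferred purely from the numbers in the log, not from the training code:

    world_size, max_duration, ref_duration = 4, 1000, 600
    per_step = world_size * max_duration / ref_duration
    print(per_step)                # 6.666...
    print(3457440.0 / per_step)    # 518616.0, a few steps after the
                                   # 'Current batch idx: 518600' line above
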
], batch size: 61, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:13:35,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3457840.0, ans=0.5 2023-11-28 10:13:49,973 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.86 vs. limit=22.5 2023-11-28 10:13:58,876 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518700 2023-11-28 10:14:04,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3457973.3333333335, ans=0.0 2023-11-28 10:14:15,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3458040.0, ans=0.125 2023-11-28 10:14:20,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3458106.6666666665, ans=0.125 2023-11-28 10:14:25,851 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:14:30,052 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.751e+01 9.360e+01 1.005e+02 1.461e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 10:14:31,137 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1700, loss[loss=0.07453, simple_loss=0.1021, pruned_loss=0.01589, audio_tagging_loss=0.007593, over 15834.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.08909, pruned_loss=0.01237, audio_tagging_loss=0.008994, over 3035891.77 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:14:33,776 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.63 vs. limit=10.0 2023-11-28 10:14:44,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3458240.0, ans=0.1 2023-11-28 10:14:50,535 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.16 vs. limit=15.0 2023-11-28 10:14:56,371 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518750 2023-11-28 10:15:19,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.24 vs. limit=15.0 2023-11-28 10:15:24,274 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2023-11-28 10:15:28,836 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1750, loss[loss=0.05815, simple_loss=0.07076, pruned_loss=0.01134, audio_tagging_loss=0.01143, over 14213.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08937, pruned_loss=0.01225, audio_tagging_loss=0.008946, over 3040101.42 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:15:34,802 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.64 vs. 
limit=15.0 2023-11-28 10:15:35,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3458506.6666666665, ans=0.0 2023-11-28 10:15:37,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3458506.6666666665, ans=0.125 2023-11-28 10:15:54,027 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518800 2023-11-28 10:15:58,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3458640.0, ans=0.04949747468305833 2023-11-28 10:16:06,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=22.5 2023-11-28 10:16:10,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3458706.6666666665, ans=0.0 2023-11-28 10:16:24,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=3458840.0, ans=12.0 2023-11-28 10:16:25,425 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 8.578e+01 9.174e+01 9.766e+01 1.256e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-28 10:16:25,453 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1800, loss[loss=0.06506, simple_loss=0.08452, pruned_loss=0.01526, audio_tagging_loss=0.007541, over 13314.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08919, pruned_loss=0.01215, audio_tagging_loss=0.008879, over 3038605.93 frames. ], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:16:30,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3458840.0, ans=0.125 2023-11-28 10:16:33,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3458840.0, ans=0.125 2023-11-28 10:16:34,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3458840.0, ans=0.5 2023-11-28 10:16:37,152 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.42 vs. limit=10.0 2023-11-28 10:16:46,850 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.97 vs. limit=15.0 2023-11-28 10:16:50,460 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518850 2023-11-28 10:16:51,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3458973.3333333335, ans=0.125 2023-11-28 10:16:58,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3458973.3333333335, ans=0.0 2023-11-28 10:17:01,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3459040.0, ans=0.0 2023-11-28 10:17:23,167 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1850, loss[loss=0.07762, simple_loss=0.1089, pruned_loss=0.01591, audio_tagging_loss=0.007278, over 15094.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.0887, pruned_loss=0.01209, audio_tagging_loss=0.008855, over 3040589.49 frames. 
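
NOTE: the steady lr: 1.54e-03 is consistent with icefall's Eden schedule, lr = base_lr * ((step^2 + lr_batches^2) / lr_batches^2)^-0.25 * ((epoch^2 + lr_epochs^2) / lr_epochs^2)^-0.25, with base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from the config (the scheduler's exact step/epoch bookkeeping is assumed):

    base_lr, lr_batches, lr_epochs = 0.045, 7500, 3.5
    step, epoch = 517_000, 43   # roughly where this stretch of the log sits;
                                # the epoch index the scheduler uses is assumed

    lr = (base_lr
          * ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
          * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)
    print(f"{lr:.2e}")          # ~1.54e-03, matching the printed lr

At this depth into training both factors change very slowly, which is why every record in the section prints the same learning rate.
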
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:17:25,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3459173.3333333335, ans=0.1 2023-11-28 10:17:39,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.13 vs. limit=22.5 2023-11-28 10:17:39,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3459240.0, ans=0.0 2023-11-28 10:17:39,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3459240.0, ans=0.125 2023-11-28 10:17:47,728 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518900 2023-11-28 10:18:05,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3459373.3333333335, ans=0.125 2023-11-28 10:18:06,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3459373.3333333335, ans=0.1 2023-11-28 10:18:08,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3459440.0, ans=0.035 2023-11-28 10:18:10,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3459440.0, ans=0.125 2023-11-28 10:18:18,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3459506.6666666665, ans=0.1 2023-11-28 10:18:19,426 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.665e+01 9.197e+01 1.005e+02 1.247e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-28 10:18:19,452 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1900, loss[loss=0.07467, simple_loss=0.1097, pruned_loss=0.01352, audio_tagging_loss=0.006278, over 15125.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.0894, pruned_loss=0.01209, audio_tagging_loss=0.008719, over 3051081.37 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:18:30,975 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.70 vs. limit=15.0 2023-11-28 10:18:38,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3459573.3333333335, ans=0.125 2023-11-28 10:18:42,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3459640.0, ans=0.125 2023-11-28 10:18:45,651 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 518950 2023-11-28 10:18:49,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3459640.0, ans=0.1 2023-11-28 10:18:54,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3459706.6666666665, ans=0.1 2023-11-28 10:19:06,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.61 vs. 
limit=22.5 2023-11-28 10:19:16,890 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 1950, loss[loss=0.06262, simple_loss=0.07912, pruned_loss=0.01245, audio_tagging_loss=0.01061, over 15139.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08914, pruned_loss=0.01202, audio_tagging_loss=0.008723, over 3037665.14 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:19:22,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3459840.0, ans=0.0 2023-11-28 10:19:27,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3459906.6666666665, ans=0.125 2023-11-28 10:19:28,038 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.40 vs. limit=10.0 2023-11-28 10:19:34,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3459906.6666666665, ans=0.1 2023-11-28 10:19:37,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3459906.6666666665, ans=0.1 2023-11-28 10:19:41,707 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519000 2023-11-28 10:19:51,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3460040.0, ans=0.125 2023-11-28 10:20:01,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3460106.6666666665, ans=0.0 2023-11-28 10:20:04,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3460106.6666666665, ans=0.1 2023-11-28 10:20:12,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3460106.6666666665, ans=0.025 2023-11-28 10:20:14,534 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.984e+01 9.500e+01 1.035e+02 1.289e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 10:20:14,561 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2000, loss[loss=0.07778, simple_loss=0.1021, pruned_loss=0.01768, audio_tagging_loss=0.00903, over 14549.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08928, pruned_loss=0.01211, audio_tagging_loss=0.008738, over 3035772.57 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:20:14,981 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.61 vs. 
limit=12.0 2023-11-28 10:20:15,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460173.3333333335, ans=0.1 2023-11-28 10:20:30,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3460240.0, ans=0.05 2023-11-28 10:20:39,478 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519050 2023-11-28 10:21:03,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3460440.0, ans=0.125 2023-11-28 10:21:04,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3460440.0, ans=0.0 2023-11-28 10:21:07,427 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.75 vs. limit=22.5 2023-11-28 10:21:11,333 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2050, loss[loss=0.06292, simple_loss=0.08001, pruned_loss=0.01482, audio_tagging_loss=0.008094, over 14085.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.0896, pruned_loss=0.01219, audio_tagging_loss=0.008733, over 3032689.74 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:21:15,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=15.0 2023-11-28 10:21:19,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3460506.6666666665, ans=15.0 2023-11-28 10:21:22,465 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2023-11-28 10:21:38,252 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519100 2023-11-28 10:21:40,908 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.77 vs. limit=10.0 2023-11-28 10:21:50,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3460706.6666666665, ans=0.125 2023-11-28 10:22:09,711 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2100, loss[loss=0.07285, simple_loss=0.09548, pruned_loss=0.01654, audio_tagging_loss=0.008572, over 14682.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09008, pruned_loss=0.01234, audio_tagging_loss=0.008689, over 3033839.39 frames. 
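
NOTE: Balancer, Whiten and the WithLoss lines (loss-sum=0.000e+00) all follow one pattern: the module is an identity in the forward pass and only nudges the backward gradient when some statistic drifts out of range, which is why ScheduledFloat probabilities can gate them without changing the computed output. A minimal sketch of that pattern (illustrative only, not icefall's implementation):

    import torch

    class PenalizeAboveLimit(torch.autograd.Function):
        """Identity forward; extra gradient in backward when mean(x^2) > limit."""

        @staticmethod
        def forward(ctx, x, limit, scale):
            ctx.save_for_backward(x)
            ctx.limit, ctx.scale = limit, scale
            return x

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            extra = torch.zeros_like(x)
            if (x ** 2).mean() > ctx.limit:             # out of range: push back
                extra = ctx.scale * 2.0 * x / x.numel()  # d/dx of mean(x^2)
            return grad_out + extra, None, None

    x = torch.randn(8, 16, requires_grad=True)
    y = PenalizeAboveLimit.apply(x * 3.0, 1.0, 0.01)
    y.sum().backward()   # x.grad now includes the small corrective term
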
], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:22:10,760 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.721e+01 9.366e+01 1.002e+02 1.628e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 10:22:23,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3460906.6666666665, ans=0.0 2023-11-28 10:22:24,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3460906.6666666665, ans=0.0 2023-11-28 10:22:35,463 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519150 2023-11-28 10:23:00,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3461106.6666666665, ans=0.0 2023-11-28 10:23:08,711 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2150, loss[loss=0.05143, simple_loss=0.06403, pruned_loss=0.009829, audio_tagging_loss=0.009587, over 15326.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.0912, pruned_loss=0.01256, audio_tagging_loss=0.008638, over 3036838.01 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:23:11,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.69 vs. limit=10.0 2023-11-28 10:23:17,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3461173.3333333335, ans=0.125 2023-11-28 10:23:29,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3461240.0, ans=0.125 2023-11-28 10:23:32,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3461306.6666666665, ans=0.125 2023-11-28 10:23:33,910 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519200 2023-11-28 10:23:48,103 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:23:51,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3461373.3333333335, ans=0.0 2023-11-28 10:24:00,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3461440.0, ans=0.125 2023-11-28 10:24:03,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3461440.0, ans=0.0 2023-11-28 10:24:06,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3461506.6666666665, ans=0.0 2023-11-28 10:24:07,079 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2200, loss[loss=0.06206, simple_loss=0.0869, pruned_loss=0.01231, audio_tagging_loss=0.006293, over 15532.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.09108, pruned_loss=0.01258, audio_tagging_loss=0.008668, over 3035542.64 frames. 
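The WARNING records above drop 1-second AudioSet clips: 100 feature frames before subsampling shrink to 23 after, while the dummy transcript tokenises to 24 BPE tokens, and a transducer cannot align more tokens than output frames. A sketch of such a filter; the exact border-frame arithmetic of the convolutional front end is an assumption chosen so that 100 input frames map to the 23 logged output frames:

```python
def keep_cut(num_frames: int, num_tokens: int,
             subsampling_factor: int = 4) -> bool:
    """Drop cuts whose post-subsampling length cannot cover the tokens."""
    # Assumed front-end model: ~7 border frames lost, then the frame rate
    # drops by the subsampling factor, so (100 - 7) // 4 == 23 as logged.
    frames_after = (num_frames - 7) // subsampling_factor
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False -> excluded, as in the WARNING above
```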
], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:24:08,085 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.940e+01 9.417e+01 1.003e+02 1.474e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 10:24:12,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3461506.6666666665, ans=0.1 2023-11-28 10:24:23,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3461573.3333333335, ans=0.125 2023-11-28 10:24:25,901 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0 2023-11-28 10:24:32,994 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519250 2023-11-28 10:24:45,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3461706.6666666665, ans=0.125 2023-11-28 10:24:52,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3461773.3333333335, ans=0.125 2023-11-28 10:24:54,487 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=12.0 2023-11-28 10:24:56,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3461773.3333333335, ans=0.2 2023-11-28 10:24:57,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3461773.3333333335, ans=0.125 2023-11-28 10:25:04,103 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2250, loss[loss=0.05262, simple_loss=0.06497, pruned_loss=0.01125, audio_tagging_loss=0.008887, over 17166.00 frames. ], tot_loss[loss=0.06674, simple_loss=0.09092, pruned_loss=0.01263, audio_tagging_loss=0.008656, over 3044773.84 frames. ], batch size: 64, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:25:13,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3461840.0, ans=0.1 2023-11-28 10:25:24,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3461906.6666666665, ans=0.1 2023-11-28 10:25:25,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3461906.6666666665, ans=0.125 2023-11-28 10:25:29,753 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519300 2023-11-28 10:25:35,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3461973.3333333335, ans=0.125 2023-11-28 10:25:36,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3461973.3333333335, ans=0.125 2023-11-28 10:25:53,229 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.02 vs. 
limit=15.0 2023-11-28 10:25:53,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3462106.6666666665, ans=0.125 2023-11-28 10:26:02,945 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2300, loss[loss=0.07018, simple_loss=0.1008, pruned_loss=0.01261, audio_tagging_loss=0.007167, over 16080.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09046, pruned_loss=0.01242, audio_tagging_loss=0.008742, over 3044025.84 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:26:04,005 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.619e+01 8.792e+01 9.298e+01 1.006e+02 1.302e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 10:26:28,110 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519350 2023-11-28 10:26:28,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3462306.6666666665, ans=0.02 2023-11-28 10:26:29,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3462306.6666666665, ans=0.1 2023-11-28 10:26:38,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3462373.3333333335, ans=0.1 2023-11-28 10:26:47,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3462373.3333333335, ans=0.125 2023-11-28 10:26:56,169 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:26:57,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3462440.0, ans=0.125 2023-11-28 10:27:00,537 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2350, loss[loss=0.08571, simple_loss=0.109, pruned_loss=0.0226, audio_tagging_loss=0.008628, over 15917.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.0902, pruned_loss=0.01232, audio_tagging_loss=0.008777, over 3050951.01 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:27:08,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3462506.6666666665, ans=0.0 2023-11-28 10:27:16,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3462573.3333333335, ans=0.125 2023-11-28 10:27:17,880 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:27:23,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3462640.0, ans=0.125 2023-11-28 10:27:25,742 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519400 2023-11-28 10:27:27,757 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.76 vs. 
limit=15.0 2023-11-28 10:27:30,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3462640.0, ans=0.125 2023-11-28 10:27:59,274 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2400, loss[loss=0.07821, simple_loss=0.09388, pruned_loss=0.02046, audio_tagging_loss=0.01081, over 15268.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09038, pruned_loss=0.01233, audio_tagging_loss=0.00885, over 3049665.70 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:28:00,335 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.676e+01 9.385e+01 1.010e+02 1.342e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 10:28:02,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3462840.0, ans=0.0 2023-11-28 10:28:19,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3462906.6666666665, ans=0.125 2023-11-28 10:28:24,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3462973.3333333335, ans=0.125 2023-11-28 10:28:25,715 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519450 2023-11-28 10:28:29,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3462973.3333333335, ans=0.125 2023-11-28 10:28:32,963 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2023-11-28 10:28:40,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3463040.0, ans=0.1 2023-11-28 10:28:58,258 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2450, loss[loss=0.06328, simple_loss=0.07877, pruned_loss=0.01336, audio_tagging_loss=0.01053, over 15533.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09099, pruned_loss=0.01237, audio_tagging_loss=0.00891, over 3058563.05 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:29:20,303 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. limit=6.0 2023-11-28 10:29:22,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3463306.6666666665, ans=0.125 2023-11-28 10:29:23,787 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519500 2023-11-28 10:29:33,072 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=15.0 2023-11-28 10:29:40,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3463373.3333333335, ans=0.0 2023-11-28 10:29:56,340 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2500, loss[loss=0.06858, simple_loss=0.09545, pruned_loss=0.01217, audio_tagging_loss=0.008688, over 16128.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09038, pruned_loss=0.01235, audio_tagging_loss=0.008987, over 3057869.90 frames. 
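The scaling.py:213 records report module hyperparameters (dropout_p, skip rates, balancer probabilities) as ScheduledFloat values: functions of the global batch count rather than constants, which is why each record carries a batch_count. A toy stand-in with piecewise-linear interpolation between breakpoints; the breakpoints below are invented for illustration:

```python
import bisect

class ScheduledFloat:
    """Toy schedule: a float value interpolated over the batch count."""
    def __init__(self, *points):  # points: (batch_count, value), ascending
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# Hypothetical dropout schedule: 0.3 at the start, 0.1 after 20k batches.
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(3460106.0))  # 0.1: long past the final breakpoint
```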
], batch size: 61, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:29:57,383 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.145e+01 8.648e+01 9.240e+01 1.001e+02 1.352e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 10:30:21,400 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519550 2023-11-28 10:30:22,833 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0 2023-11-28 10:30:26,358 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=22.5 2023-11-28 10:30:54,578 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2550, loss[loss=0.0521, simple_loss=0.06865, pruned_loss=0.009403, audio_tagging_loss=0.008369, over 15129.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08932, pruned_loss=0.01226, audio_tagging_loss=0.008912, over 3052935.03 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:31:00,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3463840.0, ans=0.0 2023-11-28 10:31:01,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3463840.0, ans=0.2 2023-11-28 10:31:15,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3463906.6666666665, ans=0.0 2023-11-28 10:31:19,989 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519600 2023-11-28 10:31:45,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3464106.6666666665, ans=0.125 2023-11-28 10:31:52,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3464173.3333333335, ans=0.125 2023-11-28 10:31:53,546 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2600, loss[loss=0.05744, simple_loss=0.08596, pruned_loss=0.008295, audio_tagging_loss=0.006163, over 15915.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08884, pruned_loss=0.01226, audio_tagging_loss=0.008754, over 3044691.66 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:31:56,360 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.673e+01 9.368e+01 9.896e+01 1.178e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 10:31:57,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3464173.3333333335, ans=0.125 2023-11-28 10:32:19,446 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519650 2023-11-28 10:32:52,185 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2650, loss[loss=0.06331, simple_loss=0.08721, pruned_loss=0.01225, audio_tagging_loss=0.00746, over 14696.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08931, pruned_loss=0.0124, audio_tagging_loss=0.008676, over 3041499.27 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:33:01,920 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:33:09,632 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.16 vs. 
limit=15.0 2023-11-28 10:33:17,820 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519700 2023-11-28 10:33:17,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3464640.0, ans=0.125 2023-11-28 10:33:21,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3464640.0, ans=0.5 2023-11-28 10:33:47,603 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.93 vs. limit=15.0 2023-11-28 10:33:50,927 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2700, loss[loss=0.06232, simple_loss=0.08749, pruned_loss=0.009649, audio_tagging_loss=0.008921, over 15651.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08948, pruned_loss=0.01248, audio_tagging_loss=0.008636, over 3044696.75 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:33:50,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3464840.0, ans=0.125 2023-11-28 10:33:53,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3464840.0, ans=0.125 2023-11-28 10:33:54,286 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 9.167e+01 9.683e+01 1.022e+02 1.162e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-28 10:33:54,520 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:34:01,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3464906.6666666665, ans=0.0 2023-11-28 10:34:13,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3464973.3333333335, ans=0.1 2023-11-28 10:34:16,141 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519750 2023-11-28 10:34:18,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3464973.3333333335, ans=0.125 2023-11-28 10:34:33,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3465040.0, ans=0.0 2023-11-28 10:34:33,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3465040.0, ans=0.125 2023-11-28 10:34:43,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3465106.6666666665, ans=0.125 2023-11-28 10:34:48,179 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2750, loss[loss=0.06908, simple_loss=0.1054, pruned_loss=0.01116, audio_tagging_loss=0.005238, over 14865.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08878, pruned_loss=0.01229, audio_tagging_loss=0.008646, over 3049404.61 frames. 
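The scaling.py:1022 records compare a per-module "whitening" metric against a limit. One statistic with the right properties, equal to 1 for an isotropic (fully white) channel covariance and rising to num_channels when the variance collapses into a single direction, is num_channels * sum(lambda_i^2) / (sum(lambda_i))^2 over the covariance eigenvalues; whether this is exactly the icefall formula is an assumption, but it reproduces the 1..num_channels range seen in the "metric=... vs. limit=..." records:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Whiteness measure for activations x of shape (N, C).

    1.0 for an isotropic covariance, up to C for rank-one activations.
    Chosen for its behaviour; the exact project formula is assumed.
    """
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]           # (C, C) channel covariance
    eig = torch.linalg.eigvalsh(cov)       # eigenvalues, ascending
    c = x.shape[1]
    return (c * (eig ** 2).sum() / eig.sum() ** 2).item()

x = torch.randn(1024, 256)
print(whitening_metric(x))                        # close to 1 for noise
print(whitening_metric(x[:, :1].repeat(1, 256)))  # ~256 for rank-one input
```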
], batch size: 54, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:35:06,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3465240.0, ans=0.125 2023-11-28 10:35:14,271 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519800 2023-11-28 10:35:25,729 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0 2023-11-28 10:35:42,908 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:35:43,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=3465440.0, ans=0.025 2023-11-28 10:35:47,376 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2800, loss[loss=0.07772, simple_loss=0.115, pruned_loss=0.01457, audio_tagging_loss=0.005665, over 16443.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08874, pruned_loss=0.01232, audio_tagging_loss=0.008581, over 3049458.48 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:35:50,658 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 8.532e+01 9.536e+01 1.008e+02 1.642e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 10:35:55,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3465506.6666666665, ans=0.0 2023-11-28 10:35:56,801 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.84 vs. limit=22.5 2023-11-28 10:36:07,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3465573.3333333335, ans=0.125 2023-11-28 10:36:08,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3465573.3333333335, ans=0.2 2023-11-28 10:36:11,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3465640.0, ans=0.0 2023-11-28 10:36:12,949 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519850 2023-11-28 10:36:45,212 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2850, loss[loss=0.06762, simple_loss=0.09215, pruned_loss=0.0117, audio_tagging_loss=0.009844, over 14832.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08859, pruned_loss=0.01232, audio_tagging_loss=0.008518, over 3047228.42 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:36:50,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3465840.0, ans=0.125 2023-11-28 10:36:57,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.29 vs. 
limit=15.0 2023-11-28 10:37:11,156 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519900 2023-11-28 10:37:43,590 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2900, loss[loss=0.08289, simple_loss=0.1132, pruned_loss=0.01813, audio_tagging_loss=0.008187, over 15060.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08858, pruned_loss=0.01222, audio_tagging_loss=0.008499, over 3047973.75 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:37:46,054 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:37:46,880 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 8.834e+01 9.612e+01 1.019e+02 1.318e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 10:38:09,139 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 519950 2023-11-28 10:38:18,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3466373.3333333335, ans=0.2 2023-11-28 10:38:23,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3466373.3333333335, ans=0.125 2023-11-28 10:38:25,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3466373.3333333335, ans=0.0 2023-11-28 10:38:39,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3466440.0, ans=0.05 2023-11-28 10:38:42,314 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 2950, loss[loss=0.06518, simple_loss=0.08599, pruned_loss=0.01189, audio_tagging_loss=0.0103, over 14906.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08904, pruned_loss=0.01229, audio_tagging_loss=0.008607, over 3050667.69 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:38:47,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=22.5 2023-11-28 10:39:05,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3466640.0, ans=0.95 2023-11-28 10:39:06,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3466640.0, ans=0.0 2023-11-28 10:39:08,010 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520000 2023-11-28 10:39:35,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3466773.3333333335, ans=0.0 2023-11-28 10:39:42,310 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3000, loss[loss=0.08738, simple_loss=0.1306, pruned_loss=0.01739, audio_tagging_loss=0.004673, over 15431.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08963, pruned_loss=0.01235, audio_tagging_loss=0.008678, over 3052217.75 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:39:42,311 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 10:40:13,778 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9992, 3.9670, 4.8772, 4.4230], device='cuda:2') 2023-11-28 10:40:18,162 INFO [train_asr.py:1267] (2/4) Epoch 44, validation: loss=0.05741, simple_loss=0.05054, pruned_loss=0.005252, audio_tagging_loss=0.02689, over 4681554.00 frames. 
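Every few thousand batches the loop switches to a validation pass (the train_asr.py:1258/1267 pair above) and then reports the CUDA high-water mark (train_asr.py:1268). A sketch of that shape of loop; the model/dataloader interface here is hypothetical, only the torch.cuda call is standard PyTorch:

```python
import torch

@torch.no_grad()
def run_validation(model, valid_loader) -> dict:
    """Mirror of the 'Computing validation loss' / 'Maximum memory' records."""
    model.eval()
    totals, frames = {}, 0
    for batch in valid_loader:
        losses, num_frames = model(batch)          # assumed interface
        for name, value in losses.items():
            totals[name] = totals.get(name, 0.0) + value.item() * num_frames
        frames += num_frames
    model.train()
    avg = {name: total / frames for name, total in totals.items()}
    mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"validation: {avg}, over {frames} frames")
    print(f"Maximum memory allocated so far is {mb}MB")
    return avg
```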
2023-11-28 10:40:18,162 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 10:40:21,404 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 8.904e+01 9.559e+01 1.030e+02 1.233e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 10:40:42,562 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520050 2023-11-28 10:40:45,319 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0 2023-11-28 10:40:48,559 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0 2023-11-28 10:41:03,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3467106.6666666665, ans=0.1 2023-11-28 10:41:15,343 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.02 vs. limit=12.0 2023-11-28 10:41:15,705 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3050, loss[loss=0.083, simple_loss=0.1092, pruned_loss=0.02163, audio_tagging_loss=0.006775, over 15480.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08895, pruned_loss=0.01212, audio_tagging_loss=0.008789, over 3058535.64 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:41:18,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3467173.3333333335, ans=0.125 2023-11-28 10:41:41,501 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520100 2023-11-28 10:41:48,647 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2023-11-28 10:41:53,534 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:41:54,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3467373.3333333335, ans=0.0 2023-11-28 10:41:56,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3467373.3333333335, ans=0.125 2023-11-28 10:42:13,277 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3100, loss[loss=0.04771, simple_loss=0.05737, pruned_loss=0.0072, audio_tagging_loss=0.01183, over 14325.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08965, pruned_loss=0.01236, audio_tagging_loss=0.008912, over 3054411.25 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:42:16,617 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 8.845e+01 9.349e+01 1.011e+02 1.262e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 10:42:33,895 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.34 vs. 
limit=15.0 2023-11-28 10:42:35,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3467573.3333333335, ans=0.1 2023-11-28 10:42:40,053 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520150 2023-11-28 10:42:41,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3467640.0, ans=0.1 2023-11-28 10:42:44,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3467640.0, ans=0.125 2023-11-28 10:42:49,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3467706.6666666665, ans=0.035 2023-11-28 10:43:11,808 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3150, loss[loss=0.08561, simple_loss=0.1115, pruned_loss=0.01854, audio_tagging_loss=0.01131, over 14835.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.08959, pruned_loss=0.01235, audio_tagging_loss=0.008925, over 3049739.58 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:43:13,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3467840.0, ans=0.015 2023-11-28 10:43:16,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3467840.0, ans=0.125 2023-11-28 10:43:33,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3467906.6666666665, ans=0.2 2023-11-28 10:43:37,570 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520200 2023-11-28 10:43:41,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3467973.3333333335, ans=0.125 2023-11-28 10:43:50,052 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.84 vs. limit=15.0 2023-11-28 10:43:56,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3468040.0, ans=0.04949747468305833 2023-11-28 10:44:08,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3468106.6666666665, ans=0.125 2023-11-28 10:44:10,808 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3200, loss[loss=0.05896, simple_loss=0.08007, pruned_loss=0.00912, audio_tagging_loss=0.009803, over 14656.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.089, pruned_loss=0.01224, audio_tagging_loss=0.009007, over 3044574.75 frames. 
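The optim.py:476 records print quartiles of recent gradient norms plus a clipping threshold; in every record here the threshold is roughly Clipping_scale (2.0) times the logged median, and percent-clipped stays at 0.0. A sketch under that reading of the records, with an invented window size:

```python
import torch

class GradNormClipper:
    """Sketch: clip to clipping_scale x the median of recent grad norms."""
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.scale, self.window = clipping_scale, window
        self.norms: list[float] = []
        self.clipped = 0
        self.seen = 0

    def __call__(self, parameters: list) -> None:
        # Pass a list (not a generator): parameters is iterated twice.
        norm = torch.nn.utils.clip_grad_norm_(parameters, float("inf")).item()
        self.norms = (self.norms + [norm])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()       # 2.0 x running median
        self.seen += 1
        if norm > threshold:
            torch.nn.utils.clip_grad_norm_(parameters, threshold)
            self.clipped += 1
        print(f"grad-norm quartiles {q.tolist()}, threshold={threshold:.3e}, "
              f"percent-clipped={100 * self.clipped / self.seen:.1f}")
```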
], batch size: 56, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:44:14,053 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.853e+01 9.488e+01 1.043e+02 1.212e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 10:44:15,417 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:44:35,570 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520250 2023-11-28 10:44:41,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3468306.6666666665, ans=0.125 2023-11-28 10:44:42,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3468306.6666666665, ans=0.09899494936611666 2023-11-28 10:45:07,190 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3250, loss[loss=0.04657, simple_loss=0.06072, pruned_loss=0.006356, audio_tagging_loss=0.009851, over 14719.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08859, pruned_loss=0.01213, audio_tagging_loss=0.009026, over 3038580.77 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:45:17,020 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2023-11-28 10:45:21,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3468573.3333333335, ans=0.125 2023-11-28 10:45:33,407 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520300 2023-11-28 10:45:39,253 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2023-11-28 10:45:41,844 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.58 vs. limit=15.0 2023-11-28 10:45:43,836 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.33 vs. limit=22.5 2023-11-28 10:45:48,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3468706.6666666665, ans=0.0 2023-11-28 10:46:00,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3468773.3333333335, ans=0.125 2023-11-28 10:46:03,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3468773.3333333335, ans=10.0 2023-11-28 10:46:05,077 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3300, loss[loss=0.08147, simple_loss=0.1158, pruned_loss=0.01225, audio_tagging_loss=0.01129, over 16488.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.08985, pruned_loss=0.01227, audio_tagging_loss=0.008991, over 3048324.39 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:46:05,717 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.35 vs. 
limit=22.5 2023-11-28 10:46:08,845 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 8.967e+01 9.560e+01 1.010e+02 1.793e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 10:46:13,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3468840.0, ans=0.125 2023-11-28 10:46:22,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3468906.6666666665, ans=0.1 2023-11-28 10:46:27,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3468973.3333333335, ans=0.2 2023-11-28 10:46:30,797 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520350 2023-11-28 10:46:33,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3468973.3333333335, ans=0.0 2023-11-28 10:46:34,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3468973.3333333335, ans=0.125 2023-11-28 10:46:49,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3469040.0, ans=0.125 2023-11-28 10:46:51,005 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.39 vs. limit=15.0 2023-11-28 10:47:03,677 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3350, loss[loss=0.0469, simple_loss=0.06583, pruned_loss=0.005243, audio_tagging_loss=0.008747, over 15330.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08969, pruned_loss=0.01228, audio_tagging_loss=0.00887, over 3052129.31 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 10:47:05,642 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=22.5 2023-11-28 10:47:16,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3469240.0, ans=0.1 2023-11-28 10:47:28,690 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520400 2023-11-28 10:47:39,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3469373.3333333335, ans=0.125 2023-11-28 10:48:01,329 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3400, loss[loss=0.07807, simple_loss=0.111, pruned_loss=0.01415, audio_tagging_loss=0.008412, over 15723.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09011, pruned_loss=0.0124, audio_tagging_loss=0.00876, over 3060706.23 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:48:05,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.926e+01 9.389e+01 1.002e+02 1.280e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 10:48:07,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3469506.6666666665, ans=0.025 2023-11-28 10:48:11,383 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.64 vs. 
limit=15.0 2023-11-28 10:48:17,778 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.63 vs. limit=22.5 2023-11-28 10:48:25,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3469640.0, ans=0.2 2023-11-28 10:48:27,289 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520450 2023-11-28 10:48:53,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3469773.3333333335, ans=0.125 2023-11-28 10:48:59,558 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3450, loss[loss=0.06956, simple_loss=0.1, pruned_loss=0.01019, audio_tagging_loss=0.009372, over 15541.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08966, pruned_loss=0.01235, audio_tagging_loss=0.008619, over 3063537.30 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:49:20,426 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=22.5 2023-11-28 10:49:22,395 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:49:25,428 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520500 2023-11-28 10:49:31,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3469973.3333333335, ans=0.1 2023-11-28 10:49:58,049 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3500, loss[loss=0.05581, simple_loss=0.07154, pruned_loss=0.00983, audio_tagging_loss=0.01021, over 15169.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09042, pruned_loss=0.01236, audio_tagging_loss=0.008519, over 3057606.43 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:50:02,334 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.047e+01 9.689e+01 1.031e+02 1.305e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-28 10:50:06,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3470173.3333333335, ans=0.025 2023-11-28 10:50:13,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3470240.0, ans=0.0 2023-11-28 10:50:23,670 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520550 2023-11-28 10:50:26,452 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2023-11-28 10:50:30,286 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 10:50:48,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3470440.0, ans=0.125 2023-11-28 10:50:53,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3470440.0, ans=0.0 2023-11-28 10:50:53,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3470440.0, ans=0.1 2023-11-28 10:50:56,673 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3550, loss[loss=0.07444, simple_loss=0.1092, pruned_loss=0.01153, audio_tagging_loss=0.008332, over 15161.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09048, pruned_loss=0.01224, audio_tagging_loss=0.008574, over 3061555.52 frames. ], batch size: 55, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:50:57,227 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2023-11-28 10:51:02,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.18 vs. limit=15.0 2023-11-28 10:51:07,031 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=22.5 2023-11-28 10:51:13,333 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.13 vs. limit=15.0 2023-11-28 10:51:22,650 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520600 2023-11-28 10:51:22,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3470640.0, ans=0.125 2023-11-28 10:51:28,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3470640.0, ans=0.125 2023-11-28 10:51:40,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3470706.6666666665, ans=0.2 2023-11-28 10:51:45,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3470773.3333333335, ans=0.0 2023-11-28 10:51:55,127 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3600, loss[loss=0.04628, simple_loss=0.05444, pruned_loss=0.009159, audio_tagging_loss=0.009897, over 14959.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.09002, pruned_loss=0.01218, audio_tagging_loss=0.008547, over 3055030.04 frames. 
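The model.py:807 lines print the encoder freeze state and the global batch index every 50 batches. With freeze_encoder disabled this run never freezes, but a sketch of the step-gated freezing these records imply (submodule prefix and parameter names assumed):

```python
import torch

def set_encoder_frozen(model: torch.nn.Module, frozen: bool) -> None:
    """Toggle gradients for every parameter under an 'encoder' submodule."""
    for name, param in model.named_parameters():
        if name.startswith("encoder."):
            param.requires_grad = not frozen

def maybe_log_and_freeze(model, batch_idx: int,
                         freeze_encoder_steps: int = -1,
                         log_interval: int = 50) -> None:
    # -1 disables freezing entirely, as in this run (Freeze_encoder: False).
    frozen = 0 <= batch_idx < freeze_encoder_steps
    set_encoder_frozen(model, frozen)
    if batch_idx % log_interval == 0:
        print(f"Freeze_encoder: {frozen}; Current batch idx: {batch_idx}")
```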
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:52:00,703 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.557e+01 8.694e+01 9.447e+01 1.046e+02 1.297e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 10:52:21,663 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520650 2023-11-28 10:52:21,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3470973.3333333335, ans=0.125 2023-11-28 10:52:48,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3471106.6666666665, ans=0.2 2023-11-28 10:52:54,241 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3650, loss[loss=0.05724, simple_loss=0.07867, pruned_loss=0.01089, audio_tagging_loss=0.00701, over 14395.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.0898, pruned_loss=0.01211, audio_tagging_loss=0.008559, over 3051365.93 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:53:00,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3471173.3333333335, ans=0.025 2023-11-28 10:53:14,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.71 vs. limit=15.0 2023-11-28 10:53:19,753 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520700 2023-11-28 10:53:20,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3471306.6666666665, ans=0.0 2023-11-28 10:53:33,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3471373.3333333335, ans=0.125 2023-11-28 10:53:35,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3471373.3333333335, ans=0.035 2023-11-28 10:53:52,249 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3700, loss[loss=0.05689, simple_loss=0.08043, pruned_loss=0.008332, audio_tagging_loss=0.008348, over 15559.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09084, pruned_loss=0.0123, audio_tagging_loss=0.008545, over 3056868.06 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:53:56,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3471506.6666666665, ans=0.125 2023-11-28 10:53:59,765 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.858e+01 9.302e+01 9.977e+01 1.303e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-28 10:54:19,225 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520750 2023-11-28 10:54:21,593 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 10:54:27,839 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.86 vs. limit=8.0 2023-11-28 10:54:51,701 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3750, loss[loss=0.07047, simple_loss=0.09561, pruned_loss=0.01448, audio_tagging_loss=0.00818, over 15280.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09082, pruned_loss=0.0123, audio_tagging_loss=0.008673, over 3050888.52 frames. 
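The grad_scale field in the batch records moves between 8.0, 16.0 and 32.0: the signature of fp16 training with a dynamic loss scaler that halves the scale on overflow and doubles it after a stretch of clean steps. A generic PyTorch AMP sketch, not the project's exact loop; the tiny model and data are stand-ins:

```python
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

for step in range(100):
    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    optimizer.zero_grad()
    scaler.scale(loss).backward()   # backprop on the scaled loss
    scaler.step(optimizer)          # skips the step if grads overflowed
    scaler.update()                 # halve on overflow, grow when stable
    if step % 50 == 0:
        print(f"grad_scale: {scaler.get_scale()}")
```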
], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:54:52,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3471840.0, ans=0.0 2023-11-28 10:54:59,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3471840.0, ans=0.2 2023-11-28 10:55:04,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3471906.6666666665, ans=0.0 2023-11-28 10:55:08,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.68 vs. limit=15.0 2023-11-28 10:55:17,464 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520800 2023-11-28 10:55:35,447 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 10:55:48,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3472106.6666666665, ans=0.05 2023-11-28 10:55:51,425 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3800, loss[loss=0.05005, simple_loss=0.07014, pruned_loss=0.006369, audio_tagging_loss=0.008612, over 15016.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09119, pruned_loss=0.01235, audio_tagging_loss=0.008581, over 3052833.71 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:55:58,018 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 9.010e+01 9.587e+01 1.023e+02 1.351e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 10:56:04,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3472240.0, ans=0.0 2023-11-28 10:56:16,943 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520850 2023-11-28 10:56:24,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3472306.6666666665, ans=0.0 2023-11-28 10:56:24,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3472306.6666666665, ans=0.1 2023-11-28 10:56:25,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3472373.3333333335, ans=0.1 2023-11-28 10:56:30,428 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.89 vs. limit=22.5 2023-11-28 10:56:33,530 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0 2023-11-28 10:56:49,681 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3850, loss[loss=0.06614, simple_loss=0.0844, pruned_loss=0.01488, audio_tagging_loss=0.00906, over 14860.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09106, pruned_loss=0.01245, audio_tagging_loss=0.008631, over 3056443.38 frames. 
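Many scaling.py:213 records name balancer attributes such as balancer1.prob, min_positive, max_positive and min_abs. The mechanics below are assumed, reconstructed only from those attribute names: an identity module that, on a random prob fraction of training calls, measures the per-channel fraction of positive activations and injects a small corrective gradient when it leaves [min_positive, max_positive]:

```python
import torch

class _BalanceGrad(torch.autograd.Function):
    """Identity forward; adds a constant per-channel term to the gradient."""
    @staticmethod
    def forward(ctx, x, push):
        ctx.save_for_backward(push)
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (push,) = ctx.saved_tensors
        # Positive push lowers the gradient, encouraging the channel to grow.
        return grad_out - push, None

class Balancer(torch.nn.Module):
    """Hypothetical activation balancer; constants mirror the logged names."""
    def __init__(self, prob=0.125, min_positive=0.05, max_positive=0.95,
                 grad_scale=0.01):
        super().__init__()
        self.prob = prob
        self.min_positive, self.max_positive = min_positive, max_positive
        self.grad_scale = grad_scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C)
        if not self.training or torch.rand(()).item() >= self.prob:
            return x
        pos = (x > 0).float().mean(dim=0)  # fraction positive per channel
        push = self.grad_scale * ((pos < self.min_positive).float()
                                  - (pos > self.max_positive).float())
        return _BalanceGrad.apply(x, push)
```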
], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:57:03,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3472573.3333333335, ans=0.1 2023-11-28 10:57:11,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3472573.3333333335, ans=0.125 2023-11-28 10:57:15,571 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520900 2023-11-28 10:57:17,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3472640.0, ans=0.0 2023-11-28 10:57:27,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3472706.6666666665, ans=0.0 2023-11-28 10:57:42,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3472773.3333333335, ans=0.1 2023-11-28 10:57:47,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3472840.0, ans=0.0 2023-11-28 10:57:48,652 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3900, loss[loss=0.06201, simple_loss=0.08425, pruned_loss=0.00975, audio_tagging_loss=0.01014, over 15466.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09115, pruned_loss=0.01253, audio_tagging_loss=0.008645, over 3051188.23 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:57:51,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3472840.0, ans=0.0 2023-11-28 10:57:56,092 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 8.789e+01 9.361e+01 1.021e+02 3.606e+02, threshold=1.872e+02, percent-clipped=1.0 2023-11-28 10:58:01,141 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.96 vs. limit=10.0 2023-11-28 10:58:07,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3472906.6666666665, ans=0.0 2023-11-28 10:58:10,647 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. 
limit=6.0 2023-11-28 10:58:14,626 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 520950 2023-11-28 10:58:15,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3472973.3333333335, ans=0.0 2023-11-28 10:58:29,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3473040.0, ans=0.125 2023-11-28 10:58:29,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3473040.0, ans=0.1 2023-11-28 10:58:34,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3473106.6666666665, ans=0.1 2023-11-28 10:58:43,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3473106.6666666665, ans=0.1 2023-11-28 10:58:48,205 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 3950, loss[loss=0.05803, simple_loss=0.07491, pruned_loss=0.01046, audio_tagging_loss=0.01012, over 14965.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09124, pruned_loss=0.01237, audio_tagging_loss=0.008695, over 3050777.15 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 10:58:49,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3473173.3333333335, ans=0.0 2023-11-28 10:58:50,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3473173.3333333335, ans=0.09899494936611666 2023-11-28 10:58:58,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3473240.0, ans=0.125 2023-11-28 10:59:12,837 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521000 2023-11-28 10:59:39,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3473440.0, ans=0.1 2023-11-28 10:59:46,229 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4000, loss[loss=0.0686, simple_loss=0.08624, pruned_loss=0.01406, audio_tagging_loss=0.01143, over 15016.00 frames. ], tot_loss[loss=0.067, simple_loss=0.09136, pruned_loss=0.01245, audio_tagging_loss=0.008866, over 3046857.95 frames. ], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 10:59:52,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.959e+01 9.483e+01 1.017e+02 1.499e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 11:00:12,050 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521050 2023-11-28 11:00:27,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3473706.6666666665, ans=0.125 2023-11-28 11:00:37,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3473773.3333333335, ans=0.0 2023-11-28 11:00:44,035 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4050, loss[loss=0.06856, simple_loss=0.09233, pruned_loss=0.01054, audio_tagging_loss=0.01186, over 15000.00 frames. ], tot_loss[loss=0.0673, simple_loss=0.09167, pruned_loss=0.01256, audio_tagging_loss=0.00891, over 3044840.43 frames. 
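During validation the zipformer.py:1877 record (seen in the batch-3000 block above) dumps attn_weights_entropy per head; entropy near log(src_len) means near-uniform attention, near 0 means one-hot, so values around 4-5 indicate fairly diffuse heads. A sketch of that diagnostic, with the tensor layout assumed:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """Mean entropy (nats) of attention distributions, one value per head.

    attn: (num_heads, batch, tgt_len, src_len), rows already softmaxed;
    uniform attention over src_len positions gives log(src_len).
    """
    entropy = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (H, B, T)
    return entropy.mean(dim=(1, 2))

attn = torch.softmax(torch.randn(4, 2, 10, 150), dim=-1)
print(attn_weights_entropy(attn))  # roughly 4.5, a little under log(150)
```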
], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:00:50,388 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:01:08,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3473973.3333333335, ans=0.0 2023-11-28 11:01:10,337 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521100 2023-11-28 11:01:20,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2023-11-28 11:01:21,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3474040.0, ans=0.125 2023-11-28 11:01:27,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3474040.0, ans=0.125 2023-11-28 11:01:36,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3474106.6666666665, ans=0.1 2023-11-28 11:01:42,741 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4100, loss[loss=0.0577, simple_loss=0.08117, pruned_loss=0.007505, audio_tagging_loss=0.009604, over 16098.00 frames. ], tot_loss[loss=0.06697, simple_loss=0.09127, pruned_loss=0.01243, audio_tagging_loss=0.008906, over 3045604.79 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:01:51,403 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.779e+01 9.580e+01 1.037e+02 1.315e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 11:02:00,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3474240.0, ans=0.0 2023-11-28 11:02:08,130 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521150 2023-11-28 11:02:16,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3474373.3333333335, ans=0.0 2023-11-28 11:02:19,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3474373.3333333335, ans=0.2 2023-11-28 11:02:38,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3474440.0, ans=0.1 2023-11-28 11:02:41,699 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4150, loss[loss=0.05689, simple_loss=0.07273, pruned_loss=0.009928, audio_tagging_loss=0.0106, over 14923.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.0907, pruned_loss=0.01245, audio_tagging_loss=0.008734, over 3040832.36 frames. 
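[Editor's note] The "Exclude cut ... from training" warnings above flag 1-second AudioSet clips whose placeholder transcript encodes to more BPE tokens (24) than the encoder will emit frames (23 after roughly 4x subsampling), so no transducer alignment exists and the cut is dropped. A minimal sketch of such a length filter, assuming lhotse-style cuts and a loaded SentencePiece model; the exact subsampling arithmetic is an assumption, though the formula below does reproduce 100 -> 23:

    import sentencepiece as spm

    sp = spm.SentencePieceProcessor()  # assumed already loaded with the BPE model

    def keep_cut(cut) -> bool:
        # Frames surviving the convolutional front end; this formula is an
        # assumption that happens to map 100 input frames to 23 output frames.
        T = ((cut.num_frames - 7) // 2 + 1) // 2
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        # A transducer needs at least one output frame per emitted token.
        return T >= len(tokens)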
], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:02:45,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3474506.6666666665, ans=0.125 2023-11-28 11:03:07,957 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521200 2023-11-28 11:03:08,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2023-11-28 11:03:15,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3474640.0, ans=0.0 2023-11-28 11:03:23,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2023-11-28 11:03:26,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3474706.6666666665, ans=0.125 2023-11-28 11:03:28,245 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:03:28,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3474773.3333333335, ans=0.125 2023-11-28 11:03:30,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3474773.3333333335, ans=0.0 2023-11-28 11:03:40,523 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4200, loss[loss=0.05915, simple_loss=0.08765, pruned_loss=0.007786, audio_tagging_loss=0.007538, over 15052.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.0902, pruned_loss=0.01235, audio_tagging_loss=0.008696, over 3032365.79 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:03:49,024 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.420e+01 8.843e+01 9.445e+01 1.017e+02 1.271e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 11:03:51,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3474906.6666666665, ans=0.125 2023-11-28 11:03:54,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3474906.6666666665, ans=0.1 2023-11-28 11:04:07,601 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521250 2023-11-28 11:04:18,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=15.0 2023-11-28 11:04:39,655 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4250, loss[loss=0.06744, simple_loss=0.09169, pruned_loss=0.01335, audio_tagging_loss=0.008249, over 15238.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09045, pruned_loss=0.01255, audio_tagging_loss=0.008516, over 3035340.55 frames. 
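[Editor's note] The many scaling.py:213 lines record ScheduledFloat values: module hyper-parameters such as the dropout_p, *_skip_rate, prob and scale_min fields are functions of batch_count rather than constants, which is why each entry reports a batch_count alongside the current value (ans). A minimal sketch of piecewise-linear scheduling over (batch_count, value) breakpoints; the breakpoints below are illustrative, not taken from the recipe, though they do yield the logged ans=0.1 at these batch counts:

    def scheduled_float(batch_count: float, points) -> float:
        """Piecewise-linear interpolation over sorted (batch_count, value) pairs."""
        points = sorted(points)
        if batch_count <= points[0][0]:
            return points[0][1]
        if batch_count >= points[-1][0]:
            return points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

    # Far past the last breakpoint, the value has settled at 0.1, matching
    # the feed_forward1.out_proj.dropout_p entries above.
    print(scheduled_float(3474506.67, [(0.0, 0.3), (20000.0, 0.1)]))  # 0.1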
], batch size: 56, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:04:50,842 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.41 vs. limit=15.0 2023-11-28 11:04:58,763 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.63 vs. limit=15.0 2023-11-28 11:05:04,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3475306.6666666665, ans=0.0 2023-11-28 11:05:05,069 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=15.0 2023-11-28 11:05:05,824 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521300 2023-11-28 11:05:09,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3475306.6666666665, ans=0.125 2023-11-28 11:05:11,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3475306.6666666665, ans=0.2 2023-11-28 11:05:19,369 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2023-11-28 11:05:32,292 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.70 vs. limit=10.0 2023-11-28 11:05:36,550 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.40 vs. limit=15.0 2023-11-28 11:05:39,413 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4300, loss[loss=0.05699, simple_loss=0.07918, pruned_loss=0.009568, audio_tagging_loss=0.007831, over 15261.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09128, pruned_loss=0.01262, audio_tagging_loss=0.008435, over 3038016.32 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:05:47,120 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.428e+01 8.879e+01 9.468e+01 1.032e+02 1.370e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 11:06:04,311 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521350 2023-11-28 11:06:11,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3475640.0, ans=0.125 2023-11-28 11:06:19,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3475706.6666666665, ans=0.0 2023-11-28 11:06:31,248 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2023-11-28 11:06:37,552 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4350, loss[loss=0.05213, simple_loss=0.07014, pruned_loss=0.0088, audio_tagging_loss=0.008256, over 13856.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09146, pruned_loss=0.01272, audio_tagging_loss=0.008413, over 3031048.71 frames. 
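[Editor's note] In the optim.py:476 lines, the five "grad-norm quartiles" are the min/25%/median/75%/max of recently observed gradient norms, and the logged threshold tracks Clipping_scale times the median (e.g. 2.0 x 9.468e+01 ≈ 1.894e+02 in the entry above, and 2.0 x 9.361e+01 ≈ 1.872e+02 earlier); percent-clipped is the share of recent steps that exceeded it. A rough sketch of median-relative clipping under those assumptions (history length and bookkeeping are illustrative):

    from collections import deque

    import torch

    class MedianGradClipper:
        def __init__(self, clipping_scale: float = 2.0, history: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=history)  # recent global grad norms

        def __call__(self, params) -> float:
            grads = [p.grad.reshape(-1) for p in params if p.grad is not None]
            norm = torch.cat(grads).norm().item()
            self.norms.append(norm)
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            if norm > threshold:
                # Rescale all gradients so the global norm equals the threshold.
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            return norm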
], batch size: 54, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:06:37,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3475840.0, ans=0.0 2023-11-28 11:06:39,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3475840.0, ans=0.0 2023-11-28 11:06:43,729 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2023-11-28 11:06:54,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3475906.6666666665, ans=0.125 2023-11-28 11:06:56,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3475906.6666666665, ans=0.0 2023-11-28 11:07:04,083 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521400 2023-11-28 11:07:07,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3475973.3333333335, ans=0.0 2023-11-28 11:07:08,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3475973.3333333335, ans=0.0 2023-11-28 11:07:22,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3476040.0, ans=0.1 2023-11-28 11:07:33,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3476106.6666666665, ans=0.125 2023-11-28 11:07:36,269 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4400, loss[loss=0.05975, simple_loss=0.07663, pruned_loss=0.01175, audio_tagging_loss=0.009686, over 15953.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09094, pruned_loss=0.01252, audio_tagging_loss=0.008427, over 3034349.65 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:07:39,131 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2023-11-28 11:07:40,029 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:07:44,599 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.862e+01 9.068e+01 9.728e+01 1.034e+02 1.377e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-28 11:07:50,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3476240.0, ans=0.0 2023-11-28 11:08:02,152 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521450 2023-11-28 11:08:04,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3476306.6666666665, ans=0.0 2023-11-28 11:08:13,866 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0 2023-11-28 11:08:14,835 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.67 vs. 
limit=15.0 2023-11-28 11:08:19,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3476373.3333333335, ans=0.125 2023-11-28 11:08:35,687 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4450, loss[loss=0.06976, simple_loss=0.09504, pruned_loss=0.01334, audio_tagging_loss=0.008906, over 15560.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09048, pruned_loss=0.01226, audio_tagging_loss=0.008335, over 3040034.71 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:08:37,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3476506.6666666665, ans=0.125 2023-11-28 11:08:41,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=15.0 2023-11-28 11:09:00,806 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521500 2023-11-28 11:09:16,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3476706.6666666665, ans=0.125 2023-11-28 11:09:29,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3476773.3333333335, ans=0.125 2023-11-28 11:09:33,523 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4500, loss[loss=0.05188, simple_loss=0.0632, pruned_loss=0.01041, audio_tagging_loss=0.00987, over 15180.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08957, pruned_loss=0.01212, audio_tagging_loss=0.008388, over 3040290.09 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:09:39,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3476840.0, ans=0.125 2023-11-28 11:09:41,298 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.818e+01 9.367e+01 9.979e+01 1.467e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 11:09:43,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3476906.6666666665, ans=0.2 2023-11-28 11:09:59,955 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521550 2023-11-28 11:10:06,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3476973.3333333335, ans=0.0 2023-11-28 11:10:17,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3477040.0, ans=0.0 2023-11-28 11:10:17,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3477040.0, ans=0.09899494936611666 2023-11-28 11:10:23,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3477106.6666666665, ans=0.125 2023-11-28 11:10:32,150 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4550, loss[loss=0.06548, simple_loss=0.09314, pruned_loss=0.01079, audio_tagging_loss=0.008127, over 15381.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.0894, pruned_loss=0.01199, audio_tagging_loss=0.008421, over 3042584.85 frames. 
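[Editor's note] The scaling.py:1022 lines are a "whitening" diagnostic: for each instrumented activation the module compares a covariance-flatness metric against a limit (e.g. metric=4.45 vs. limit=15.0 for conv_module1.whiten above) and only intervenes when the limit is exceeded. A sketch of one such metric, under the assumption that it is the mean diagonal of cov^2 normalized by the squared mean diagonal of cov, which equals 1.0 when the channel covariance is proportional to the identity and grows with anisotropy:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels); channels split into num_groups groups.
        num_channels = x.shape[-1] // num_groups
        x = x.reshape(-1, num_groups, num_channels).transpose(0, 1)
        cov = x.transpose(1, 2) @ x / x.shape[1]           # (groups, C, C)
        mean_diag = cov.diagonal(dim1=1, dim2=2).mean()
        mean_diag_sq = (cov @ cov).diagonal(dim1=1, dim2=2).mean()
        # 1.0 for perfectly "white" features, larger as cov departs from c*I.
        return mean_diag_sq / mean_diag.clamp(min=1e-20) ** 2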
], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:10:33,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3477173.3333333335, ans=0.125 2023-11-28 11:10:54,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3477240.0, ans=0.125 2023-11-28 11:10:58,553 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521600 2023-11-28 11:11:10,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3477373.3333333335, ans=0.1 2023-11-28 11:11:18,964 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2023-11-28 11:11:21,167 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.21 vs. limit=12.0 2023-11-28 11:11:21,734 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:11:28,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3477440.0, ans=0.2 2023-11-28 11:11:31,635 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4600, loss[loss=0.08276, simple_loss=0.1123, pruned_loss=0.01905, audio_tagging_loss=0.007581, over 15464.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08858, pruned_loss=0.01184, audio_tagging_loss=0.008497, over 3044301.15 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:11:35,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3477506.6666666665, ans=0.125 2023-11-28 11:11:39,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.011e+01 8.873e+01 9.292e+01 1.017e+02 1.163e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-28 11:11:42,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3477573.3333333335, ans=0.0 2023-11-28 11:11:52,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3477573.3333333335, ans=0.125 2023-11-28 11:11:56,652 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521650 2023-11-28 11:12:30,114 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4650, loss[loss=0.06971, simple_loss=0.09062, pruned_loss=0.01505, audio_tagging_loss=0.009346, over 14821.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08872, pruned_loss=0.01204, audio_tagging_loss=0.008565, over 3044017.30 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:12:32,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3477840.0, ans=0.125 2023-11-28 11:12:35,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3477840.0, ans=0.07 2023-11-28 11:12:52,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3477973.3333333335, ans=0.07 2023-11-28 11:12:55,418 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521700 2023-11-28 11:12:59,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.90 vs. limit=22.5 2023-11-28 11:13:09,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3478040.0, ans=0.1 2023-11-28 11:13:15,672 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-28 11:13:28,659 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4700, loss[loss=0.05531, simple_loss=0.06754, pruned_loss=0.01105, audio_tagging_loss=0.01049, over 15871.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08897, pruned_loss=0.01219, audio_tagging_loss=0.008704, over 3048244.41 frames. ], batch size: 61, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:13:36,449 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.974e+01 9.921e+01 1.076e+02 1.441e+02, threshold=1.984e+02, percent-clipped=0.0 2023-11-28 11:13:42,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3478240.0, ans=0.2 2023-11-28 11:13:55,075 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521750 2023-11-28 11:14:13,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3478373.3333333335, ans=0.125 2023-11-28 11:14:27,484 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4750, loss[loss=0.06592, simple_loss=0.08782, pruned_loss=0.01216, audio_tagging_loss=0.009852, over 14371.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08883, pruned_loss=0.01216, audio_tagging_loss=0.00878, over 3051433.53 frames. 
], batch size: 57, lr: 1.54e-03, grad_scale: 8.0 2023-11-28 11:14:37,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3478573.3333333335, ans=0.125 2023-11-28 11:14:37,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3478573.3333333335, ans=0.2 2023-11-28 11:14:45,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3478573.3333333335, ans=0.2 2023-11-28 11:14:51,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3478640.0, ans=0.0 2023-11-28 11:14:51,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3478640.0, ans=0.125 2023-11-28 11:14:52,670 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521800 2023-11-28 11:14:57,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3478640.0, ans=0.125 2023-11-28 11:14:59,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.08 vs. limit=10.0 2023-11-28 11:15:17,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3478773.3333333335, ans=0.125 2023-11-28 11:15:19,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3478773.3333333335, ans=0.125 2023-11-28 11:15:22,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3478773.3333333335, ans=0.125 2023-11-28 11:15:25,677 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4800, loss[loss=0.06074, simple_loss=0.09172, pruned_loss=0.005596, audio_tagging_loss=0.009286, over 15779.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08915, pruned_loss=0.01216, audio_tagging_loss=0.008804, over 3050150.24 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:15:34,579 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.828e+01 9.577e+01 1.068e+02 1.342e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 11:15:51,077 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521850 2023-11-28 11:16:01,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3479040.0, ans=0.0 2023-11-28 11:16:05,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3479040.0, ans=0.125 2023-11-28 11:16:23,915 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4850, loss[loss=0.09984, simple_loss=0.137, pruned_loss=0.02314, audio_tagging_loss=0.008196, over 15850.00 frames. ], tot_loss[loss=0.06642, simple_loss=0.09038, pruned_loss=0.01235, audio_tagging_loss=0.008878, over 3051430.86 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:16:37,864 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.23 vs. 
limit=8.0 2023-11-28 11:16:49,495 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.95 vs. limit=10.0 2023-11-28 11:16:49,859 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521900 2023-11-28 11:16:51,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3479306.6666666665, ans=0.0 2023-11-28 11:16:59,208 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. limit=10.0 2023-11-28 11:17:04,830 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2023-11-28 11:17:08,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3479373.3333333335, ans=0.125 2023-11-28 11:17:08,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3479373.3333333335, ans=0.0 2023-11-28 11:17:22,666 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4900, loss[loss=0.06049, simple_loss=0.0773, pruned_loss=0.01072, audio_tagging_loss=0.01112, over 15101.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.08955, pruned_loss=0.01235, audio_tagging_loss=0.008865, over 3050524.65 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:17:26,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3479506.6666666665, ans=0.1 2023-11-28 11:17:27,671 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.83 vs. limit=12.0 2023-11-28 11:17:32,622 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.438e+01 8.706e+01 9.491e+01 1.021e+02 1.931e+02, threshold=1.898e+02, percent-clipped=1.0 2023-11-28 11:17:37,582 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=12.0 2023-11-28 11:17:43,271 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2023-11-28 11:17:48,977 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 521950 2023-11-28 11:17:54,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3479640.0, ans=0.125 2023-11-28 11:18:05,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2023-11-28 11:18:12,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3479773.3333333335, ans=0.2 2023-11-28 11:18:21,529 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 4950, loss[loss=0.06059, simple_loss=0.08048, pruned_loss=0.01064, audio_tagging_loss=0.00971, over 14807.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08854, pruned_loss=0.01213, audio_tagging_loss=0.00873, over 3045811.25 frames. 
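[Editor's note] The model.py:807 lines are a simple heartbeat: the encoder-freezing flag and the global batch index are printed every 50 batches (the logged indices step 521900, 521950, 522000, ...). A one-line sketch of that cadence; the interval is inferred from the indices, not stated in the log:

    import logging

    def log_heartbeat(batch_idx_train: int, freeze_encoder: bool = False,
                      interval: int = 50) -> None:
        if batch_idx_train % interval == 0:
            logging.info("Freeze_encoder: %s; Current batch idx: %d",
                         freeze_encoder, batch_idx_train)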
], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:18:47,266 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522000 2023-11-28 11:18:49,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3479973.3333333335, ans=0.0 2023-11-28 11:18:58,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.58 vs. limit=15.0 2023-11-28 11:19:20,166 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5000, loss[loss=0.06672, simple_loss=0.09436, pruned_loss=0.01326, audio_tagging_loss=0.006284, over 16119.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08918, pruned_loss=0.01215, audio_tagging_loss=0.00865, over 3037979.33 frames. ], batch size: 62, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:19:29,628 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.223e+01 8.777e+01 9.263e+01 9.841e+01 1.147e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-28 11:19:33,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3480240.0, ans=0.1 2023-11-28 11:19:34,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3480240.0, ans=0.5 2023-11-28 11:19:46,475 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522050 2023-11-28 11:20:01,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3480373.3333333335, ans=0.1 2023-11-28 11:20:06,544 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0 2023-11-28 11:20:09,896 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-28 11:20:18,900 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5050, loss[loss=0.05135, simple_loss=0.0586, pruned_loss=0.009661, audio_tagging_loss=0.01239, over 14810.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08845, pruned_loss=0.01198, audio_tagging_loss=0.008669, over 3038035.34 frames. ], batch size: 60, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:20:31,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3480573.3333333335, ans=0.125 2023-11-28 11:20:34,028 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.23 vs. 
limit=22.5 2023-11-28 11:20:37,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3480573.3333333335, ans=0.125 2023-11-28 11:20:38,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3480573.3333333335, ans=0.0 2023-11-28 11:20:44,568 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522100 2023-11-28 11:21:04,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3480773.3333333335, ans=0.1 2023-11-28 11:21:11,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3480773.3333333335, ans=0.2 2023-11-28 11:21:17,561 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5100, loss[loss=0.06458, simple_loss=0.08651, pruned_loss=0.0127, audio_tagging_loss=0.008627, over 14746.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08825, pruned_loss=0.0119, audio_tagging_loss=0.008638, over 3039566.21 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:21:26,387 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 8.858e+01 9.488e+01 1.012e+02 1.214e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 11:21:43,433 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522150 2023-11-28 11:21:59,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3481040.0, ans=0.0 2023-11-28 11:22:10,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3481106.6666666665, ans=0.125 2023-11-28 11:22:11,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3481106.6666666665, ans=0.0 2023-11-28 11:22:15,701 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5150, loss[loss=0.0817, simple_loss=0.1152, pruned_loss=0.01463, audio_tagging_loss=0.009453, over 15321.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08803, pruned_loss=0.01187, audio_tagging_loss=0.008602, over 3039906.07 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:22:19,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3481173.3333333335, ans=0.125 2023-11-28 11:22:20,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3481173.3333333335, ans=0.125 2023-11-28 11:22:42,094 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522200 2023-11-28 11:23:03,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3481440.0, ans=0.0 2023-11-28 11:23:14,772 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5200, loss[loss=0.07116, simple_loss=0.09091, pruned_loss=0.01756, audio_tagging_loss=0.008149, over 14604.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08858, pruned_loss=0.01201, audio_tagging_loss=0.008662, over 3044535.73 frames. 
], batch size: 58, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 11:23:24,315 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.856e+01 8.751e+01 9.601e+01 1.026e+02 1.242e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 11:23:27,301 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.94 vs. limit=12.0 2023-11-28 11:23:40,008 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522250 2023-11-28 11:23:57,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3481706.6666666665, ans=0.125 2023-11-28 11:23:57,809 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-28 11:24:01,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3481773.3333333335, ans=0.0 2023-11-28 11:24:02,169 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2023-11-28 11:24:05,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3481773.3333333335, ans=0.0 2023-11-28 11:24:12,206 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5250, loss[loss=0.05648, simple_loss=0.0795, pruned_loss=0.008898, audio_tagging_loss=0.007827, over 17417.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08945, pruned_loss=0.01214, audio_tagging_loss=0.008529, over 3051082.28 frames. ], batch size: 65, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:24:15,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3481840.0, ans=0.125 2023-11-28 11:24:16,054 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=7.28 vs. limit=12.0 2023-11-28 11:24:18,159 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2023-11-28 11:24:34,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3481973.3333333335, ans=0.125 2023-11-28 11:24:37,398 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522300 2023-11-28 11:24:41,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3481973.3333333335, ans=0.125 2023-11-28 11:24:58,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3482106.6666666665, ans=0.1 2023-11-28 11:25:09,497 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5300, loss[loss=0.06899, simple_loss=0.1031, pruned_loss=0.009794, audio_tagging_loss=0.007624, over 15192.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08972, pruned_loss=0.01218, audio_tagging_loss=0.008514, over 3051175.99 frames. 
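[Editor's note] The grad_scale field in the loss lines is the mixed-precision loss scale, and its movements are consistent with the standard torch.cuda.amp.GradScaler behaviour: it doubles after a long enough run of overflow-free steps (8.0 -> 16.0 before batch 4800 and 16.0 -> 32.0 before batch 5200 above) and is halved when a step produces inf/nan gradients (32.0 back to 16.0 between batches 5200 and 5250). A minimal sketch of the usual pattern; the growth settings below are illustrative, not taken from this recipe:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=8.0, growth_factor=2.0, backoff_factor=0.5,
        growth_interval=2000,
    )

    def training_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped internally if grads contain inf/nan
        scaler.update()          # grows the scale, or backs off after overflow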
], batch size: 54, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:25:19,333 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.372e+01 8.992e+01 9.491e+01 1.033e+02 1.599e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-28 11:25:25,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3482240.0, ans=0.2 2023-11-28 11:25:35,813 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522350 2023-11-28 11:25:45,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3482373.3333333335, ans=0.2 2023-11-28 11:25:55,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=12.0 2023-11-28 11:25:59,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3482440.0, ans=0.125 2023-11-28 11:26:01,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=15.0 2023-11-28 11:26:06,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3482506.6666666665, ans=0.125 2023-11-28 11:26:06,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3482506.6666666665, ans=0.1 2023-11-28 11:26:07,657 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5350, loss[loss=0.07725, simple_loss=0.1029, pruned_loss=0.0174, audio_tagging_loss=0.008379, over 15800.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09017, pruned_loss=0.01226, audio_tagging_loss=0.008593, over 3046997.26 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:26:15,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3482506.6666666665, ans=0.125 2023-11-28 11:26:33,686 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522400 2023-11-28 11:26:34,430 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.23 vs. limit=22.5 2023-11-28 11:27:00,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3482773.3333333335, ans=0.125 2023-11-28 11:27:07,431 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5400, loss[loss=0.08598, simple_loss=0.1148, pruned_loss=0.02129, audio_tagging_loss=0.007287, over 15069.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09033, pruned_loss=0.01228, audio_tagging_loss=0.008614, over 3040183.79 frames. 
], batch size: 53, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:27:13,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3482840.0, ans=0.125 2023-11-28 11:27:17,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.334e+01 8.830e+01 9.403e+01 1.046e+02 1.380e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 11:27:31,932 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522450 2023-11-28 11:27:36,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3482973.3333333335, ans=0.1 2023-11-28 11:27:37,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3482973.3333333335, ans=0.125 2023-11-28 11:28:00,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3483106.6666666665, ans=0.125 2023-11-28 11:28:05,736 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5450, loss[loss=0.05774, simple_loss=0.07804, pruned_loss=0.007736, audio_tagging_loss=0.01099, over 15166.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.09053, pruned_loss=0.01239, audio_tagging_loss=0.008611, over 3041338.98 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:28:19,651 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.48 vs. limit=10.0 2023-11-28 11:28:32,947 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522500 2023-11-28 11:28:33,653 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.35 vs. limit=12.0 2023-11-28 11:28:43,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3483373.3333333335, ans=0.125 2023-11-28 11:28:44,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3483373.3333333335, ans=0.125 2023-11-28 11:28:48,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3483373.3333333335, ans=0.125 2023-11-28 11:29:04,411 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5500, loss[loss=0.05476, simple_loss=0.07106, pruned_loss=0.008443, audio_tagging_loss=0.01078, over 15675.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08995, pruned_loss=0.01228, audio_tagging_loss=0.008767, over 3038040.53 frames. 
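[Editor's note] The *balancer*.prob, *.min_abs and *.min_positive entries describe balancer-style activation constraints: per-channel statistics such as the fraction of positive values and the mean absolute value are nudged back into configured ranges, and the constraint is only applied with the scheduled probability (the ans=0.125 values above). A very rough penalty-style sketch under those assumed semantics; the actual module is understood to adjust gradients directly rather than add a loss term:

    import torch

    def balancer_penalty(x: torch.Tensor, min_positive: float = 0.05,
                         max_positive: float = 0.95, min_abs: float = 0.2,
                         max_abs: float = 100.0) -> torch.Tensor:
        # x: (..., num_channels); reduce over every dim except channels.
        dims = tuple(range(x.dim() - 1))
        pos_frac = (x > 0).float().mean(dim=dims)
        mean_abs = x.abs().mean(dim=dims)
        penalty = ((min_positive - pos_frac).clamp(min=0)
                   + (pos_frac - max_positive).clamp(min=0)
                   + (min_abs - mean_abs).clamp(min=0)
                   + (mean_abs - max_abs).clamp(min=0))
        return penalty.sum()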
], batch size: 59, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:29:07,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3483506.6666666665, ans=0.0 2023-11-28 11:29:15,287 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.510e+01 8.610e+01 9.341e+01 1.002e+02 1.177e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 11:29:17,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3483573.3333333335, ans=0.1 2023-11-28 11:29:30,895 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522550 2023-11-28 11:29:31,284 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2023-11-28 11:29:33,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3483640.0, ans=0.0 2023-11-28 11:29:35,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3483640.0, ans=0.0 2023-11-28 11:29:43,247 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.54 vs. limit=15.0 2023-11-28 11:29:56,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3483773.3333333335, ans=0.0 2023-11-28 11:30:04,936 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5550, loss[loss=0.05879, simple_loss=0.07903, pruned_loss=0.009729, audio_tagging_loss=0.009548, over 13670.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08863, pruned_loss=0.01217, audio_tagging_loss=0.008927, over 3034195.72 frames. ], batch size: 52, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:30:15,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3483906.6666666665, ans=0.0 2023-11-28 11:30:17,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3483906.6666666665, ans=0.0 2023-11-28 11:30:29,953 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522600 2023-11-28 11:30:45,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3484040.0, ans=0.125 2023-11-28 11:30:57,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3484106.6666666665, ans=0.1 2023-11-28 11:31:03,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3484173.3333333335, ans=0.05 2023-11-28 11:31:04,133 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5600, loss[loss=0.08452, simple_loss=0.128, pruned_loss=0.01546, audio_tagging_loss=0.005039, over 15346.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08909, pruned_loss=0.01205, audio_tagging_loss=0.00892, over 3042128.07 frames. 
], batch size: 54, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 11:31:08,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3484173.3333333335, ans=0.2 2023-11-28 11:31:14,179 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 9.030e+01 9.835e+01 1.064e+02 3.078e+02, threshold=1.967e+02, percent-clipped=1.0 2023-11-28 11:31:29,409 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522650 2023-11-28 11:31:51,258 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:32:02,662 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5650, loss[loss=0.06241, simple_loss=0.08019, pruned_loss=0.01205, audio_tagging_loss=0.01027, over 15590.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08948, pruned_loss=0.0121, audio_tagging_loss=0.008988, over 3049775.94 frames. ], batch size: 59, lr: 1.54e-03, grad_scale: 32.0 2023-11-28 11:32:30,008 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522700 2023-11-28 11:32:42,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3484706.6666666665, ans=0.0 2023-11-28 11:32:50,602 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:32:54,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3484773.3333333335, ans=0.125 2023-11-28 11:32:59,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3484773.3333333335, ans=0.1 2023-11-28 11:33:02,819 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5700, loss[loss=0.07403, simple_loss=0.1074, pruned_loss=0.0138, audio_tagging_loss=0.006556, over 15374.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.0904, pruned_loss=0.01216, audio_tagging_loss=0.008948, over 3049755.74 frames. ], batch size: 57, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:33:06,267 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.62 vs. limit=15.0 2023-11-28 11:33:09,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3484840.0, ans=0.1 2023-11-28 11:33:15,235 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.269e+01 8.782e+01 9.296e+01 1.023e+02 1.172e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 11:33:15,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.78 vs. 
limit=22.5 2023-11-28 11:33:24,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3484906.6666666665, ans=0.05 2023-11-28 11:33:24,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3484906.6666666665, ans=0.1 2023-11-28 11:33:28,821 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522750 2023-11-28 11:33:45,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3485040.0, ans=0.1 2023-11-28 11:33:45,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.36 vs. limit=22.5 2023-11-28 11:33:59,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3485106.6666666665, ans=0.0 2023-11-28 11:34:00,722 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.97 vs. limit=10.0 2023-11-28 11:34:02,527 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5750, loss[loss=0.05935, simple_loss=0.08202, pruned_loss=0.0114, audio_tagging_loss=0.006937, over 14777.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08962, pruned_loss=0.01211, audio_tagging_loss=0.008741, over 3054129.60 frames. ], batch size: 56, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:34:02,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3485173.3333333335, ans=0.2 2023-11-28 11:34:27,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3485306.6666666665, ans=0.125 2023-11-28 11:34:28,133 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522800 2023-11-28 11:35:01,883 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5800, loss[loss=0.06285, simple_loss=0.08995, pruned_loss=0.007149, audio_tagging_loss=0.01072, over 15585.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08911, pruned_loss=0.01207, audio_tagging_loss=0.008688, over 3056206.28 frames. ], batch size: 58, lr: 1.54e-03, grad_scale: 16.0 2023-11-28 11:35:13,802 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.175e+01 8.794e+01 9.521e+01 1.033e+02 1.295e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 11:35:25,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3485640.0, ans=0.125 2023-11-28 11:35:27,684 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. 
limit=15.0 2023-11-28 11:35:28,288 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522850 2023-11-28 11:35:34,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3485640.0, ans=0.2 2023-11-28 11:35:51,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3485773.3333333335, ans=0.0 2023-11-28 11:35:55,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3485773.3333333335, ans=0.1 2023-11-28 11:36:00,888 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5850, loss[loss=0.05294, simple_loss=0.07081, pruned_loss=0.009378, audio_tagging_loss=0.008162, over 15143.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09, pruned_loss=0.01216, audio_tagging_loss=0.00866, over 3052284.70 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:36:01,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3485840.0, ans=0.1 2023-11-28 11:36:06,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3485840.0, ans=0.1 2023-11-28 11:36:08,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3485840.0, ans=15.0 2023-11-28 11:36:21,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3485906.6666666665, ans=0.0 2023-11-28 11:36:26,654 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522900 2023-11-28 11:36:54,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3486106.6666666665, ans=0.2 2023-11-28 11:36:57,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3486106.6666666665, ans=0.125 2023-11-28 11:36:58,607 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.72 vs. limit=12.0 2023-11-28 11:36:59,222 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5900, loss[loss=0.07037, simple_loss=0.09487, pruned_loss=0.01293, audio_tagging_loss=0.01, over 13764.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09019, pruned_loss=0.01223, audio_tagging_loss=0.008654, over 3049392.04 frames. 
], batch size: 52, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:37:11,244 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.935e+01 9.645e+01 1.023e+02 1.416e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-28 11:37:17,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3486240.0, ans=0.125 2023-11-28 11:37:25,614 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 522950 2023-11-28 11:37:30,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3486306.6666666665, ans=0.5 2023-11-28 11:37:31,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3486306.6666666665, ans=0.0 2023-11-28 11:37:52,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3486440.0, ans=0.125 2023-11-28 11:37:54,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3486440.0, ans=0.0 2023-11-28 11:37:58,744 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 5950, loss[loss=0.06158, simple_loss=0.09105, pruned_loss=0.008008, audio_tagging_loss=0.008043, over 14967.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09053, pruned_loss=0.01213, audio_tagging_loss=0.008587, over 3048284.91 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:38:00,491 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2023-11-28 11:38:24,962 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523000 2023-11-28 11:38:41,922 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.95 vs. limit=10.0 2023-11-28 11:38:57,788 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6000, loss[loss=0.07055, simple_loss=0.09171, pruned_loss=0.01381, audio_tagging_loss=0.01089, over 14477.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08871, pruned_loss=0.01189, audio_tagging_loss=0.008643, over 3039018.72 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:38:57,789 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 11:39:11,527 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.9869, 3.2711, 3.5434, 2.8971, 3.7204, 3.7759, 3.8334, 3.7567], device='cuda:2') 2023-11-28 11:39:30,102 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3538, 5.0152, 4.6289, 5.1708], device='cuda:2') 2023-11-28 11:39:33,661 INFO [train_asr.py:1267] (2/4) Epoch 44, validation: loss=0.05792, simple_loss=0.0506, pruned_loss=0.005293, audio_tagging_loss=0.02732, over 4681554.00 frames. 
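[Editor's note] During the batch-6000 validation pass above, the zipformer.py:1877 lines dump an attention-weights entropy per head (8 values for the 8-head layer, 4 for the 4-head layer); unusually low values would flag heads whose attention has collapsed onto single frames. A sketch of that diagnostic, assuming it is the Shannon entropy of each head's attention distribution averaged over query positions; the shape convention is an assumption:

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        # attn_weights: (num_heads, tgt_len, src_len), each row summing to 1.
        p = attn_weights.clamp(min=1e-20)
        return -(p * p.log()).sum(dim=-1).mean(dim=-1)  # one value per head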
2023-11-28 11:39:33,662 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 11:39:38,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3486840.0, ans=0.125 2023-11-28 11:39:43,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3486840.0, ans=0.0 2023-11-28 11:39:45,289 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.423e+01 8.849e+01 9.422e+01 1.008e+02 1.234e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 11:39:45,809 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2023-11-28 11:39:59,488 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523050 2023-11-28 11:40:10,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3487040.0, ans=0.125 2023-11-28 11:40:12,820 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.35 vs. limit=22.5 2023-11-28 11:40:20,197 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:40:20,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3487106.6666666665, ans=0.04949747468305833 2023-11-28 11:40:20,684 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2023-11-28 11:40:25,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3487106.6666666665, ans=0.125 2023-11-28 11:40:29,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3487106.6666666665, ans=0.2 2023-11-28 11:40:31,913 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6050, loss[loss=0.06961, simple_loss=0.09648, pruned_loss=0.01624, audio_tagging_loss=0.005133, over 15307.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.08827, pruned_loss=0.01196, audio_tagging_loss=0.00867, over 3028663.09 frames. 
], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:40:41,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3487173.3333333335, ans=0.125 2023-11-28 11:40:49,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3487240.0, ans=0.125 2023-11-28 11:40:49,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3487240.0, ans=0.1 2023-11-28 11:40:58,473 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523100 2023-11-28 11:41:06,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3487373.3333333335, ans=0.1 2023-11-28 11:41:29,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3487440.0, ans=0.125 2023-11-28 11:41:31,062 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6100, loss[loss=0.05765, simple_loss=0.07322, pruned_loss=0.01115, audio_tagging_loss=0.009883, over 15820.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08899, pruned_loss=0.01198, audio_tagging_loss=0.008657, over 3036779.61 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:41:43,532 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.905e+01 9.501e+01 1.004e+02 1.216e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 11:41:56,915 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523150 2023-11-28 11:42:03,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3487640.0, ans=0.1 2023-11-28 11:42:04,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3487640.0, ans=0.0 2023-11-28 11:42:30,248 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6150, loss[loss=0.07514, simple_loss=0.09465, pruned_loss=0.01507, audio_tagging_loss=0.01274, over 14655.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08891, pruned_loss=0.01211, audio_tagging_loss=0.008703, over 3039402.40 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:42:35,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3487840.0, ans=0.1 2023-11-28 11:42:54,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3487973.3333333335, ans=0.0 2023-11-28 11:42:54,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3487973.3333333335, ans=0.0 2023-11-28 11:42:56,255 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523200 2023-11-28 11:43:13,620 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-11-28 11:43:24,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3488106.6666666665, ans=0.0 2023-11-28 11:43:28,694 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6200, loss[loss=0.07583, simple_loss=0.107, pruned_loss=0.01336, audio_tagging_loss=0.008965, over 14500.00 frames. 
], tot_loss[loss=0.06572, simple_loss=0.08944, pruned_loss=0.01228, audio_tagging_loss=0.008723, over 3045278.03 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:43:31,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3488173.3333333335, ans=0.125 2023-11-28 11:43:32,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3488173.3333333335, ans=0.09899494936611666 2023-11-28 11:43:40,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3488240.0, ans=0.0 2023-11-28 11:43:42,408 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 8.740e+01 9.407e+01 1.006e+02 1.193e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-28 11:43:50,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3488240.0, ans=0.125 2023-11-28 11:43:55,987 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523250 2023-11-28 11:44:13,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3488373.3333333335, ans=0.0 2023-11-28 11:44:18,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3488440.0, ans=0.05 2023-11-28 11:44:18,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3488440.0, ans=0.125 2023-11-28 11:44:19,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3488440.0, ans=0.125 2023-11-28 11:44:19,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3488440.0, ans=0.0 2023-11-28 11:44:25,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3488440.0, ans=0.2 2023-11-28 11:44:28,619 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6250, loss[loss=0.07327, simple_loss=0.104, pruned_loss=0.01238, audio_tagging_loss=0.008873, over 15274.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08907, pruned_loss=0.01218, audio_tagging_loss=0.008753, over 3048471.75 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:44:35,586 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.87 vs. limit=22.5 2023-11-28 11:44:42,524 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.18 vs. 
limit=15.0 2023-11-28 11:44:44,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3488573.3333333335, ans=0.125 2023-11-28 11:44:54,341 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523300 2023-11-28 11:45:10,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3488706.6666666665, ans=0.125 2023-11-28 11:45:23,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3488773.3333333335, ans=0.125 2023-11-28 11:45:27,602 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6300, loss[loss=0.06921, simple_loss=0.1016, pruned_loss=0.01155, audio_tagging_loss=0.006839, over 15110.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08896, pruned_loss=0.01206, audio_tagging_loss=0.008875, over 3047156.67 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:45:28,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3488840.0, ans=0.125 2023-11-28 11:45:31,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3488840.0, ans=0.125 2023-11-28 11:45:34,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3488840.0, ans=0.035 2023-11-28 11:45:39,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.498e+01 8.790e+01 9.480e+01 1.019e+02 1.243e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 11:45:53,516 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523350 2023-11-28 11:45:54,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3488973.3333333335, ans=0.1 2023-11-28 11:45:58,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3488973.3333333335, ans=0.2 2023-11-28 11:46:21,781 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.86 vs. limit=15.0 2023-11-28 11:46:25,308 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6350, loss[loss=0.0701, simple_loss=0.09942, pruned_loss=0.01111, audio_tagging_loss=0.009274, over 15111.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08904, pruned_loss=0.01221, audio_tagging_loss=0.00894, over 3038920.85 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:46:37,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3489240.0, ans=0.5 2023-11-28 11:46:46,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3489240.0, ans=0.0 2023-11-28 11:46:51,367 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523400 2023-11-28 11:46:52,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.53 vs. 
limit=15.0 2023-11-28 11:47:09,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3489373.3333333335, ans=0.125 2023-11-28 11:47:23,880 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6400, loss[loss=0.05571, simple_loss=0.07533, pruned_loss=0.008108, audio_tagging_loss=0.009935, over 14468.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09006, pruned_loss=0.01238, audio_tagging_loss=0.008934, over 3039555.22 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:47:36,647 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.832e+01 9.473e+01 1.012e+02 1.860e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 11:47:40,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3489573.3333333335, ans=0.0 2023-11-28 11:47:49,011 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523450 2023-11-28 11:47:49,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2023-11-28 11:47:51,530 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-28 11:47:52,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3489640.0, ans=0.0 2023-11-28 11:48:06,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3489706.6666666665, ans=0.125 2023-11-28 11:48:10,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3489773.3333333335, ans=0.0 2023-11-28 11:48:22,438 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6450, loss[loss=0.06226, simple_loss=0.08457, pruned_loss=0.01065, audio_tagging_loss=0.009323, over 14802.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08956, pruned_loss=0.01224, audio_tagging_loss=0.008997, over 3042032.29 frames. 
], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:48:28,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3489840.0, ans=0.125 2023-11-28 11:48:35,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3489906.6666666665, ans=0.1 2023-11-28 11:48:41,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3489906.6666666665, ans=0.125 2023-11-28 11:48:47,543 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523500 2023-11-28 11:48:55,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3489973.3333333335, ans=0.125 2023-11-28 11:49:09,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3490106.6666666665, ans=0.025 2023-11-28 11:49:11,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3490106.6666666665, ans=0.2 2023-11-28 11:49:13,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3490106.6666666665, ans=0.0 2023-11-28 11:49:20,230 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6500, loss[loss=0.05326, simple_loss=0.07425, pruned_loss=0.006612, audio_tagging_loss=0.009521, over 16524.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08913, pruned_loss=0.01215, audio_tagging_loss=0.009094, over 3043920.60 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:49:20,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3490173.3333333335, ans=0.125 2023-11-28 11:49:26,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3490173.3333333335, ans=0.0 2023-11-28 11:49:32,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3490240.0, ans=0.2 2023-11-28 11:49:33,078 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.765e+01 8.967e+01 9.507e+01 1.009e+02 1.264e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 11:49:44,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.66 vs. limit=10.0 2023-11-28 11:49:46,617 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523550 2023-11-28 11:50:18,385 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6550, loss[loss=0.06619, simple_loss=0.09443, pruned_loss=0.008944, audio_tagging_loss=0.01003, over 16180.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.08992, pruned_loss=0.01246, audio_tagging_loss=0.008929, over 3051037.40 frames. 
], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:50:23,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3490506.6666666665, ans=0.125 2023-11-28 11:50:27,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3490506.6666666665, ans=0.0 2023-11-28 11:50:35,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3490573.3333333335, ans=0.0 2023-11-28 11:50:44,083 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523600 2023-11-28 11:51:08,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3490773.3333333335, ans=0.0 2023-11-28 11:51:17,303 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6600, loss[loss=0.05943, simple_loss=0.07577, pruned_loss=0.01222, audio_tagging_loss=0.009333, over 14673.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.08985, pruned_loss=0.01243, audio_tagging_loss=0.008706, over 3046509.09 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:51:30,536 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.303e+01 8.875e+01 9.605e+01 1.016e+02 1.315e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 11:51:31,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3490906.6666666665, ans=0.0 2023-11-28 11:51:37,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3490906.6666666665, ans=0.2 2023-11-28 11:51:41,956 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523650 2023-11-28 11:52:14,925 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6650, loss[loss=0.06863, simple_loss=0.09014, pruned_loss=0.01788, audio_tagging_loss=0.005684, over 14771.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08879, pruned_loss=0.01215, audio_tagging_loss=0.008621, over 3052260.56 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:52:35,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=3491240.0, ans=15.0 2023-11-28 11:52:41,221 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523700 2023-11-28 11:52:54,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3491373.3333333335, ans=0.0 2023-11-28 11:52:59,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3491373.3333333335, ans=0.2 2023-11-28 11:53:02,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3491440.0, ans=0.125 2023-11-28 11:53:03,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3491440.0, ans=0.125 2023-11-28 11:53:11,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3491506.6666666665, ans=0.125 2023-11-28 11:53:13,429 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6700, loss[loss=0.08529, simple_loss=0.1153, pruned_loss=0.01937, audio_tagging_loss=0.008284, over 15107.00 frames. 
], tot_loss[loss=0.06447, simple_loss=0.08778, pruned_loss=0.01198, audio_tagging_loss=0.008603, over 3053076.15 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:53:28,303 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.754e+01 9.466e+01 1.016e+02 1.269e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 11:53:39,720 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523750 2023-11-28 11:53:48,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3491706.6666666665, ans=0.125 2023-11-28 11:53:51,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3491706.6666666665, ans=0.2 2023-11-28 11:54:06,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2023-11-28 11:54:12,335 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6750, loss[loss=0.04131, simple_loss=0.05193, pruned_loss=0.007011, audio_tagging_loss=0.008334, over 14091.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08789, pruned_loss=0.01199, audio_tagging_loss=0.008624, over 3045139.03 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:54:36,933 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523800 2023-11-28 11:54:46,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3492040.0, ans=0.125 2023-11-28 11:54:47,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3492040.0, ans=0.0 2023-11-28 11:55:07,907 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=15.0 2023-11-28 11:55:10,805 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6800, loss[loss=0.07549, simple_loss=0.1134, pruned_loss=0.01371, audio_tagging_loss=0.00508, over 15686.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.0886, pruned_loss=0.01209, audio_tagging_loss=0.008674, over 3047976.01 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 11:55:24,139 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.914e+01 9.606e+01 1.021e+02 1.348e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 11:55:25,829 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2023-11-28 11:55:25,873 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.33 vs. limit=15.0 2023-11-28 11:55:35,900 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523850 2023-11-28 11:55:37,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3492306.6666666665, ans=0.0 2023-11-28 11:55:41,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3492306.6666666665, ans=0.025 2023-11-28 11:55:48,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.38 vs. 
limit=15.0 2023-11-28 11:55:51,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3492373.3333333335, ans=0.0 2023-11-28 11:55:56,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3492440.0, ans=0.0 2023-11-28 11:56:09,067 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6850, loss[loss=0.04276, simple_loss=0.04748, pruned_loss=0.007763, audio_tagging_loss=0.01126, over 14942.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08788, pruned_loss=0.01191, audio_tagging_loss=0.008593, over 3044022.42 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:56:09,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3492506.6666666665, ans=0.125 2023-11-28 11:56:16,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3492506.6666666665, ans=0.125 2023-11-28 11:56:27,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3492573.3333333335, ans=0.0 2023-11-28 11:56:27,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3492573.3333333335, ans=0.125 2023-11-28 11:56:28,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3492573.3333333335, ans=0.125 2023-11-28 11:56:35,869 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523900 2023-11-28 11:56:45,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3492706.6666666665, ans=0.2 2023-11-28 11:56:45,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3492706.6666666665, ans=0.125 2023-11-28 11:56:56,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3492773.3333333335, ans=0.1 2023-11-28 11:57:01,977 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:57:03,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3492773.3333333335, ans=0.125 2023-11-28 11:57:07,908 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6900, loss[loss=0.04286, simple_loss=0.04895, pruned_loss=0.007735, audio_tagging_loss=0.01065, over 15905.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08852, pruned_loss=0.01198, audio_tagging_loss=0.008631, over 3049403.21 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:57:23,411 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.756e+01 9.577e+01 1.062e+02 1.292e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 11:57:33,296 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 523950 2023-11-28 11:57:57,177 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 11:57:57,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3493106.6666666665, ans=0.125 2023-11-28 11:57:57,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3493106.6666666665, ans=0.1 2023-11-28 11:57:58,507 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 11:58:03,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3493106.6666666665, ans=0.2 2023-11-28 11:58:06,565 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 6950, loss[loss=0.05094, simple_loss=0.06612, pruned_loss=0.01003, audio_tagging_loss=0.007846, over 14821.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09055, pruned_loss=0.01234, audio_tagging_loss=0.008451, over 3051279.35 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:58:10,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3493173.3333333335, ans=0.125 2023-11-28 11:58:25,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3493240.0, ans=0.1 2023-11-28 11:58:31,617 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524000 2023-11-28 11:58:44,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3493373.3333333335, ans=0.07 2023-11-28 11:58:46,704 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0 2023-11-28 11:58:51,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3493373.3333333335, ans=0.125 2023-11-28 11:59:06,804 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7000, loss[loss=0.05807, simple_loss=0.0863, pruned_loss=0.008401, audio_tagging_loss=0.006518, over 14982.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09074, pruned_loss=0.01239, audio_tagging_loss=0.008461, over 3046127.94 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 11:59:10,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3493506.6666666665, ans=0.125 2023-11-28 11:59:21,774 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.923e+01 9.384e+01 1.033e+02 1.328e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 11:59:24,136 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.87 vs. 
limit=15.0 2023-11-28 11:59:32,795 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524050 2023-11-28 11:59:48,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3493706.6666666665, ans=0.2 2023-11-28 11:59:48,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3493706.6666666665, ans=0.07 2023-11-28 12:00:05,203 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7050, loss[loss=0.04523, simple_loss=0.05449, pruned_loss=0.008243, audio_tagging_loss=0.009742, over 14639.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08909, pruned_loss=0.01202, audio_tagging_loss=0.008622, over 3049027.07 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:00:31,265 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524100 2023-11-28 12:00:33,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3493973.3333333335, ans=0.0 2023-11-28 12:00:34,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3493973.3333333335, ans=0.0 2023-11-28 12:00:45,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3494040.0, ans=0.05 2023-11-28 12:00:47,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3494040.0, ans=0.0 2023-11-28 12:00:47,864 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.26 vs. limit=22.5 2023-11-28 12:00:51,787 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.38 vs. limit=15.0 2023-11-28 12:00:56,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3494106.6666666665, ans=0.2 2023-11-28 12:01:03,878 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7100, loss[loss=0.06632, simple_loss=0.09123, pruned_loss=0.01041, audio_tagging_loss=0.01029, over 15530.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.0897, pruned_loss=0.01218, audio_tagging_loss=0.00867, over 3055234.36 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:01:05,740 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.19 vs. limit=15.0 2023-11-28 12:01:18,902 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.268e+01 8.879e+01 9.631e+01 1.062e+02 1.360e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-28 12:01:29,472 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524150 2023-11-28 12:01:39,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3494373.3333333335, ans=0.1 2023-11-28 12:01:54,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3494440.0, ans=0.1 2023-11-28 12:02:01,619 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7150, loss[loss=0.06807, simple_loss=0.09581, pruned_loss=0.01203, audio_tagging_loss=0.008128, over 15682.00 frames. 
], tot_loss[loss=0.06602, simple_loss=0.08994, pruned_loss=0.01231, audio_tagging_loss=0.00874, over 3052696.51 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:02:07,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3494506.6666666665, ans=0.125 2023-11-28 12:02:07,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3494506.6666666665, ans=0.0 2023-11-28 12:02:12,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3494573.3333333335, ans=0.1 2023-11-28 12:02:12,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3494573.3333333335, ans=0.0 2023-11-28 12:02:27,429 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524200 2023-11-28 12:02:48,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3494773.3333333335, ans=0.2 2023-11-28 12:02:50,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3494773.3333333335, ans=0.125 2023-11-28 12:03:00,011 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7200, loss[loss=0.06605, simple_loss=0.08287, pruned_loss=0.01264, audio_tagging_loss=0.01198, over 15548.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08934, pruned_loss=0.01209, audio_tagging_loss=0.008872, over 3052636.79 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:03:05,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3494840.0, ans=0.0 2023-11-28 12:03:07,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3494840.0, ans=0.125 2023-11-28 12:03:11,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3494906.6666666665, ans=0.125 2023-11-28 12:03:12,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3494906.6666666665, ans=0.2 2023-11-28 12:03:15,005 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 8.816e+01 9.709e+01 1.018e+02 1.271e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 12:03:20,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3494906.6666666665, ans=0.2 2023-11-28 12:03:21,305 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.67 vs. 
limit=15.0 2023-11-28 12:03:21,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3494973.3333333335, ans=0.125 2023-11-28 12:03:25,929 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524250 2023-11-28 12:03:26,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3494973.3333333335, ans=0.0 2023-11-28 12:03:27,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3494973.3333333335, ans=0.0 2023-11-28 12:03:57,749 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7250, loss[loss=0.05175, simple_loss=0.07236, pruned_loss=0.006106, audio_tagging_loss=0.009471, over 15727.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08929, pruned_loss=0.01209, audio_tagging_loss=0.009013, over 3045776.49 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:04:11,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3495240.0, ans=0.1 2023-11-28 12:04:23,236 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524300 2023-11-28 12:04:26,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3495306.6666666665, ans=0.0 2023-11-28 12:04:40,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3495373.3333333335, ans=0.0 2023-11-28 12:04:41,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3495373.3333333335, ans=10.0 2023-11-28 12:04:56,005 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7300, loss[loss=0.08036, simple_loss=0.1059, pruned_loss=0.01866, audio_tagging_loss=0.008739, over 15895.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08907, pruned_loss=0.01214, audio_tagging_loss=0.008943, over 3037522.36 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:05:01,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3495506.6666666665, ans=0.2 2023-11-28 12:05:12,056 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 8.741e+01 9.294e+01 1.019e+02 1.260e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-28 12:05:21,875 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524350 2023-11-28 12:05:22,530 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.58 vs. limit=15.0 2023-11-28 12:05:27,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=12.0 2023-11-28 12:05:33,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3495706.6666666665, ans=0.125 2023-11-28 12:05:53,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3495840.0, ans=0.125 2023-11-28 12:05:54,074 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7350, loss[loss=0.0532, simple_loss=0.0701, pruned_loss=0.01084, audio_tagging_loss=0.007308, over 15720.00 frames. 
], tot_loss[loss=0.06575, simple_loss=0.08944, pruned_loss=0.01222, audio_tagging_loss=0.008817, over 3036215.01 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:06:11,279 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0 2023-11-28 12:06:13,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3495906.6666666665, ans=0.1 2023-11-28 12:06:19,677 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524400 2023-11-28 12:06:20,236 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=22.5 2023-11-28 12:06:37,733 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5 2023-11-28 12:06:53,736 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7400, loss[loss=0.06032, simple_loss=0.06778, pruned_loss=0.01273, audio_tagging_loss=0.0137, over 15442.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08887, pruned_loss=0.01211, audio_tagging_loss=0.008708, over 3034084.40 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:07:07,437 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0 2023-11-28 12:07:09,010 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.769e+01 9.562e+01 1.042e+02 1.496e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 12:07:19,003 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524450 2023-11-28 12:07:36,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3496373.3333333335, ans=0.0 2023-11-28 12:07:44,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.45 vs. limit=15.0 2023-11-28 12:07:50,866 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7450, loss[loss=0.06963, simple_loss=0.0877, pruned_loss=0.01318, audio_tagging_loss=0.01261, over 14639.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08909, pruned_loss=0.01227, audio_tagging_loss=0.008627, over 3036475.92 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:08:17,079 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524500 2023-11-28 12:08:40,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3496773.3333333335, ans=0.125 2023-11-28 12:08:44,869 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.61 vs. limit=22.5 2023-11-28 12:08:49,249 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7500, loss[loss=0.07003, simple_loss=0.09452, pruned_loss=0.01664, audio_tagging_loss=0.006131, over 14229.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08902, pruned_loss=0.01235, audio_tagging_loss=0.008612, over 3032006.94 frames. 
], batch size: 54, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:08:52,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3496840.0, ans=0.125 2023-11-28 12:09:05,328 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.587e+01 8.807e+01 9.534e+01 1.017e+02 1.454e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 12:09:12,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3496973.3333333335, ans=0.125 2023-11-28 12:09:14,147 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524550 2023-11-28 12:09:18,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3496973.3333333335, ans=0.0 2023-11-28 12:09:35,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3497106.6666666665, ans=0.125 2023-11-28 12:09:46,882 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7550, loss[loss=0.06695, simple_loss=0.08836, pruned_loss=0.01313, audio_tagging_loss=0.009636, over 15614.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08959, pruned_loss=0.01238, audio_tagging_loss=0.008622, over 3041784.08 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:09:47,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3497173.3333333335, ans=0.0 2023-11-28 12:09:57,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3497240.0, ans=0.2 2023-11-28 12:10:02,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.61 vs. limit=15.0 2023-11-28 12:10:05,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3497240.0, ans=0.0 2023-11-28 12:10:11,088 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524600 2023-11-28 12:10:11,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3497306.6666666665, ans=0.2 2023-11-28 12:10:33,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3497440.0, ans=0.0 2023-11-28 12:10:40,114 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=15.0 2023-11-28 12:10:43,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3497506.6666666665, ans=0.125 2023-11-28 12:10:44,041 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7600, loss[loss=0.04548, simple_loss=0.05685, pruned_loss=0.006997, audio_tagging_loss=0.01006, over 16556.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08886, pruned_loss=0.01235, audio_tagging_loss=0.008654, over 3047704.62 frames. 
], batch size: 63, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:10:44,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3497506.6666666665, ans=0.0 2023-11-28 12:10:48,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3497506.6666666665, ans=0.125 2023-11-28 12:10:50,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.56 vs. limit=10.0 2023-11-28 12:10:51,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3497506.6666666665, ans=0.125 2023-11-28 12:11:00,298 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 8.865e+01 9.544e+01 1.025e+02 1.373e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 12:11:05,153 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=15.0 2023-11-28 12:11:09,782 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524650 2023-11-28 12:11:20,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3497706.6666666665, ans=0.1 2023-11-28 12:11:27,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.43 vs. limit=15.0 2023-11-28 12:11:36,056 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:11:41,835 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7650, loss[loss=0.0644, simple_loss=0.07725, pruned_loss=0.01733, audio_tagging_loss=0.008446, over 14849.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08905, pruned_loss=0.0123, audio_tagging_loss=0.008639, over 3051683.71 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:11:54,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3497906.6666666665, ans=0.125 2023-11-28 12:12:08,107 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524700 2023-11-28 12:12:31,804 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0 2023-11-28 12:12:34,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=22.5 2023-11-28 12:12:41,303 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7700, loss[loss=0.04981, simple_loss=0.06972, pruned_loss=0.006291, audio_tagging_loss=0.00866, over 16313.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08934, pruned_loss=0.01227, audio_tagging_loss=0.008628, over 3052602.62 frames. 
], batch size: 63, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:12:45,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3498173.3333333335, ans=0.1 2023-11-28 12:12:50,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3498173.3333333335, ans=0.0 2023-11-28 12:12:53,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3498240.0, ans=0.125 2023-11-28 12:12:57,716 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.664e+01 8.890e+01 9.413e+01 1.018e+02 1.310e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 12:12:59,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3498240.0, ans=0.2 2023-11-28 12:13:05,499 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524750 2023-11-28 12:13:15,608 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:13:16,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3498373.3333333335, ans=0.0 2023-11-28 12:13:19,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3498373.3333333335, ans=0.125 2023-11-28 12:13:20,342 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.21 vs. limit=15.0 2023-11-28 12:13:21,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3498373.3333333335, ans=0.125 2023-11-28 12:13:38,339 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7750, loss[loss=0.0891, simple_loss=0.1221, pruned_loss=0.02054, audio_tagging_loss=0.007536, over 15817.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08999, pruned_loss=0.01233, audio_tagging_loss=0.008603, over 3056982.42 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:13:53,357 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0 2023-11-28 12:13:56,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3498573.3333333335, ans=0.1 2023-11-28 12:13:57,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3498573.3333333335, ans=0.05 2023-11-28 12:14:03,967 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524800 2023-11-28 12:14:35,912 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7800, loss[loss=0.06205, simple_loss=0.08638, pruned_loss=0.01035, audio_tagging_loss=0.008517, over 15982.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08891, pruned_loss=0.01225, audio_tagging_loss=0.008723, over 3050232.81 frames. 
], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:14:38,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3498840.0, ans=0.1 2023-11-28 12:14:45,061 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.48 vs. limit=5.0 2023-11-28 12:14:52,159 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:14:53,664 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=12.0 2023-11-28 12:14:54,660 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 8.929e+01 9.549e+01 1.038e+02 1.507e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 12:15:02,470 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524850 2023-11-28 12:15:18,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3499040.0, ans=0.125 2023-11-28 12:15:19,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3499040.0, ans=0.125 2023-11-28 12:15:19,593 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2023-11-28 12:15:27,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3499106.6666666665, ans=0.0 2023-11-28 12:15:34,940 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7850, loss[loss=0.08152, simple_loss=0.1022, pruned_loss=0.01989, audio_tagging_loss=0.01054, over 15081.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.0899, pruned_loss=0.0123, audio_tagging_loss=0.008727, over 3054937.57 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:15:44,380 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:15:54,437 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.16 vs. limit=22.5 2023-11-28 12:15:59,589 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524900 2023-11-28 12:16:28,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3499440.0, ans=0.125 2023-11-28 12:16:31,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3499506.6666666665, ans=0.2 2023-11-28 12:16:31,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=3499506.6666666665, ans=12.0 2023-11-28 12:16:32,411 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7900, loss[loss=0.06333, simple_loss=0.09112, pruned_loss=0.01063, audio_tagging_loss=0.007141, over 15913.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09017, pruned_loss=0.01224, audio_tagging_loss=0.00874, over 3052631.38 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:16:36,386 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.85 vs. 
2023-11-28 12:16:45,204 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.76 vs. limit=22.5 2023-11-28 12:16:49,085 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 8.884e+01 9.612e+01 1.005e+02 1.246e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 12:16:57,349 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 524950 2023-11-28 12:17:20,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3499773.3333333335, ans=15.0 2023-11-28 12:17:23,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3499773.3333333335, ans=0.125 2023-11-28 12:17:25,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3499773.3333333335, ans=0.0 2023-11-28 12:17:28,952 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 7950, loss[loss=0.0773, simple_loss=0.1075, pruned_loss=0.01613, audio_tagging_loss=0.007397, over 16136.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09072, pruned_loss=0.01222, audio_tagging_loss=0.008774, over 3056988.16 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:17:39,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3499906.6666666665, ans=0.0 2023-11-28 12:17:45,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3499906.6666666665, ans=0.0 2023-11-28 12:17:50,036 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 12:17:55,456 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525000 2023-11-28 12:17:55,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3499973.3333333335, ans=0.1 2023-11-28 12:18:10,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3500040.0, ans=0.125 2023-11-28 12:18:27,117 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8000, loss[loss=0.05968, simple_loss=0.09002, pruned_loss=0.008236, audio_tagging_loss=0.006429, over 14408.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09006, pruned_loss=0.01217, audio_tagging_loss=0.008832, over 3047845.41 frames.
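], batch size: 54, lr: 1.53e-03, grad_scale: 32.0

Note: the train_asr.py:1481 WARNING above drops a one-second AudioSet cut because its 100 input frames survive subsampling as only 23 encoder frames, fewer than its 24 BPE tokens, and a transducer cannot emit more tokens than it has output frames. A minimal sketch of such a filter; the (T - 8) // 4 mapping reproduces the logged 100 -> 23 but, like the function name, is an assumption:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Frames left after the convolutional subsampling front-end;
        # (T - 8) // 4 matches the logged 100 -> 23 but is assumed.
        frames_after_subsampling = (num_frames - 8) // 4
        # The transducer loss needs at least one frame per token.
        return frames_after_subsampling >= num_tokens

    print(keep_cut(100, 24))  # False: matches the excluded cut above
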
2023-11-28 12:18:45,086 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.932e+01 9.362e+01 1.010e+02 1.203e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-28 12:18:51,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3500306.6666666665, ans=0.125 2023-11-28 12:18:52,810 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525050 2023-11-28 12:19:01,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3500373.3333333335, ans=0.1 2023-11-28 12:19:22,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3500440.0, ans=0.125 2023-11-28 12:19:25,967 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8050, loss[loss=0.08287, simple_loss=0.1047, pruned_loss=0.02013, audio_tagging_loss=0.01038, over 15230.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.08991, pruned_loss=0.0122, audio_tagging_loss=0.00896, over 3049052.46 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:19:27,285 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:19:44,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3500573.3333333335, ans=0.125 2023-11-28 12:19:46,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3500573.3333333335, ans=0.125 2023-11-28 12:19:50,895 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525100 2023-11-28 12:20:23,013 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8100, loss[loss=0.07105, simple_loss=0.0999, pruned_loss=0.01347, audio_tagging_loss=0.007635, over 14957.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09099, pruned_loss=0.01228, audio_tagging_loss=0.008847, over 3051787.44 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:20:33,199 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.19 vs. limit=8.0 2023-11-28 12:20:40,081 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.566e+01 8.823e+01 9.483e+01 1.016e+02 1.288e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 12:20:42,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3500906.6666666665, ans=0.1 2023-11-28 12:20:48,915 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525150 2023-11-28 12:20:50,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3500973.3333333335, ans=0.1 2023-11-28 12:20:53,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3500973.3333333335, ans=0.5 2023-11-28 12:20:55,491 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs.
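limit=6.0

Note: in the optim.py:476 entries, the five "grad-norm quartiles" values read as the min/25%/median/75%/max of recently observed gradient norms, and the threshold tracks Clipping_scale times the median (e.g. 2.0 * 9.483e+01 ≈ 1.897e+02 in the 12:20:40 entry above); percent-clipped is then the share of recent batches whose norm exceeded the threshold. A minimal sketch of that bookkeeping, under these assumptions:

    import torch

    def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # grad_norms: 1-D float tensor of gradient norms from recent batches.
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # clipping_scale x median
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped
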
2023-11-28 12:21:09,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3501106.6666666665, ans=0.1 2023-11-28 12:21:19,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3501173.3333333335, ans=0.0 2023-11-28 12:21:20,699 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8150, loss[loss=0.06057, simple_loss=0.0818, pruned_loss=0.0104, audio_tagging_loss=0.009266, over 15051.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09069, pruned_loss=0.0122, audio_tagging_loss=0.008713, over 3054517.44 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:21:32,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3501240.0, ans=0.1 2023-11-28 12:21:32,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3501240.0, ans=0.1 2023-11-28 12:21:35,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3501240.0, ans=0.125 2023-11-28 12:21:35,898 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.88 vs. limit=15.0 2023-11-28 12:21:46,477 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525200 2023-11-28 12:21:46,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3501306.6666666665, ans=0.125 2023-11-28 12:21:52,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3501306.6666666665, ans=0.125 2023-11-28 12:21:59,688 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0 2023-11-28 12:22:10,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3501440.0, ans=0.0 2023-11-28 12:22:15,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3501440.0, ans=0.1 2023-11-28 12:22:18,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=22.5 2023-11-28 12:22:19,385 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8200, loss[loss=0.04542, simple_loss=0.05531, pruned_loss=0.006447, audio_tagging_loss=0.01132, over 14040.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08967, pruned_loss=0.01209, audio_tagging_loss=0.008652, over 3053251.74 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:22:25,441 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible'].
Number of tokens: 24 2023-11-28 12:22:33,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3501573.3333333335, ans=0.125 2023-11-28 12:22:36,394 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 8.686e+01 9.334e+01 1.013e+02 1.382e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-28 12:22:44,795 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525250 2023-11-28 12:22:55,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3501706.6666666665, ans=0.0 2023-11-28 12:22:56,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3501706.6666666665, ans=0.125 2023-11-28 12:23:02,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3501706.6666666665, ans=0.0 2023-11-28 12:23:16,943 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8250, loss[loss=0.07999, simple_loss=0.121, pruned_loss=0.01419, audio_tagging_loss=0.005298, over 15155.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08961, pruned_loss=0.01205, audio_tagging_loss=0.008597, over 3056588.72 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:23:36,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3501906.6666666665, ans=0.0 2023-11-28 12:23:42,756 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525300 2023-11-28 12:23:42,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3501973.3333333335, ans=0.125 2023-11-28 12:23:46,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3501973.3333333335, ans=0.0 2023-11-28 12:23:48,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3501973.3333333335, ans=0.125 2023-11-28 12:23:52,705 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2023-11-28 12:24:14,023 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2023-11-28 12:24:14,675 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8300, loss[loss=0.08109, simple_loss=0.09985, pruned_loss=0.02227, audio_tagging_loss=0.008899, over 15842.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08947, pruned_loss=0.01216, audio_tagging_loss=0.008567, over 3059535.88 frames. 
], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:24:22,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3502173.3333333335, ans=0.1 2023-11-28 12:24:32,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.585e+01 8.723e+01 9.443e+01 1.022e+02 1.604e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 12:24:38,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3502306.6666666665, ans=0.125 2023-11-28 12:24:40,371 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525350 2023-11-28 12:24:50,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3502373.3333333335, ans=0.125 2023-11-28 12:24:56,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3502373.3333333335, ans=0.09899494936611666 2023-11-28 12:25:02,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3502440.0, ans=0.125 2023-11-28 12:25:08,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3502440.0, ans=0.0 2023-11-28 12:25:12,874 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8350, loss[loss=0.05553, simple_loss=0.0694, pruned_loss=0.01194, audio_tagging_loss=0.008894, over 14942.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08917, pruned_loss=0.01202, audio_tagging_loss=0.00857, over 3060005.24 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:25:29,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=15.0 2023-11-28 12:25:32,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3502573.3333333335, ans=0.125 2023-11-28 12:25:37,654 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525400 2023-11-28 12:25:52,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3502706.6666666665, ans=0.125 2023-11-28 12:26:10,963 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8400, loss[loss=0.08268, simple_loss=0.1195, pruned_loss=0.01403, audio_tagging_loss=0.008917, over 16272.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.09026, pruned_loss=0.01229, audio_tagging_loss=0.008477, over 3059155.55 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:26:18,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3502840.0, ans=0.125 2023-11-28 12:26:19,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3502840.0, ans=0.125 2023-11-28 12:26:29,021 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.791e+01 9.357e+01 9.969e+01 1.892e+02, threshold=1.871e+02, percent-clipped=1.0 2023-11-28 12:26:36,651 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525450 2023-11-28 12:26:42,671 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.78 vs. 
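limit=15.0

Note: the scaling.py:1022 entries fire when a Whiten module finds its activation-whiteness metric near or above the allowed limit (7.78 vs. 15.0 just above). One standard metric of this kind equals 1.0 for perfectly white features (channel covariance proportional to the identity) and grows toward num_channels as the covariance collapses onto few directions; the sketch below uses that definition, which is an illustrative assumption rather than the exact formula in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations from one module.
        x = x - x.mean(dim=0)
        c = (x.t() @ x) / x.shape[0]  # channel covariance matrix
        d = c.shape[0]
        # d * trace(C^2) / trace(C)^2 is 1.0 when C = sigma^2 * I and
        # approaches d when C is rank-1.
        return d * (c * c).sum() / c.diagonal().sum() ** 2
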
2023-11-28 12:26:58,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3503106.6666666665, ans=0.1 2023-11-28 12:27:07,924 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8450, loss[loss=0.06211, simple_loss=0.08174, pruned_loss=0.01204, audio_tagging_loss=0.009206, over 15061.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.0901, pruned_loss=0.01228, audio_tagging_loss=0.008581, over 3057586.09 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:27:27,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3503240.0, ans=0.0 2023-11-28 12:27:33,475 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525500 2023-11-28 12:27:37,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3503306.6666666665, ans=0.0 2023-11-28 12:28:06,300 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8500, loss[loss=0.07515, simple_loss=0.1046, pruned_loss=0.01415, audio_tagging_loss=0.008705, over 15328.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.091, pruned_loss=0.01245, audio_tagging_loss=0.008629, over 3058037.22 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:28:24,291 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.905e+01 8.935e+01 9.397e+01 1.006e+02 1.238e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 12:28:24,874 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.45 vs. limit=22.5 2023-11-28 12:28:25,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3503573.3333333335, ans=0.2 2023-11-28 12:28:31,011 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525550 2023-11-28 12:28:36,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3503640.0, ans=0.07 2023-11-28 12:28:46,609 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.73 vs. limit=15.0 2023-11-28 12:28:54,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3503773.3333333335, ans=0.125 2023-11-28 12:28:57,193 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2023-11-28 12:28:59,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3503773.3333333335, ans=0.025 2023-11-28 12:29:03,064 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8550, loss[loss=0.0716, simple_loss=0.09114, pruned_loss=0.01546, audio_tagging_loss=0.01057, over 15033.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09095, pruned_loss=0.01252, audio_tagging_loss=0.008684, over 3052473.32 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:29:14,068 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs.
limit=6.0 2023-11-28 12:29:21,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3503906.6666666665, ans=0.125 2023-11-28 12:29:28,483 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525600 2023-11-28 12:29:39,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.58 vs. limit=22.5 2023-11-28 12:29:54,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3504106.6666666665, ans=0.1 2023-11-28 12:29:59,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3504173.3333333335, ans=0.1 2023-11-28 12:30:00,679 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8600, loss[loss=0.06668, simple_loss=0.09557, pruned_loss=0.01242, audio_tagging_loss=0.006474, over 15358.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08949, pruned_loss=0.01222, audio_tagging_loss=0.008764, over 3052139.16 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:30:11,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3504240.0, ans=10.0 2023-11-28 12:30:12,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3504240.0, ans=0.125 2023-11-28 12:30:19,762 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 8.911e+01 9.426e+01 1.012e+02 1.309e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-28 12:30:20,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3504240.0, ans=0.125 2023-11-28 12:30:20,461 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.15 vs. limit=10.0 2023-11-28 12:30:23,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3504306.6666666665, ans=0.125 2023-11-28 12:30:26,558 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525650 2023-11-28 12:30:40,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3504373.3333333335, ans=0.0 2023-11-28 12:30:59,367 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8650, loss[loss=0.05884, simple_loss=0.08156, pruned_loss=0.008987, audio_tagging_loss=0.009075, over 15597.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08958, pruned_loss=0.01218, audio_tagging_loss=0.008765, over 3049623.55 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:31:09,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3504573.3333333335, ans=0.025 2023-11-28 12:31:24,158 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525700 2023-11-28 12:31:28,987 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.29 vs. 
limit=22.5 2023-11-28 12:31:29,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3504640.0, ans=0.125 2023-11-28 12:31:48,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3504773.3333333335, ans=0.125 2023-11-28 12:31:56,479 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8700, loss[loss=0.06043, simple_loss=0.08472, pruned_loss=0.009777, audio_tagging_loss=0.008286, over 15655.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08968, pruned_loss=0.01218, audio_tagging_loss=0.008863, over 3051170.63 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:31:56,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3504840.0, ans=0.125 2023-11-28 12:32:02,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.10 vs. limit=10.0 2023-11-28 12:32:14,550 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 8.928e+01 9.510e+01 1.026e+02 1.329e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 12:32:14,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3504906.6666666665, ans=0.125 2023-11-28 12:32:21,826 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525750 2023-11-28 12:32:29,126 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2023-11-28 12:32:36,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3505040.0, ans=0.0 2023-11-28 12:32:42,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3505106.6666666665, ans=0.125 2023-11-28 12:32:51,271 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2023-11-28 12:32:53,015 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8750, loss[loss=0.06783, simple_loss=0.09058, pruned_loss=0.01422, audio_tagging_loss=0.008312, over 15231.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09056, pruned_loss=0.01228, audio_tagging_loss=0.008874, over 3053496.38 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:33:06,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3505240.0, ans=0.95 2023-11-28 12:33:09,206 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:33:19,006 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525800 2023-11-28 12:33:31,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3505373.3333333335, ans=0.125 2023-11-28 12:33:46,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3505440.0, ans=0.0 2023-11-28 12:33:48,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3505440.0, ans=0.1 2023-11-28 12:33:51,370 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8800, loss[loss=0.05833, simple_loss=0.0731, pruned_loss=0.01218, audio_tagging_loss=0.0096, over 14420.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09078, pruned_loss=0.01237, audio_tagging_loss=0.008853, over 3049028.74 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:33:54,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3505506.6666666665, ans=0.125 2023-11-28 12:34:00,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3505506.6666666665, ans=0.1 2023-11-28 12:34:09,431 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.660e+01 8.935e+01 9.643e+01 1.033e+02 1.238e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-28 12:34:16,043 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525850 2023-11-28 12:34:27,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3505706.6666666665, ans=0.0 2023-11-28 12:34:43,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3505773.3333333335, ans=0.125 2023-11-28 12:34:48,667 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8850, loss[loss=0.06375, simple_loss=0.08818, pruned_loss=0.01004, audio_tagging_loss=0.009618, over 14325.00 frames. ], tot_loss[loss=0.06631, simple_loss=0.09033, pruned_loss=0.01228, audio_tagging_loss=0.008867, over 3051732.35 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:34:59,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3505906.6666666665, ans=0.1 2023-11-28 12:35:04,074 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 12:35:13,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3505973.3333333335, ans=0.1 2023-11-28 12:35:14,067 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525900 2023-11-28 12:35:18,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3505973.3333333335, ans=0.1 2023-11-28 12:35:31,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3506040.0, ans=0.125 2023-11-28 12:35:42,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3506106.6666666665, ans=0.1 2023-11-28 12:35:43,348 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:35:45,387 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8900, loss[loss=0.075, simple_loss=0.105, pruned_loss=0.01325, audio_tagging_loss=0.009268, over 15204.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09086, pruned_loss=0.01228, audio_tagging_loss=0.008742, over 3057958.09 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:36:04,590 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 8.621e+01 9.140e+01 9.989e+01 1.171e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-28 12:36:11,304 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 525950 2023-11-28 12:36:15,219 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.06 vs. limit=10.0 2023-11-28 12:36:22,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3506373.3333333335, ans=0.0 2023-11-28 12:36:27,085 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.96 vs. limit=15.0 2023-11-28 12:36:42,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3506506.6666666665, ans=0.0 2023-11-28 12:36:43,087 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 8950, loss[loss=0.06037, simple_loss=0.08181, pruned_loss=0.01245, audio_tagging_loss=0.007016, over 14910.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09014, pruned_loss=0.01228, audio_tagging_loss=0.008626, over 3058910.84 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:37:07,793 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526000 2023-11-28 12:37:13,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3506640.0, ans=0.0 2023-11-28 12:37:35,379 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=15.0 2023-11-28 12:37:38,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3506773.3333333335, ans=0.0 2023-11-28 12:37:40,497 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9000, loss[loss=0.05267, simple_loss=0.06584, pruned_loss=0.01002, audio_tagging_loss=0.009729, over 14243.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08995, pruned_loss=0.01233, audio_tagging_loss=0.008545, over 3056144.36 frames. 
], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:37:40,498 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 12:38:15,239 INFO [train_asr.py:1267] (2/4) Epoch 44, validation: loss=0.05875, simple_loss=0.05057, pruned_loss=0.005344, audio_tagging_loss=0.02812, over 4681554.00 frames. 2023-11-28 12:38:15,240 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 12:38:32,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3506906.6666666665, ans=0.125 2023-11-28 12:38:35,941 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.641e+01 8.830e+01 9.439e+01 1.037e+02 1.240e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 12:38:41,595 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526050 2023-11-28 12:39:05,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3507106.6666666665, ans=0.125 2023-11-28 12:39:08,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3507106.6666666665, ans=0.0 2023-11-28 12:39:13,626 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9050, loss[loss=0.05451, simple_loss=0.07164, pruned_loss=0.009117, audio_tagging_loss=0.009579, over 15800.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09137, pruned_loss=0.0126, audio_tagging_loss=0.008415, over 3057911.41 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:39:33,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3507240.0, ans=0.0 2023-11-28 12:39:34,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3507240.0, ans=0.125 2023-11-28 12:39:34,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3507240.0, ans=0.125 2023-11-28 12:39:38,908 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526100 2023-11-28 12:39:44,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3507306.6666666665, ans=0.125 2023-11-28 12:40:05,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3507440.0, ans=0.125 2023-11-28 12:40:07,920 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-28 12:40:11,479 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9100, loss[loss=0.0599, simple_loss=0.08608, pruned_loss=0.01057, audio_tagging_loss=0.006292, over 14961.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09121, pruned_loss=0.0126, audio_tagging_loss=0.008396, over 3056322.71 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:40:29,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3507573.3333333335, ans=0.125 2023-11-28 12:40:30,765 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 9.014e+01 9.601e+01 1.029e+02 1.341e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 12:40:36,352 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526150 2023-11-28 12:40:37,812 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2023-11-28 12:40:43,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3507640.0, ans=0.0 2023-11-28 12:40:51,236 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.16 vs. limit=22.5 2023-11-28 12:41:08,305 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9150, loss[loss=0.05702, simple_loss=0.0753, pruned_loss=0.008561, audio_tagging_loss=0.01081, over 15562.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09069, pruned_loss=0.01237, audio_tagging_loss=0.008478, over 3051326.09 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 12:41:25,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2023-11-28 12:41:34,212 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526200 2023-11-28 12:41:39,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3507973.3333333335, ans=0.125 2023-11-28 12:41:55,763 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.21 vs. limit=22.5 2023-11-28 12:42:05,859 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9200, loss[loss=0.06224, simple_loss=0.08264, pruned_loss=0.01236, audio_tagging_loss=0.008568, over 16073.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09047, pruned_loss=0.01236, audio_tagging_loss=0.008532, over 3045320.71 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:42:21,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3508240.0, ans=0.0 2023-11-28 12:42:25,535 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.609e+01 9.408e+01 1.003e+02 1.431e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 12:42:31,015 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526250 2023-11-28 12:42:31,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3508306.6666666665, ans=0.1 2023-11-28 12:42:32,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3508306.6666666665, ans=0.125 2023-11-28 12:42:38,067 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.07 vs. limit=15.0 2023-11-28 12:42:48,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.69 vs. 
limit=15.0 2023-11-28 12:43:02,876 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9250, loss[loss=0.0635, simple_loss=0.08224, pruned_loss=0.009907, audio_tagging_loss=0.01248, over 15744.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09087, pruned_loss=0.01238, audio_tagging_loss=0.008501, over 3050134.20 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:43:07,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3508506.6666666665, ans=0.125 2023-11-28 12:43:09,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.07 vs. limit=15.0 2023-11-28 12:43:14,568 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. limit=6.0 2023-11-28 12:43:27,712 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526300 2023-11-28 12:43:31,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3508640.0, ans=0.0 2023-11-28 12:43:42,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3508706.6666666665, ans=0.2 2023-11-28 12:43:43,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3508706.6666666665, ans=0.1 2023-11-28 12:43:49,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.91 vs. limit=15.0 2023-11-28 12:43:58,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3508840.0, ans=0.0 2023-11-28 12:43:59,850 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9300, loss[loss=0.08062, simple_loss=0.1211, pruned_loss=0.01476, audio_tagging_loss=0.005302, over 16045.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08993, pruned_loss=0.01231, audio_tagging_loss=0.008538, over 3046760.71 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:44:01,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3508840.0, ans=0.125 2023-11-28 12:44:03,572 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.39 vs. limit=10.0 2023-11-28 12:44:19,097 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 9.008e+01 9.881e+01 1.066e+02 1.464e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-28 12:44:25,812 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526350 2023-11-28 12:44:39,759 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.52 vs. limit=15.0 2023-11-28 12:44:54,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3509106.6666666665, ans=0.125 2023-11-28 12:44:57,015 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9350, loss[loss=0.06009, simple_loss=0.0822, pruned_loss=0.01285, audio_tagging_loss=0.006143, over 16399.00 frames. 
], tot_loss[loss=0.06527, simple_loss=0.08896, pruned_loss=0.01221, audio_tagging_loss=0.008574, over 3038096.56 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:45:00,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3509173.3333333335, ans=0.2 2023-11-28 12:45:09,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3509240.0, ans=0.125 2023-11-28 12:45:14,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.55 vs. limit=15.0 2023-11-28 12:45:22,246 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526400 2023-11-28 12:45:30,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.53 vs. limit=10.0 2023-11-28 12:45:47,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3509440.0, ans=0.125 2023-11-28 12:45:50,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3509440.0, ans=0.125 2023-11-28 12:45:55,562 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9400, loss[loss=0.07282, simple_loss=0.108, pruned_loss=0.01181, audio_tagging_loss=0.007017, over 15747.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08847, pruned_loss=0.01195, audio_tagging_loss=0.008661, over 3037628.36 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:46:01,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3509506.6666666665, ans=0.125 2023-11-28 12:46:14,159 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.941e+01 9.559e+01 9.955e+01 1.222e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 12:46:18,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3509640.0, ans=0.0 2023-11-28 12:46:20,436 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526450 2023-11-28 12:46:45,396 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.66 vs. limit=10.0 2023-11-28 12:46:47,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3509773.3333333335, ans=0.1 2023-11-28 12:46:48,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3509773.3333333335, ans=0.125 2023-11-28 12:46:52,481 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9450, loss[loss=0.07231, simple_loss=0.1045, pruned_loss=0.01362, audio_tagging_loss=0.006424, over 15852.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08892, pruned_loss=0.01194, audio_tagging_loss=0.008682, over 3041619.78 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:46:52,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3509840.0, ans=0.125 2023-11-28 12:46:55,875 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 12:46:56,109 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 12:47:08,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3509906.6666666665, ans=0.125 2023-11-28 12:47:10,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.91 vs. limit=22.5 2023-11-28 12:47:18,051 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526500 2023-11-28 12:47:39,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3510106.6666666665, ans=0.125 2023-11-28 12:47:50,094 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9500, loss[loss=0.07369, simple_loss=0.1039, pruned_loss=0.01446, audio_tagging_loss=0.007293, over 15482.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09019, pruned_loss=0.01228, audio_tagging_loss=0.008638, over 3041785.83 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:48:05,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3510240.0, ans=0.04949747468305833 2023-11-28 12:48:10,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.845e+01 9.672e+01 1.033e+02 1.277e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 12:48:15,957 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526550 2023-11-28 12:48:22,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3510306.6666666665, ans=0.1 2023-11-28 12:48:44,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3510440.0, ans=0.1 2023-11-28 12:48:48,233 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9550, loss[loss=0.04511, simple_loss=0.05279, pruned_loss=0.008652, audio_tagging_loss=0.01006, over 14130.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08951, pruned_loss=0.01221, audio_tagging_loss=0.008726, over 3035321.34 frames. 
], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:48:48,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3510506.6666666665, ans=0.125 2023-11-28 12:48:55,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3510506.6666666665, ans=0.0 2023-11-28 12:49:00,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3510573.3333333335, ans=0.0 2023-11-28 12:49:04,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3510573.3333333335, ans=0.0 2023-11-28 12:49:13,676 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526600 2023-11-28 12:49:17,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3510640.0, ans=0.125 2023-11-28 12:49:34,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3510773.3333333335, ans=0.125 2023-11-28 12:49:42,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3510773.3333333335, ans=0.0 2023-11-28 12:49:44,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3510773.3333333335, ans=0.2 2023-11-28 12:49:46,389 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9600, loss[loss=0.06435, simple_loss=0.08646, pruned_loss=0.01282, audio_tagging_loss=0.008306, over 14771.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08864, pruned_loss=0.01217, audio_tagging_loss=0.008907, over 3037157.16 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 12:49:54,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3510840.0, ans=0.0 2023-11-28 12:49:58,688 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.91 vs. limit=22.5 2023-11-28 12:50:06,846 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.314e+01 8.780e+01 9.691e+01 1.026e+02 1.293e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-28 12:50:08,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3510973.3333333335, ans=0.125 2023-11-28 12:50:11,857 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526650 2023-11-28 12:50:11,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3510973.3333333335, ans=0.1 2023-11-28 12:50:22,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3511040.0, ans=10.0 2023-11-28 12:50:44,464 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9650, loss[loss=0.08815, simple_loss=0.1208, pruned_loss=0.01947, audio_tagging_loss=0.00825, over 15935.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08793, pruned_loss=0.01201, audio_tagging_loss=0.008931, over 3038264.29 frames. 
], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:50:47,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3511173.3333333335, ans=0.125
2023-11-28 12:50:49,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3511173.3333333335, ans=0.2
2023-11-28 12:51:08,746 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5
2023-11-28 12:51:09,390 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526700
2023-11-28 12:51:16,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3511306.6666666665, ans=0.2
2023-11-28 12:51:20,656 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0
2023-11-28 12:51:35,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3511440.0, ans=0.125
2023-11-28 12:51:42,567 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9700, loss[loss=0.0674, simple_loss=0.09582, pruned_loss=0.01293, audio_tagging_loss=0.006557, over 14936.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08939, pruned_loss=0.01211, audio_tagging_loss=0.008719, over 3039395.83 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:51:48,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3511506.6666666665, ans=0.0
2023-11-28 12:51:54,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3511573.3333333335, ans=0.125
2023-11-28 12:52:03,074 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.922e+01 8.945e+01 9.456e+01 1.030e+02 1.271e+02, threshold=1.891e+02, percent-clipped=0.0
2023-11-28 12:52:07,529 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526750
2023-11-28 12:52:33,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3511773.3333333335, ans=0.0
2023-11-28 12:52:35,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3511773.3333333335, ans=0.025
2023-11-28 12:52:38,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3511840.0, ans=0.0
2023-11-28 12:52:39,640 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9750, loss[loss=0.07277, simple_loss=0.1017, pruned_loss=0.01461, audio_tagging_loss=0.007332, over 15951.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08995, pruned_loss=0.01209, audio_tagging_loss=0.008577, over 3044112.00 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:52:39,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3511840.0, ans=0.1
2023-11-28 12:52:43,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3511840.0, ans=0.125
2023-11-28 12:52:46,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3511840.0, ans=0.125
2023-11-28 12:53:04,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0
2023-11-28 12:53:05,021 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526800
2023-11-28 12:53:07,573 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0
2023-11-28 12:53:18,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3512040.0, ans=0.125
2023-11-28 12:53:26,011 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.91 vs. limit=15.0
2023-11-28 12:53:34,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3512106.6666666665, ans=0.2
2023-11-28 12:53:37,921 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9800, loss[loss=0.06943, simple_loss=0.08957, pruned_loss=0.01516, audio_tagging_loss=0.009483, over 16453.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08943, pruned_loss=0.01205, audio_tagging_loss=0.008509, over 3046081.99 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:53:43,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3512173.3333333335, ans=0.2
2023-11-28 12:53:49,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3512240.0, ans=0.125
2023-11-28 12:53:56,640 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0
2023-11-28 12:53:58,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.970e+01 9.668e+01 1.037e+02 1.358e+02, threshold=1.934e+02, percent-clipped=0.0
2023-11-28 12:54:02,864 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526850
2023-11-28 12:54:04,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3512306.6666666665, ans=0.0
2023-11-28 12:54:18,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3512373.3333333335, ans=0.1
2023-11-28 12:54:19,850 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.72 vs. limit=15.0
2023-11-28 12:54:23,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3512440.0, ans=0.5
2023-11-28 12:54:29,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3512440.0, ans=0.0
2023-11-28 12:54:31,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3512440.0, ans=0.125
2023-11-28 12:54:33,062 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 12:54:35,759 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9850, loss[loss=0.06208, simple_loss=0.08293, pruned_loss=0.01051, audio_tagging_loss=0.0101, over 15902.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.0898, pruned_loss=0.01211, audio_tagging_loss=0.008533, over 3041250.69 frames. ], batch size: 61, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:54:39,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3512506.6666666665, ans=0.125
2023-11-28 12:54:47,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=22.5
2023-11-28 12:55:01,016 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526900
2023-11-28 12:55:21,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3512773.3333333335, ans=0.125
2023-11-28 12:55:22,765 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0
2023-11-28 12:55:25,151 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.49 vs. limit=12.0
2023-11-28 12:55:28,309 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.98 vs. limit=22.5
2023-11-28 12:55:33,304 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9900, loss[loss=0.06215, simple_loss=0.08577, pruned_loss=0.01113, audio_tagging_loss=0.008131, over 15638.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.0904, pruned_loss=0.01239, audio_tagging_loss=0.008514, over 3046156.64 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 16.0
2023-11-28 12:55:39,208 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=15.0
2023-11-28 12:55:55,302 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 9.239e+01 9.931e+01 1.065e+02 1.438e+02, threshold=1.986e+02, percent-clipped=0.0
2023-11-28 12:55:57,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3512973.3333333335, ans=0.125
2023-11-28 12:55:58,679 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 526950
2023-11-28 12:55:58,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3512973.3333333335, ans=0.125
2023-11-28 12:56:01,342 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.51 vs. limit=15.0
2023-11-28 12:56:27,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3513106.6666666665, ans=0.2
2023-11-28 12:56:31,209 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 9950, loss[loss=0.07203, simple_loss=0.1097, pruned_loss=0.0107, audio_tagging_loss=0.006479, over 13906.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08912, pruned_loss=0.01215, audio_tagging_loss=0.008563, over 3043178.04 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 16.0
2023-11-28 12:56:56,757 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527000
2023-11-28 12:57:01,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3513306.6666666665, ans=0.125
2023-11-28 12:57:02,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3513306.6666666665, ans=0.0
2023-11-28 12:57:04,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3513373.3333333335, ans=0.2
2023-11-28 12:57:10,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3513373.3333333335, ans=0.125
2023-11-28 12:57:29,209 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10000, loss[loss=0.0572, simple_loss=0.07758, pruned_loss=0.009192, audio_tagging_loss=0.00922, over 15654.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08918, pruned_loss=0.01212, audio_tagging_loss=0.008535, over 3038364.78 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:57:31,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3513506.6666666665, ans=0.0
2023-11-28 12:57:50,542 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.311e+01 8.894e+01 9.636e+01 1.033e+02 1.186e+02, threshold=1.927e+02, percent-clipped=0.0
2023-11-28 12:57:53,872 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527050
2023-11-28 12:58:01,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3513640.0, ans=0.1
2023-11-28 12:58:02,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3513706.6666666665, ans=0.125
2023-11-28 12:58:05,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3513706.6666666665, ans=0.0
2023-11-28 12:58:10,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3513706.6666666665, ans=0.125
2023-11-28 12:58:26,166 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10050, loss[loss=0.05133, simple_loss=0.06468, pruned_loss=0.00829, audio_tagging_loss=0.0107, over 14441.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08977, pruned_loss=0.01228, audio_tagging_loss=0.008518, over 3042491.12 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:58:36,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3513906.6666666665, ans=0.07
2023-11-28 12:58:42,090 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0
2023-11-28 12:58:49,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3513973.3333333335, ans=0.125
2023-11-28 12:58:51,644 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527100
2023-11-28 12:58:51,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3513973.3333333335, ans=0.125
2023-11-28 12:59:02,141 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 12:59:06,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3514040.0, ans=0.125
2023-11-28 12:59:22,920 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10100, loss[loss=0.07764, simple_loss=0.1043, pruned_loss=0.01579, audio_tagging_loss=0.009685, over 15577.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08967, pruned_loss=0.01232, audio_tagging_loss=0.008585, over 3046282.61 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 12:59:44,625 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0
2023-11-28 12:59:46,043 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.720e+01 9.623e+01 1.026e+02 1.280e+02, threshold=1.925e+02, percent-clipped=0.0
2023-11-28 12:59:49,341 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527150
2023-11-28 12:59:51,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3514306.6666666665, ans=0.125
2023-11-28 13:00:14,181 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 13:00:14,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3514440.0, ans=0.125
2023-11-28 13:00:15,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3514440.0, ans=0.0
2023-11-28 13:00:18,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.04 vs. limit=22.5
2023-11-28 13:00:21,329 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10150, loss[loss=0.06041, simple_loss=0.08718, pruned_loss=0.009293, audio_tagging_loss=0.007522, over 15584.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08966, pruned_loss=0.01226, audio_tagging_loss=0.00867, over 3048766.87 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:00:23,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3514506.6666666665, ans=0.0
2023-11-28 13:00:26,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3514506.6666666665, ans=0.0
2023-11-28 13:00:33,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3514573.3333333335, ans=0.125
2023-11-28 13:00:35,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3514573.3333333335, ans=0.2
2023-11-28 13:00:46,678 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527200
2023-11-28 13:00:47,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3514640.0, ans=0.0
2023-11-28 13:00:52,429 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 13:00:52,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3514640.0, ans=0.0
2023-11-28 13:01:05,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3514706.6666666665, ans=0.125
2023-11-28 13:01:16,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3514773.3333333335, ans=0.0
2023-11-28 13:01:19,638 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10200, loss[loss=0.06676, simple_loss=0.06894, pruned_loss=0.01936, audio_tagging_loss=0.01293, over 14047.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08997, pruned_loss=0.01215, audio_tagging_loss=0.008715, over 3048371.56 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:01:19,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3514840.0, ans=0.04949747468305833
2023-11-28 13:01:22,280 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0
2023-11-28 13:01:24,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3514840.0, ans=0.0
2023-11-28 13:01:41,251 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.267e+01 8.683e+01 9.423e+01 1.021e+02 1.647e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-28 13:01:42,016 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.09 vs. limit=15.0
2023-11-28 13:01:44,666 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527250
2023-11-28 13:01:45,663 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 13:01:46,185 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.43 vs. limit=22.5
2023-11-28 13:01:55,863 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 13:02:16,829 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10250, loss[loss=0.0665, simple_loss=0.08704, pruned_loss=0.01264, audio_tagging_loss=0.01034, over 15267.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08958, pruned_loss=0.01213, audio_tagging_loss=0.008771, over 3047078.68 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:02:27,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3515240.0, ans=0.0
2023-11-28 13:02:30,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3515240.0, ans=0.0
2023-11-28 13:02:32,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3515240.0, ans=0.025
2023-11-28 13:02:40,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3515306.6666666665, ans=0.2
2023-11-28 13:02:43,375 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527300
2023-11-28 13:02:46,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3515306.6666666665, ans=0.0
2023-11-28 13:02:53,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3515373.3333333335, ans=0.1
2023-11-28 13:02:53,499 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.45 vs. limit=15.0
2023-11-28 13:02:54,721 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0
2023-11-28 13:03:03,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3515440.0, ans=0.125
2023-11-28 13:03:06,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3515440.0, ans=0.125
2023-11-28 13:03:14,510 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10300, loss[loss=0.06202, simple_loss=0.08476, pruned_loss=0.01051, audio_tagging_loss=0.009136, over 14521.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08961, pruned_loss=0.01214, audio_tagging_loss=0.008686, over 3040417.94 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:03:21,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3515506.6666666665, ans=0.0
2023-11-28 13:03:22,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3515506.6666666665, ans=0.125
2023-11-28 13:03:28,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3515573.3333333335, ans=0.125
2023-11-28 13:03:36,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3515573.3333333335, ans=0.125
2023-11-28 13:03:36,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.412e+01 8.855e+01 9.649e+01 1.050e+02 1.403e+02, threshold=1.930e+02, percent-clipped=0.0
2023-11-28 13:03:40,200 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527350
2023-11-28 13:03:40,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3515640.0, ans=0.125
2023-11-28 13:04:04,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3515773.3333333335, ans=10.0
2023-11-28 13:04:12,906 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10350, loss[loss=0.09275, simple_loss=0.1292, pruned_loss=0.02249, audio_tagging_loss=0.005665, over 15349.00 frames. ], tot_loss[loss=0.06669, simple_loss=0.09115, pruned_loss=0.01245, audio_tagging_loss=0.008671, over 3049917.94 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:04:27,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3515906.6666666665, ans=0.125
2023-11-28 13:04:37,736 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527400
2023-11-28 13:05:10,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.32 vs. limit=15.0
2023-11-28 13:05:10,550 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10400, loss[loss=0.06919, simple_loss=0.08762, pruned_loss=0.01582, audio_tagging_loss=0.009561, over 15889.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09019, pruned_loss=0.0124, audio_tagging_loss=0.008793, over 3045807.53 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:05:24,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3516240.0, ans=0.0
2023-11-28 13:05:32,160 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 9.002e+01 9.653e+01 1.031e+02 1.825e+02, threshold=1.931e+02, percent-clipped=0.0
2023-11-28 13:05:36,659 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527450
2023-11-28 13:06:08,059 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10450, loss[loss=0.07162, simple_loss=0.1016, pruned_loss=0.01168, audio_tagging_loss=0.009122, over 15525.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.08988, pruned_loss=0.01228, audio_tagging_loss=0.008791, over 3047232.51 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:06:12,843 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.43 vs. limit=22.5
2023-11-28 13:06:13,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3516506.6666666665, ans=0.0
2023-11-28 13:06:23,108 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 13:06:23,363 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0
2023-11-28 13:06:29,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3516573.3333333335, ans=0.04949747468305833
2023-11-28 13:06:32,129 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.35 vs. limit=12.0
2023-11-28 13:06:33,670 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527500
2023-11-28 13:06:34,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=12.0
2023-11-28 13:06:40,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3516640.0, ans=0.09899494936611666
2023-11-28 13:06:41,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3516706.6666666665, ans=0.0
2023-11-28 13:06:46,999 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0
2023-11-28 13:06:47,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3516706.6666666665, ans=0.0
2023-11-28 13:06:53,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3516773.3333333335, ans=0.125
2023-11-28 13:07:06,808 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10500, loss[loss=0.05518, simple_loss=0.07034, pruned_loss=0.01026, audio_tagging_loss=0.009744, over 15063.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09009, pruned_loss=0.01232, audio_tagging_loss=0.008662, over 3045542.82 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:07:16,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3516906.6666666665, ans=0.125
2023-11-28 13:07:28,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.731e+01 9.359e+01 1.002e+02 1.371e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-28 13:07:31,672 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527550
2023-11-28 13:07:48,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3517040.0, ans=0.1
2023-11-28 13:07:54,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3517106.6666666665, ans=0.2
2023-11-28 13:07:59,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3517106.6666666665, ans=0.0
2023-11-28 13:08:04,088 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10550, loss[loss=0.06209, simple_loss=0.08595, pruned_loss=0.01099, audio_tagging_loss=0.008132, over 15613.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08994, pruned_loss=0.01228, audio_tagging_loss=0.008541, over 3037493.10 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:08:20,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3517240.0, ans=0.09899494936611666
2023-11-28 13:08:26,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3517306.6666666665, ans=0.1
2023-11-28 13:08:28,788 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527600
2023-11-28 13:08:37,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3517306.6666666665, ans=0.125
2023-11-28 13:08:38,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3517373.3333333335, ans=0.125
2023-11-28 13:08:47,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3517373.3333333335, ans=0.125
2023-11-28 13:08:47,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3517373.3333333335, ans=0.0
2023-11-28 13:09:01,983 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10600, loss[loss=0.0626, simple_loss=0.08754, pruned_loss=0.01096, audio_tagging_loss=0.007865, over 15048.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08972, pruned_loss=0.01232, audio_tagging_loss=0.0085, over 3032886.55 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:09:09,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3517506.6666666665, ans=0.1
2023-11-28 13:09:15,462 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 13:09:20,101 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.74 vs. limit=12.0
2023-11-28 13:09:24,563 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.917e+01 9.595e+01 1.067e+02 1.545e+02, threshold=1.919e+02, percent-clipped=0.0
2023-11-28 13:09:27,982 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527650
2023-11-28 13:09:30,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3517640.0, ans=0.0
2023-11-28 13:09:46,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3517706.6666666665, ans=0.0
2023-11-28 13:09:48,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3517773.3333333335, ans=0.125
2023-11-28 13:10:00,432 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10650, loss[loss=0.0644, simple_loss=0.09451, pruned_loss=0.01213, audio_tagging_loss=0.005012, over 15468.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09024, pruned_loss=0.01245, audio_tagging_loss=0.008427, over 3034958.21 frames. ], batch size: 58, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:10:25,950 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527700
2023-11-28 13:10:33,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3518040.0, ans=0.125
2023-11-28 13:10:39,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3518040.0, ans=0.04949747468305833
2023-11-28 13:10:57,985 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10700, loss[loss=0.08875, simple_loss=0.1326, pruned_loss=0.01533, audio_tagging_loss=0.007112, over 15946.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09016, pruned_loss=0.01248, audio_tagging_loss=0.008485, over 3036065.19 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:11:05,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3518173.3333333335, ans=0.0
2023-11-28 13:11:19,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 8.935e+01 9.512e+01 1.031e+02 1.313e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-28 13:11:22,794 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527750
2023-11-28 13:11:31,029 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0
2023-11-28 13:11:55,921 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10750, loss[loss=0.05881, simple_loss=0.08565, pruned_loss=0.007347, audio_tagging_loss=0.008641, over 14254.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09014, pruned_loss=0.01236, audio_tagging_loss=0.008473, over 3037541.68 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:12:21,146 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527800
2023-11-28 13:12:26,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3518640.0, ans=0.125
2023-11-28 13:12:41,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3518773.3333333335, ans=0.1
2023-11-28 13:12:53,920 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10800, loss[loss=0.06256, simple_loss=0.08876, pruned_loss=0.01127, audio_tagging_loss=0.00691, over 14724.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09086, pruned_loss=0.01239, audio_tagging_loss=0.008382, over 3044928.56 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:12:56,362 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.89 vs. limit=15.0
2023-11-28 13:13:01,585 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.81 vs. limit=15.0
2023-11-28 13:13:15,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.616e+01 8.825e+01 9.488e+01 1.009e+02 1.262e+02, threshold=1.898e+02, percent-clipped=0.0
2023-11-28 13:13:19,980 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527850
2023-11-28 13:13:21,457 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.30 vs. limit=15.0
2023-11-28 13:13:22,714 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0
2023-11-28 13:13:31,099 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 13:13:39,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3519106.6666666665, ans=0.0
2023-11-28 13:13:51,771 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10850, loss[loss=0.05166, simple_loss=0.06519, pruned_loss=0.007937, audio_tagging_loss=0.01113, over 14445.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09136, pruned_loss=0.01244, audio_tagging_loss=0.008464, over 3043415.09 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:13:54,616 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.71 vs. limit=15.0
2023-11-28 13:14:00,539 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0
2023-11-28 13:14:08,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3519240.0, ans=0.125
2023-11-28 13:14:17,151 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527900
2023-11-28 13:14:18,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3519306.6666666665, ans=0.125
2023-11-28 13:14:19,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3519306.6666666665, ans=0.125
2023-11-28 13:14:30,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3519373.3333333335, ans=10.0
2023-11-28 13:14:38,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3519440.0, ans=0.125
2023-11-28 13:14:50,119 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10900, loss[loss=0.05271, simple_loss=0.0743, pruned_loss=0.00565, audio_tagging_loss=0.00991, over 15248.00 frames. ], tot_loss[loss=0.06665, simple_loss=0.09109, pruned_loss=0.01247, audio_tagging_loss=0.008635, over 3044145.00 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:14:51,254 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 13:15:08,209 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.86 vs. limit=15.0
2023-11-28 13:15:11,869 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.107e+01 8.813e+01 9.594e+01 1.027e+02 1.260e+02, threshold=1.919e+02, percent-clipped=0.0
2023-11-28 13:15:15,292 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 527950
2023-11-28 13:15:19,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3519640.0, ans=0.0
2023-11-28 13:15:35,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3519773.3333333335, ans=0.0
2023-11-28 13:15:38,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3519773.3333333335, ans=0.125
2023-11-28 13:15:40,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0
2023-11-28 13:15:41,606 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=15.0
2023-11-28 13:15:45,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3519773.3333333335, ans=0.1
2023-11-28 13:15:47,465 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 10950, loss[loss=0.04988, simple_loss=0.07118, pruned_loss=0.006649, audio_tagging_loss=0.007643, over 14509.00 frames. ], tot_loss[loss=0.06659, simple_loss=0.09085, pruned_loss=0.01248, audio_tagging_loss=0.008679, over 3041504.74 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:15:48,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3519840.0, ans=0.2
2023-11-28 13:15:52,992 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 13:15:56,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3519840.0, ans=6.0
2023-11-28 13:16:03,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3519906.6666666665, ans=0.125
2023-11-28 13:16:12,790 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528000
2023-11-28 13:16:12,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3519973.3333333335, ans=0.125
2023-11-28 13:16:14,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3519973.3333333335, ans=0.125
2023-11-28 13:16:31,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3520040.0, ans=0.125
2023-11-28 13:16:47,526 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11000, loss[loss=0.05629, simple_loss=0.07738, pruned_loss=0.009238, audio_tagging_loss=0.008361, over 15209.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09068, pruned_loss=0.01259, audio_tagging_loss=0.00868, over 3034917.65 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:16:52,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3520173.3333333335, ans=0.07
2023-11-28 13:16:59,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3520240.0, ans=0.0
2023-11-28 13:17:01,794 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 13:17:04,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3520240.0, ans=0.0
2023-11-28 13:17:05,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3520240.0, ans=0.0
2023-11-28 13:17:07,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3520240.0, ans=0.0
2023-11-28 13:17:09,437 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 9.312e+01 9.741e+01 1.066e+02 1.982e+02, threshold=1.948e+02, percent-clipped=1.0
2023-11-28 13:17:12,877 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528050
2023-11-28 13:17:20,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3520306.6666666665, ans=0.1
2023-11-28 13:17:22,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3520373.3333333335, ans=0.015
2023-11-28 13:17:44,887 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11050, loss[loss=0.05982, simple_loss=0.07434, pruned_loss=0.01219, audio_tagging_loss=0.01047, over 16497.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09053, pruned_loss=0.0125, audio_tagging_loss=0.008707, over 3043195.80 frames. ], batch size: 62, lr: 1.53e-03, grad_scale: 16.0
2023-11-28 13:17:47,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3520506.6666666665, ans=0.1
2023-11-28 13:17:58,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3520573.3333333335, ans=0.125
2023-11-28 13:18:01,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3520573.3333333335, ans=0.2
2023-11-28 13:18:06,253 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.70 vs. limit=15.0
2023-11-28 13:18:07,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3520640.0, ans=15.0
2023-11-28 13:18:10,208 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528100
2023-11-28 13:18:11,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3520640.0, ans=0.5
2023-11-28 13:18:12,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3520640.0, ans=0.0
2023-11-28 13:18:19,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0
2023-11-28 13:18:41,365 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11100, loss[loss=0.07688, simple_loss=0.1089, pruned_loss=0.01576, audio_tagging_loss=0.006658, over 15695.00 frames. ], tot_loss[loss=0.06646, simple_loss=0.09042, pruned_loss=0.01247, audio_tagging_loss=0.008784, over 3044723.78 frames. ], batch size: 57, lr: 1.53e-03, grad_scale: 16.0
2023-11-28 13:18:49,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3520840.0, ans=0.125
2023-11-28 13:19:04,071 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.272e+01 8.992e+01 9.660e+01 1.067e+02 1.331e+02, threshold=1.932e+02, percent-clipped=0.0
2023-11-28 13:19:06,334 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528150
2023-11-28 13:19:23,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3521040.0, ans=0.125
2023-11-28 13:19:30,599 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.63 vs. limit=10.0
2023-11-28 13:19:39,155 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11150, loss[loss=0.06027, simple_loss=0.0831, pruned_loss=0.01032, audio_tagging_loss=0.008393, over 15403.00 frames. ], tot_loss[loss=0.06667, simple_loss=0.09077, pruned_loss=0.01244, audio_tagging_loss=0.008846, over 3047951.06 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0
2023-11-28 13:19:40,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3521173.3333333335, ans=0.125
2023-11-28 13:19:48,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3521173.3333333335, ans=0.0
2023-11-28 13:19:57,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3521240.0, ans=0.2
2023-11-28 13:20:04,757 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528200
2023-11-28 13:20:07,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3521306.6666666665, ans=0.0
2023-11-28 13:20:08,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3521306.6666666665, ans=0.0
2023-11-28 13:20:08,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3521306.6666666665, ans=0.125
2023-11-28 13:20:25,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3521440.0, ans=0.125
2023-11-28 13:20:29,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3521440.0, ans=0.125
2023-11-28 13:20:29,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.73 vs. limit=15.0
2023-11-28 13:20:34,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3521440.0, ans=0.125
2023-11-28 13:20:36,833 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11200, loss[loss=0.06491, simple_loss=0.0753, pruned_loss=0.01322, audio_tagging_loss=0.01404, over 16369.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.09087, pruned_loss=0.01246, audio_tagging_loss=0.00888, over 3051986.94 frames. ], batch size: 63, lr: 1.53e-03, grad_scale: 32.0
2023-11-28 13:20:40,047 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0
2023-11-28 13:21:00,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.780e+01 8.925e+01 9.638e+01 1.050e+02 1.394e+02, threshold=1.928e+02, percent-clipped=0.0
2023-11-28 13:21:03,157 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528250
2023-11-28 13:21:17,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3521706.6666666665, ans=0.1
2023-11-28 13:21:19,817 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0
2023-11-28 13:21:35,062 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11250, loss[loss=0.04778, simple_loss=0.06331, pruned_loss=0.006333, audio_tagging_loss=0.009798, over 15298.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08939, pruned_loss=0.01223, audio_tagging_loss=0.008986, over 3040518.09 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 16.0
2023-11-28 13:21:48,798 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.13 vs. limit=10.0
2023-11-28 13:22:00,234 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528300
2023-11-28 13:22:28,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3522106.6666666665, ans=0.0
2023-11-28 13:22:31,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3522106.6666666665, ans=0.0
2023-11-28 13:22:33,014 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11300, loss[loss=0.06456, simple_loss=0.08696, pruned_loss=0.01314, audio_tagging_loss=0.007939, over 14388.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08945, pruned_loss=0.01229, audio_tagging_loss=0.008866, over 3039417.07 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 16.0
2023-11-28 13:22:39,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3522173.3333333335, ans=0.125
2023-11-28 13:22:47,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3522240.0, ans=0.0
2023-11-28 13:22:57,900 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.792e+01 8.854e+01 9.489e+01 1.005e+02 1.713e+02, threshold=1.898e+02, percent-clipped=0.0
2023-11-28 13:22:58,004 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528350
2023-11-28 13:23:17,372 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 13:23:20,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3522440.0, ans=0.1
2023-11-28 13:23:25,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3522440.0, ans=0.035
2023-11-28 13:23:30,098 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11350, loss[loss=0.06516, simple_loss=0.08947, pruned_loss=0.01221, audio_tagging_loss=0.008216, over 15792.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08958, pruned_loss=0.01217, audio_tagging_loss=0.008702, over 3046714.83 frames. ], batch size: 59, lr: 1.53e-03, grad_scale: 8.0
2023-11-28 13:23:30,825 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=15.0
2023-11-28 13:23:33,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3522506.6666666665, ans=0.125
2023-11-28 13:23:42,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3522573.3333333335, ans=0.0
2023-11-28 13:23:45,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3522573.3333333335, ans=0.125
2023-11-28 13:23:56,585 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528400
2023-11-28 13:24:09,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3522706.6666666665, ans=0.125
2023-11-28 13:24:18,418 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=15.0
2023-11-28 13:24:28,359 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11400, loss[loss=0.0826, simple_loss=0.1242, pruned_loss=0.01396, audio_tagging_loss=0.006525, over 15353.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08999, pruned_loss=0.01226, audio_tagging_loss=0.00856, over 3045891.08 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 8.0
2023-11-28 13:24:42,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3522906.6666666665, ans=0.1
2023-11-28 13:24:54,039 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.279e+01 9.209e+01 9.932e+01 1.057e+02 3.089e+02, threshold=1.986e+02, percent-clipped=1.0
2023-11-28 13:24:54,134 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528450
2023-11-28 13:24:56,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3522973.3333333335, ans=0.05
2023-11-28 13:25:00,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3522973.3333333335, ans=0.125
2023-11-28 13:25:07,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3523040.0, ans=0.2
2023-11-28 13:25:08,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3523040.0, ans=0.025
2023-11-28 13:25:08,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.07 vs. limit=15.0
2023-11-28 13:25:15,174 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.69 vs. limit=6.0
2023-11-28 13:25:27,088 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11450, loss[loss=0.05723, simple_loss=0.07586, pruned_loss=0.009606, audio_tagging_loss=0.009696, over 14170.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08929, pruned_loss=0.01211, audio_tagging_loss=0.008538, over 3042692.48 frames. ], batch size: 54, lr: 1.53e-03, grad_scale: 8.0
2023-11-28 13:25:36,587 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.50 vs. limit=22.5
2023-11-28 13:25:38,472 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=22.5
2023-11-28 13:25:51,784 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528500
2023-11-28 13:25:58,492 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 13:26:03,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3523373.3333333335, ans=0.0
2023-11-28 13:26:09,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3523373.3333333335, ans=0.0
2023-11-28 13:26:10,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3523373.3333333335, ans=0.125
2023-11-28 13:26:15,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3523440.0, ans=0.2
2023-11-28 13:26:24,137 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11500, loss[loss=0.07743, simple_loss=0.1152, pruned_loss=0.01382, audio_tagging_loss=0.006008, over 15152.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08911, pruned_loss=0.01195, audio_tagging_loss=0.008582, over 3042316.86 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 8.0
2023-11-28 13:26:32,683 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 13:26:50,285 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.439e+01 8.727e+01 9.422e+01 1.009e+02 1.518e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-28 13:26:50,382 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528550
2023-11-28 13:26:57,884 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.80 vs. limit=15.0
2023-11-28 13:27:00,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3523706.6666666665, ans=0.0
2023-11-28 13:27:04,457 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0
2023-11-28 13:27:07,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=12.0
2023-11-28 13:27:15,226 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-28 13:27:15,675 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0
2023-11-28 13:27:17,251 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0
2023-11-28 13:27:22,074 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11550, loss[loss=0.09277, simple_loss=0.1309, pruned_loss=0.02081, audio_tagging_loss=0.006501, over 15214.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08896, pruned_loss=0.0119, audio_tagging_loss=0.008529, over 3043446.49 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 8.0
], batch size: 55, lr: 1.53e-03, grad_scale: 8.0 2023-11-28 13:27:23,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3523840.0, ans=0.09899494936611666 2023-11-28 13:27:42,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3523906.6666666665, ans=0.125 2023-11-28 13:27:47,958 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528600 2023-11-28 13:27:50,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3523973.3333333335, ans=0.125 2023-11-28 13:27:53,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3523973.3333333335, ans=0.125 2023-11-28 13:27:59,027 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2023-11-28 13:28:03,202 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:28:21,094 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11600, loss[loss=0.06716, simple_loss=0.09699, pruned_loss=0.01131, audio_tagging_loss=0.00736, over 15044.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08847, pruned_loss=0.01176, audio_tagging_loss=0.008609, over 3044760.03 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:28:32,780 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2023-11-28 13:28:45,928 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.331e+01 9.016e+01 9.615e+01 1.039e+02 1.434e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-28 13:28:46,038 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528650 2023-11-28 13:28:50,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3524306.6666666665, ans=0.1 2023-11-28 13:28:53,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3524373.3333333335, ans=0.125 2023-11-28 13:29:05,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3524373.3333333335, ans=0.1 2023-11-28 13:29:18,255 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11650, loss[loss=0.0575, simple_loss=0.06725, pruned_loss=0.01172, audio_tagging_loss=0.01215, over 14724.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08786, pruned_loss=0.01173, audio_tagging_loss=0.008688, over 3051350.25 frames. 
], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:29:25,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3524506.6666666665, ans=0.1 2023-11-28 13:29:40,142 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.42 vs. limit=10.0 2023-11-28 13:29:43,048 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528700 2023-11-28 13:29:58,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3524706.6666666665, ans=0.2 2023-11-28 13:30:04,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3524773.3333333335, ans=0.125 2023-11-28 13:30:15,758 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11700, loss[loss=0.07631, simple_loss=0.1005, pruned_loss=0.01441, audio_tagging_loss=0.01166, over 15507.00 frames. ], tot_loss[loss=0.06409, simple_loss=0.08735, pruned_loss=0.01164, audio_tagging_loss=0.008771, over 3051739.10 frames. ], batch size: 60, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:30:17,246 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:30:19,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3524840.0, ans=0.0 2023-11-28 13:30:23,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3524840.0, ans=0.125 2023-11-28 13:30:41,690 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 8.706e+01 9.225e+01 1.001e+02 1.364e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-28 13:30:41,788 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528750 2023-11-28 13:30:47,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3524973.3333333335, ans=0.1 2023-11-28 13:30:52,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3525040.0, ans=0.125 2023-11-28 13:30:56,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3525040.0, ans=0.035 2023-11-28 13:30:58,841 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:31:10,029 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2023-11-28 13:31:12,752 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11750, loss[loss=0.07589, simple_loss=0.1136, pruned_loss=0.01175, audio_tagging_loss=0.007333, over 16299.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08784, pruned_loss=0.01182, audio_tagging_loss=0.00877, over 3049493.56 frames. 
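In the optim.py:476 entries, the printed threshold tracks the grad-norm quartiles: it is consistently 2.0 (the Clipping_scale) times the middle quartile — 2.0 × 9.225e+01 = 1.845e+02 in the entry above — and percent-clipped reports how often the norm exceeded it. A sketch of clipping against a scaled running median; how the optimizer actually maintains the quartile statistics is simplified away here:

```python
import torch

# Clip the global grad norm to clipping_scale * (running median norm).
# The running-median bookkeeping is a simplification of whatever the
# optimizer actually tracks per parameter group.
def clip_grad_(params, median_norm: float, clipping_scale: float = 2.0) -> float:
    threshold = clipping_scale * median_norm            # e.g. 2.0 * 92.25
    total = torch.sqrt(sum((p.grad.detach() ** 2).sum()
                           for p in params if p.grad is not None))
    if total > threshold:            # such steps count toward percent-clipped
        for p in params:
            if p.grad is not None:
                p.grad.mul_(threshold / total)
    return float(total)
```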
], batch size: 59, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:31:32,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3525240.0, ans=0.1 2023-11-28 13:31:38,560 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528800 2023-11-28 13:31:45,664 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.81 vs. limit=15.0 2023-11-28 13:32:02,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3525440.0, ans=0.125 2023-11-28 13:32:07,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3525440.0, ans=0.0 2023-11-28 13:32:11,793 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11800, loss[loss=0.07996, simple_loss=0.1101, pruned_loss=0.01638, audio_tagging_loss=0.008523, over 15151.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08781, pruned_loss=0.01181, audio_tagging_loss=0.008773, over 3044472.33 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:32:15,335 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:32:36,407 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.895e+01 9.052e+01 9.553e+01 1.035e+02 2.670e+02, threshold=1.911e+02, percent-clipped=1.0 2023-11-28 13:32:36,506 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528850 2023-11-28 13:32:38,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3525640.0, ans=0.125 2023-11-28 13:32:50,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3525706.6666666665, ans=0.0 2023-11-28 13:33:08,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3525840.0, ans=0.125 2023-11-28 13:33:09,422 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11850, loss[loss=0.05736, simple_loss=0.06967, pruned_loss=0.01178, audio_tagging_loss=0.01075, over 12951.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08759, pruned_loss=0.01192, audio_tagging_loss=0.008837, over 3038924.63 frames. ], batch size: 53, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:33:10,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.98 vs. limit=10.0 2023-11-28 13:33:18,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3525840.0, ans=0.0 2023-11-28 13:33:19,880 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.08 vs. 
limit=15.0 2023-11-28 13:33:35,033 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528900 2023-11-28 13:33:55,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3526106.6666666665, ans=0.2 2023-11-28 13:33:55,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3526106.6666666665, ans=0.0 2023-11-28 13:33:56,126 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.84 vs. limit=22.5 2023-11-28 13:34:04,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3526106.6666666665, ans=0.07 2023-11-28 13:34:06,363 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11900, loss[loss=0.06096, simple_loss=0.0837, pruned_loss=0.01031, audio_tagging_loss=0.008804, over 15144.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08899, pruned_loss=0.01218, audio_tagging_loss=0.008853, over 3044726.99 frames. ], batch size: 56, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:34:07,173 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.60 vs. limit=15.0 2023-11-28 13:34:08,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3526173.3333333335, ans=0.125 2023-11-28 13:34:31,989 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.911e+01 9.791e+01 1.051e+02 1.188e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-28 13:34:32,085 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 528950 2023-11-28 13:34:36,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3526306.6666666665, ans=0.125 2023-11-28 13:34:46,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3526373.3333333335, ans=0.125 2023-11-28 13:34:55,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3526440.0, ans=0.125 2023-11-28 13:35:01,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3526440.0, ans=0.0 2023-11-28 13:35:05,014 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 11950, loss[loss=0.05337, simple_loss=0.06138, pruned_loss=0.009768, audio_tagging_loss=0.01292, over 14558.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08887, pruned_loss=0.01223, audio_tagging_loss=0.008911, over 3039250.30 frames. 
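The Whitening lines (scaling.py:1022) compare a per-module covariance statistic to a scheduled limit — metric=19.84 vs. limit=22.5 just above — and as long as the metric stays under the limit the module is left alone. One plausible metric, equal to 1.0 for perfectly white (isotropic) features and growing with covariance anisotropy, is sketched below; treat the exact formula as an assumption:

```python
import torch

# Covariance-anisotropy metric: 1.0 when the feature covariance is a
# multiple of the identity, larger as the eigenvalue spread grows.
# Assumed form; the logged metric may be normalized differently.
def whitening_metric(x: torch.Tensor) -> float:
    flat = x.reshape(-1, x.shape[-1]).to(torch.float64)
    cov = flat.T @ flat / flat.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

print(whitening_metric(torch.randn(4000, 384)))  # close to 1.0
```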
], batch size: 57, lr: 1.53e-03, grad_scale: 16.0 2023-11-28 13:35:28,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3526640.0, ans=0.125 2023-11-28 13:35:29,905 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529000 2023-11-28 13:35:38,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3526706.6666666665, ans=0.2 2023-11-28 13:35:38,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3526706.6666666665, ans=0.0 2023-11-28 13:35:39,580 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.84 vs. limit=15.0 2023-11-28 13:35:41,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=3526706.6666666665, ans=0.1 2023-11-28 13:35:44,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3526706.6666666665, ans=0.125 2023-11-28 13:35:51,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3526773.3333333335, ans=0.0 2023-11-28 13:35:59,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3526773.3333333335, ans=0.0 2023-11-28 13:36:02,065 INFO [train_asr.py:1235] (2/4) Epoch 44, batch 12000, loss[loss=0.09216, simple_loss=0.1262, pruned_loss=0.02202, audio_tagging_loss=0.007051, over 14982.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08931, pruned_loss=0.01237, audio_tagging_loss=0.008863, over 3040057.82 frames. ], batch size: 55, lr: 1.53e-03, grad_scale: 32.0 2023-11-28 13:36:02,066 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 13:36:26,095 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3445, 5.0328, 4.6951, 5.1767], device='cuda:2') 2023-11-28 13:36:37,275 INFO [train_asr.py:1267] (2/4) Epoch 44, validation: loss=0.05811, simple_loss=0.05058, pruned_loss=0.005337, audio_tagging_loss=0.02748, over 4681554.00 frames. 2023-11-28 13:36:37,276 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 13:36:54,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3526906.6666666665, ans=0.125 2023-11-28 13:37:00,784 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529050 2023-11-28 13:37:01,756 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.922e+01 8.972e+01 9.530e+01 1.015e+02 1.256e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 13:37:21,555 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 0, loss[loss=0.06762, simple_loss=0.07518, pruned_loss=0.00836, audio_tagging_loss=0.02167, over 14116.00 frames. ], tot_loss[loss=0.06762, simple_loss=0.07518, pruned_loss=0.00836, audio_tagging_loss=0.02167, over 14116.00 frames. 
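During each validation pass the code also dumps a per-head entropy of selected self-attention weight matrices (zipformer.py:1877), e.g. tensor([5.3445, 5.0328, 4.6951, 5.1767]) above — one value per head, higher meaning flatter attention. A hedged sketch of that diagnostic, with the (heads, queries, keys) layout as an assumption:

```python
import torch

# Per-head attention entropy: high = diffuse attention, low = peaky.
# The tensor layout is an assumption.
def attn_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, num_queries, num_keys); rows sum to 1
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)
    return ent.mean(dim=-1)  # average over queries -> one value per head

attn = torch.softmax(torch.randn(4, 100, 300), dim=-1)
print(attn_entropy(attn))  # four per-head entropies, as in the log line
```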
], batch size: 54, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 13:37:21,556 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 13:37:51,114 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6006, 3.6498, 3.9885, 3.3776], device='cuda:2') 2023-11-28 13:37:56,015 INFO [train_asr.py:1267] (2/4) Epoch 45, validation: loss=0.05764, simple_loss=0.05062, pruned_loss=0.005372, audio_tagging_loss=0.02696, over 4681554.00 frames. 2023-11-28 13:37:56,015 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 13:38:14,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3527080.0, ans=0.0 2023-11-28 13:38:18,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3527146.6666666665, ans=0.1 2023-11-28 13:38:24,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3527146.6666666665, ans=0.05 2023-11-28 13:38:26,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3527146.6666666665, ans=0.0 2023-11-28 13:38:38,034 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.34 vs. limit=15.0 2023-11-28 13:38:46,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3527280.0, ans=0.125 2023-11-28 13:38:49,231 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529100 2023-11-28 13:38:49,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3527280.0, ans=0.0 2023-11-28 13:38:53,504 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 50, loss[loss=0.06946, simple_loss=0.09253, pruned_loss=0.01029, audio_tagging_loss=0.01291, over 16878.00 frames. ], tot_loss[loss=0.07458, simple_loss=0.0924, pruned_loss=0.01215, audio_tagging_loss=0.01623, over 687431.95 frames. ], batch size: 64, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:38:57,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3527346.6666666665, ans=0.125 2023-11-28 13:39:06,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3527413.3333333335, ans=0.125 2023-11-28 13:39:11,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3527413.3333333335, ans=0.125 2023-11-28 13:39:24,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.74 vs. 
limit=15.0 2023-11-28 13:39:32,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3527546.6666666665, ans=0.0 2023-11-28 13:39:34,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3527546.6666666665, ans=15.0 2023-11-28 13:39:47,118 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529150 2023-11-28 13:39:49,231 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.341e+01 9.943e+01 1.065e+02 1.140e+02 1.453e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-28 13:39:51,484 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 100, loss[loss=0.0741, simple_loss=0.09233, pruned_loss=0.0149, audio_tagging_loss=0.01304, over 14245.00 frames. ], tot_loss[loss=0.07449, simple_loss=0.09299, pruned_loss=0.01232, audio_tagging_loss=0.01568, over 1209565.86 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:40:02,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-28 13:40:09,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3527746.6666666665, ans=0.125 2023-11-28 13:40:12,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3527746.6666666665, ans=0.1 2023-11-28 13:40:27,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3527880.0, ans=0.2 2023-11-28 13:40:37,694 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=15.0 2023-11-28 13:40:39,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3527946.6666666665, ans=0.125 2023-11-28 13:40:45,479 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529200 2023-11-28 13:40:50,296 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 150, loss[loss=0.06155, simple_loss=0.07862, pruned_loss=0.01096, audio_tagging_loss=0.01129, over 15823.00 frames. ], tot_loss[loss=0.07184, simple_loss=0.0913, pruned_loss=0.01206, audio_tagging_loss=0.01412, over 1626705.83 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:40:53,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3528013.3333333335, ans=0.125 2023-11-28 13:41:20,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3528146.6666666665, ans=0.125 2023-11-28 13:41:39,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3528280.0, ans=0.125 2023-11-28 13:41:43,796 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529250 2023-11-28 13:41:46,534 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.992e+01 9.963e+01 1.064e+02 1.457e+02, threshold=1.993e+02, percent-clipped=0.0 2023-11-28 13:41:48,780 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 200, loss[loss=0.05256, simple_loss=0.0718, pruned_loss=0.008726, audio_tagging_loss=0.007928, over 15267.00 frames. 
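The four loss fields printed with every batch are mutually consistent with a fixed weighted sum in which simple_loss carries a 0.5 weight (inferred from the numbers, not read from the code). For the batch-150 entry above, 0.5 × 0.07862 + 0.01096 + 0.01129 ≈ 0.0616 ≈ the logged 0.06155, and the epoch-44 validation line earlier in this section checks out the same way:

```python
# loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss,
# checked against two entries logged above (weights are inferred):
for simple, pruned, at, logged in [
    (0.07862, 0.01096, 0.01129, 0.06155),    # epoch 45, batch 150
    (0.05058, 0.005337, 0.02748, 0.05811),   # epoch 44 validation
]:
    print(round(0.5 * simple + pruned + at, 5), "vs logged", logged)
```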
], tot_loss[loss=0.07067, simple_loss=0.09148, pruned_loss=0.01239, audio_tagging_loss=0.01254, over 1942902.34 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:41:49,582 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.15 vs. limit=15.0 2023-11-28 13:42:02,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3528413.3333333335, ans=15.0 2023-11-28 13:42:12,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3528480.0, ans=0.125 2023-11-28 13:42:18,191 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2023-11-28 13:42:27,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3528546.6666666665, ans=0.125 2023-11-28 13:42:29,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3528546.6666666665, ans=0.125 2023-11-28 13:42:39,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3528613.3333333335, ans=0.125 2023-11-28 13:42:41,608 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529300 2023-11-28 13:42:44,801 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5 2023-11-28 13:42:46,519 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 250, loss[loss=0.05353, simple_loss=0.07516, pruned_loss=0.007728, audio_tagging_loss=0.008219, over 15704.00 frames. ], tot_loss[loss=0.0691, simple_loss=0.0908, pruned_loss=0.01229, audio_tagging_loss=0.01141, over 2180214.17 frames. ], batch size: 64, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:42:50,249 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2023-11-28 13:42:52,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3528680.0, ans=0.0 2023-11-28 13:43:04,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3528746.6666666665, ans=0.0 2023-11-28 13:43:22,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3528880.0, ans=0.2 2023-11-28 13:43:39,483 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529350 2023-11-28 13:43:39,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3528946.6666666665, ans=0.0 2023-11-28 13:43:42,044 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.940e+01 8.918e+01 9.810e+01 1.066e+02 1.328e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-28 13:43:44,757 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 300, loss[loss=0.05818, simple_loss=0.08419, pruned_loss=0.00989, audio_tagging_loss=0.006201, over 15806.00 frames. ], tot_loss[loss=0.06738, simple_loss=0.08942, pruned_loss=0.01202, audio_tagging_loss=0.01065, over 2372943.95 frames. 
], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:44:31,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3529280.0, ans=0.125 2023-11-28 13:44:37,740 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529400 2023-11-28 13:44:42,415 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 350, loss[loss=0.07201, simple_loss=0.1, pruned_loss=0.01258, audio_tagging_loss=0.009402, over 15369.00 frames. ], tot_loss[loss=0.06708, simple_loss=0.08988, pruned_loss=0.01209, audio_tagging_loss=0.01006, over 2526072.90 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:44:49,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3529346.6666666665, ans=0.0 2023-11-28 13:44:57,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3529413.3333333335, ans=0.125 2023-11-28 13:44:58,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3529413.3333333335, ans=0.0 2023-11-28 13:45:00,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3529413.3333333335, ans=0.125 2023-11-28 13:45:07,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3529480.0, ans=0.0 2023-11-28 13:45:10,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3529480.0, ans=0.125 2023-11-28 13:45:31,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3529613.3333333335, ans=0.0 2023-11-28 13:45:35,881 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529450 2023-11-28 13:45:38,642 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 9.086e+01 9.699e+01 1.038e+02 1.395e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-28 13:45:40,891 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 400, loss[loss=0.06015, simple_loss=0.08746, pruned_loss=0.008544, audio_tagging_loss=0.00788, over 15979.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.08973, pruned_loss=0.01196, audio_tagging_loss=0.009715, over 2639781.31 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 13:46:03,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3529813.3333333335, ans=0.1 2023-11-28 13:46:14,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3529880.0, ans=0.1 2023-11-28 13:46:16,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=3529880.0, ans=12.0 2023-11-28 13:46:34,067 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529500 2023-11-28 13:46:38,932 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 450, loss[loss=0.04822, simple_loss=0.06215, pruned_loss=0.007041, audio_tagging_loss=0.01011, over 15937.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08898, pruned_loss=0.01203, audio_tagging_loss=0.009563, over 2730558.32 frames. 
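Each tot_loss[...] is a running, frame-weighted summary over the epoch so far, which is why its frame counter climbs monotonically (2,526,072 → 2,639,781 → 2,730,558 across batches 350–450 above) while the per-batch loss[...] jumps around. A minimal sketch of such an accumulator; whether the real tracker also applies exponential decay is not visible in the log:

```python
# Frame-weighted running average behind "tot_loss[... over N frames]".
# The purely cumulative form is an assumption.
class LossTracker:
    def __init__(self):
        self.sums: dict = {}
        self.frames = 0.0

    def update(self, losses: dict, num_frames: float) -> None:
        for name, value in losses.items():
            self.sums[name] = self.sums.get(name, 0.0) + value * num_frames
        self.frames += num_frames

    def averages(self) -> dict:
        return {name: s / self.frames for name, s in self.sums.items()}
```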
], batch size: 63, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:46:59,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3530080.0, ans=0.5 2023-11-28 13:47:02,723 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:47:32,241 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529550 2023-11-28 13:47:35,465 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.821e+01 9.442e+01 9.964e+01 1.327e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 13:47:36,655 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 500, loss[loss=0.06365, simple_loss=0.0743, pruned_loss=0.01349, audio_tagging_loss=0.01301, over 15910.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08826, pruned_loss=0.01199, audio_tagging_loss=0.009322, over 2800156.26 frames. ], batch size: 61, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:48:00,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3530480.0, ans=0.2 2023-11-28 13:48:15,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3530546.6666666665, ans=0.125 2023-11-28 13:48:23,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3530613.3333333335, ans=0.125 2023-11-28 13:48:27,481 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:48:29,484 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529600 2023-11-28 13:48:30,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3530613.3333333335, ans=0.0 2023-11-28 13:48:34,703 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 550, loss[loss=0.06725, simple_loss=0.08715, pruned_loss=0.01602, audio_tagging_loss=0.007647, over 14158.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08889, pruned_loss=0.01214, audio_tagging_loss=0.009154, over 2854038.89 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:48:34,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3530680.0, ans=0.0 2023-11-28 13:48:37,982 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.38 vs. 
limit=15.0 2023-11-28 13:48:45,388 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:48:47,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3530746.6666666665, ans=0.125 2023-11-28 13:48:59,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3530813.3333333335, ans=0.125 2023-11-28 13:49:16,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3530880.0, ans=0.125 2023-11-28 13:49:20,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3530946.6666666665, ans=0.0 2023-11-28 13:49:25,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3530946.6666666665, ans=0.125 2023-11-28 13:49:28,754 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529650 2023-11-28 13:49:31,978 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 8.881e+01 9.298e+01 9.934e+01 2.506e+02, threshold=1.860e+02, percent-clipped=1.0 2023-11-28 13:49:33,538 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 600, loss[loss=0.06355, simple_loss=0.0887, pruned_loss=0.01092, audio_tagging_loss=0.008284, over 15601.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08846, pruned_loss=0.01201, audio_tagging_loss=0.009164, over 2897280.21 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:49:33,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3531013.3333333335, ans=0.0 2023-11-28 13:50:01,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3531146.6666666665, ans=0.0 2023-11-28 13:50:02,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3531146.6666666665, ans=0.1 2023-11-28 13:50:15,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3531213.3333333335, ans=0.2 2023-11-28 13:50:21,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3531280.0, ans=0.1 2023-11-28 13:50:27,215 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529700 2023-11-28 13:50:31,545 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 650, loss[loss=0.07668, simple_loss=0.1021, pruned_loss=0.01601, audio_tagging_loss=0.009636, over 15880.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08966, pruned_loss=0.01214, audio_tagging_loss=0.008988, over 2933439.64 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:50:37,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3531346.6666666665, ans=0.125 2023-11-28 13:50:40,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3531346.6666666665, ans=0.0 2023-11-28 13:50:41,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3531413.3333333335, ans=0.0 2023-11-28 13:51:00,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3531480.0, ans=0.125 2023-11-28 13:51:22,494 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:51:24,492 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529750 2023-11-28 13:51:27,663 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 9.012e+01 9.762e+01 1.029e+02 1.844e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-28 13:51:28,836 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 700, loss[loss=0.05873, simple_loss=0.08119, pruned_loss=0.008549, audio_tagging_loss=0.009586, over 15621.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08986, pruned_loss=0.01216, audio_tagging_loss=0.008936, over 2963222.70 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:51:29,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3531680.0, ans=0.125 2023-11-28 13:51:29,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3531680.0, ans=0.2 2023-11-28 13:51:37,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3531680.0, ans=0.0 2023-11-28 13:51:42,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3531746.6666666665, ans=0.0 2023-11-28 13:51:44,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3531746.6666666665, ans=0.125 2023-11-28 13:52:13,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3531880.0, ans=0.0 2023-11-28 13:52:22,762 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529800 2023-11-28 13:52:28,095 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 750, loss[loss=0.08446, simple_loss=0.1203, pruned_loss=0.0178, audio_tagging_loss=0.006483, over 14548.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09036, pruned_loss=0.01225, audio_tagging_loss=0.008855, over 2980735.24 frames. 
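The grad_scale field moves in powers of two through this section — it holds at 16.0 over batches 600–750 and doubles to 32.0 at batch 800 just below, while later stretches step back down to 8.0 — the signature of dynamic loss scaling in mixed-precision training: halve when a step overflows, double after a run of clean steps. A sketch, with the growth interval as an assumption:

```python
# Dynamic loss scaling as suggested by the grad_scale column.
# The growth interval is an assumed value.
class GradScale:
    def __init__(self, scale: float = 8.0, growth_interval: int = 500):
        self.scale = scale
        self.growth_interval = growth_interval
        self.clean_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:                  # overflow: back off immediately
            self.scale /= 2.0
            self.clean_steps = 0
        else:                          # grow again after a clean stretch
            self.clean_steps += 1
            if self.clean_steps >= self.growth_interval:
                self.scale *= 2.0
                self.clean_steps = 0
```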
], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:52:52,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3532146.6666666665, ans=0.0 2023-11-28 13:52:52,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3532146.6666666665, ans=0.2 2023-11-28 13:53:04,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3532213.3333333335, ans=0.125 2023-11-28 13:53:07,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3532213.3333333335, ans=15.0 2023-11-28 13:53:10,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3532213.3333333335, ans=0.125 2023-11-28 13:53:22,426 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529850 2023-11-28 13:53:23,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3532280.0, ans=0.125 2023-11-28 13:53:25,746 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.782e+01 9.191e+01 9.653e+01 1.030e+02 1.250e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-28 13:53:26,932 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 800, loss[loss=0.06111, simple_loss=0.08639, pruned_loss=0.007956, audio_tagging_loss=0.009957, over 14866.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09004, pruned_loss=0.01229, audio_tagging_loss=0.008953, over 3000357.84 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 13:53:40,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3532413.3333333335, ans=0.125 2023-11-28 13:53:49,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3532480.0, ans=0.2 2023-11-28 13:54:12,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3532613.3333333335, ans=0.125 2023-11-28 13:54:15,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3532613.3333333335, ans=0.0 2023-11-28 13:54:18,306 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.34 vs. limit=12.0 2023-11-28 13:54:20,014 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529900 2023-11-28 13:54:20,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3532613.3333333335, ans=0.025 2023-11-28 13:54:20,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3532613.3333333335, ans=0.125 2023-11-28 13:54:21,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3532613.3333333335, ans=0.07 2023-11-28 13:54:24,379 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 850, loss[loss=0.08023, simple_loss=0.1124, pruned_loss=0.01393, audio_tagging_loss=0.01012, over 16063.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08892, pruned_loss=0.01211, audio_tagging_loss=0.008945, over 3011491.49 frames. 
], batch size: 62, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:54:25,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3532680.0, ans=0.125 2023-11-28 13:54:36,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3532746.6666666665, ans=0.125 2023-11-28 13:54:39,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3532746.6666666665, ans=0.125 2023-11-28 13:54:39,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3532746.6666666665, ans=0.035 2023-11-28 13:54:49,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3532813.3333333335, ans=0.125 2023-11-28 13:55:15,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=15.0 2023-11-28 13:55:17,002 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:55:17,973 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 529950 2023-11-28 13:55:22,297 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.009e+01 8.821e+01 9.434e+01 1.007e+02 1.194e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 13:55:22,335 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 900, loss[loss=0.05735, simple_loss=0.08359, pruned_loss=0.007001, audio_tagging_loss=0.008559, over 14596.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.0884, pruned_loss=0.01207, audio_tagging_loss=0.009091, over 3009320.12 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:55:55,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3533146.6666666665, ans=0.1 2023-11-28 13:56:16,913 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530000 2023-11-28 13:56:17,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3533280.0, ans=0.125 2023-11-28 13:56:21,523 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 950, loss[loss=0.05109, simple_loss=0.07107, pruned_loss=0.008026, audio_tagging_loss=0.00753, over 14004.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08848, pruned_loss=0.01204, audio_tagging_loss=0.009082, over 3019431.61 frames. 
], batch size: 53, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:56:41,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3533413.3333333335, ans=0.95 2023-11-28 13:56:41,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3533413.3333333335, ans=0.1 2023-11-28 13:56:43,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3533480.0, ans=0.0 2023-11-28 13:56:45,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3533480.0, ans=0.1 2023-11-28 13:56:47,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3533480.0, ans=0.125 2023-11-28 13:57:14,562 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530050 2023-11-28 13:57:16,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3533613.3333333335, ans=0.2 2023-11-28 13:57:18,892 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.954e+01 9.513e+01 1.020e+02 1.278e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 13:57:18,919 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1000, loss[loss=0.06907, simple_loss=0.1002, pruned_loss=0.01165, audio_tagging_loss=0.007342, over 16336.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08865, pruned_loss=0.01207, audio_tagging_loss=0.008874, over 3025509.94 frames. ], batch size: 60, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:57:33,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3533746.6666666665, ans=0.125 2023-11-28 13:57:42,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3533813.3333333335, ans=0.125 2023-11-28 13:57:45,889 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 13:58:00,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3533880.0, ans=0.125 2023-11-28 13:58:11,620 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530100 2023-11-28 13:58:15,935 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1050, loss[loss=0.05967, simple_loss=0.08096, pruned_loss=0.01354, audio_tagging_loss=0.005654, over 15087.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.088, pruned_loss=0.01205, audio_tagging_loss=0.008763, over 3034361.67 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:58:26,930 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.93 vs. 
limit=15.0 2023-11-28 13:58:29,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3534080.0, ans=0.07 2023-11-28 13:58:33,237 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 13:58:33,889 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.37 vs. limit=15.0 2023-11-28 13:58:42,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3534146.6666666665, ans=0.125 2023-11-28 13:59:01,673 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=12.0 2023-11-28 13:59:02,740 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.58 vs. limit=15.0 2023-11-28 13:59:05,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3534280.0, ans=0.0 2023-11-28 13:59:09,397 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530150 2023-11-28 13:59:14,281 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 8.910e+01 9.787e+01 1.025e+02 1.500e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-28 13:59:14,306 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1100, loss[loss=0.08484, simple_loss=0.1233, pruned_loss=0.01654, audio_tagging_loss=0.006635, over 15169.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08823, pruned_loss=0.01205, audio_tagging_loss=0.008743, over 3042683.84 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 13:59:14,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3534346.6666666665, ans=0.125 2023-11-28 13:59:19,332 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 13:59:19,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3534346.6666666665, ans=0.125 2023-11-28 13:59:24,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3534346.6666666665, ans=0.0 2023-11-28 13:59:24,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3534346.6666666665, ans=0.0 2023-11-28 13:59:24,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3534346.6666666665, ans=0.1 2023-11-28 13:59:28,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3534413.3333333335, ans=0.125 2023-11-28 13:59:40,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3534480.0, ans=0.0 2023-11-28 13:59:50,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3534546.6666666665, ans=0.125 2023-11-28 13:59:50,903 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.94 vs. limit=15.0 2023-11-28 13:59:53,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3534546.6666666665, ans=0.0 2023-11-28 14:00:08,104 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530200 2023-11-28 14:00:12,833 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1150, loss[loss=0.06707, simple_loss=0.1012, pruned_loss=0.01079, audio_tagging_loss=0.005659, over 15303.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08825, pruned_loss=0.01195, audio_tagging_loss=0.008745, over 3042487.07 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:00:25,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3534746.6666666665, ans=0.0 2023-11-28 14:00:30,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0 2023-11-28 14:00:48,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3534880.0, ans=0.125 2023-11-28 14:00:50,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3534880.0, ans=0.04949747468305833 2023-11-28 14:00:54,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3534880.0, ans=0.0 2023-11-28 14:01:06,344 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530250 2023-11-28 14:01:07,079 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=22.5 2023-11-28 14:01:10,687 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.181e+01 9.000e+01 9.554e+01 1.019e+02 1.286e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 14:01:10,714 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1200, loss[loss=0.04589, simple_loss=0.06253, pruned_loss=0.005771, audio_tagging_loss=0.008855, over 14690.00 frames. 
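The token dumps in the excluded-cut warnings show how the placeholder transcript is segmented into BPE pieces, with '▁' marking word starts. This is standard SentencePiece piece encoding; the model path below is a placeholder, not the actual path used by this run:

```python
import sentencepiece as spm

# Reproduce the 24-piece token dump from the warnings above.
# "path/to/bpe.model" stands in for whichever BPE model this run loads.
sp = spm.SentencePieceProcessor(model_file="path/to/bpe.model")
pieces = sp.encode_as_pieces(
    "Dummy text added as a place holder. Please ignore this if possible")
print(pieces)       # e.g. ['▁D', 'ummy', '▁', 'text', ...]
print(len(pieces))  # 24 with the model used in this run
```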
], tot_loss[loss=0.06547, simple_loss=0.08947, pruned_loss=0.0121, audio_tagging_loss=0.008641, over 3047597.47 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:01:21,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3535080.0, ans=0.125 2023-11-28 14:01:34,848 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0 2023-11-28 14:01:35,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2023-11-28 14:01:43,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3535146.6666666665, ans=0.125 2023-11-28 14:02:04,430 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530300 2023-11-28 14:02:09,372 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1250, loss[loss=0.07671, simple_loss=0.1042, pruned_loss=0.01764, audio_tagging_loss=0.006962, over 15648.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08975, pruned_loss=0.01213, audio_tagging_loss=0.008518, over 3045767.19 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:02:28,500 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2023-11-28 14:02:30,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3535413.3333333335, ans=0.125 2023-11-28 14:02:30,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.58 vs. limit=15.0 2023-11-28 14:02:34,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3535480.0, ans=0.0 2023-11-28 14:02:52,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3535546.6666666665, ans=0.125 2023-11-28 14:02:55,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3535613.3333333335, ans=0.2 2023-11-28 14:03:02,139 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530350 2023-11-28 14:03:06,881 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2023-11-28 14:03:07,353 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1300, loss[loss=0.04528, simple_loss=0.05836, pruned_loss=0.0073, audio_tagging_loss=0.008802, over 15259.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09023, pruned_loss=0.01208, audio_tagging_loss=0.008471, over 3041931.60 frames. 
], batch size: 59, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:03:08,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 8.419e+01 9.205e+01 1.020e+02 1.250e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-28 14:03:34,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3535813.3333333335, ans=0.0 2023-11-28 14:03:34,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.22 vs. limit=10.0 2023-11-28 14:03:37,240 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.93 vs. limit=22.5 2023-11-28 14:03:55,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3535946.6666666665, ans=0.2 2023-11-28 14:04:01,216 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530400 2023-11-28 14:04:02,881 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=22.5 2023-11-28 14:04:05,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3536013.3333333335, ans=0.05 2023-11-28 14:04:05,970 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1350, loss[loss=0.07109, simple_loss=0.09572, pruned_loss=0.01582, audio_tagging_loss=0.007417, over 15840.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09002, pruned_loss=0.01214, audio_tagging_loss=0.008546, over 3040025.23 frames. ], batch size: 62, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:04:06,515 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. limit=15.0 2023-11-28 14:04:26,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.82 vs. limit=15.0 2023-11-28 14:04:30,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3536146.6666666665, ans=0.0 2023-11-28 14:04:31,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3536146.6666666665, ans=0.125 2023-11-28 14:04:39,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3536213.3333333335, ans=0.125 2023-11-28 14:04:47,708 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0 2023-11-28 14:04:47,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3536213.3333333335, ans=15.0 2023-11-28 14:04:50,242 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 14:04:59,517 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530450 2023-11-28 14:05:03,905 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1400, loss[loss=0.07645, simple_loss=0.1049, pruned_loss=0.01497, audio_tagging_loss=0.00902, over 15495.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08955, pruned_loss=0.01193, audio_tagging_loss=0.008646, over 3036821.37 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:05:06,573 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.943e+01 9.366e+01 9.966e+01 1.345e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 14:05:08,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3536346.6666666665, ans=0.125 2023-11-28 14:05:10,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3536346.6666666665, ans=0.125 2023-11-28 14:05:23,509 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.97 vs. limit=12.0 2023-11-28 14:05:45,638 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.91 vs. limit=12.0 2023-11-28 14:05:57,449 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530500 2023-11-28 14:05:59,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3536613.3333333335, ans=0.0 2023-11-28 14:06:00,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3536680.0, ans=0.0 2023-11-28 14:06:01,796 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1450, loss[loss=0.05124, simple_loss=0.06951, pruned_loss=0.007376, audio_tagging_loss=0.009111, over 14405.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08998, pruned_loss=0.01204, audio_tagging_loss=0.008638, over 3040671.26 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:06:10,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3536680.0, ans=0.125 2023-11-28 14:06:16,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3536746.6666666665, ans=0.1 2023-11-28 14:06:27,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3536813.3333333335, ans=0.0 2023-11-28 14:06:32,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3536813.3333333335, ans=0.125 2023-11-28 14:06:55,258 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530550 2023-11-28 14:07:00,236 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1500, loss[loss=0.05275, simple_loss=0.07408, pruned_loss=0.006597, audio_tagging_loss=0.009117, over 14729.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08928, pruned_loss=0.01217, audio_tagging_loss=0.008628, over 3041620.68 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:07:02,456 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 9.037e+01 9.664e+01 1.030e+02 1.385e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 14:07:27,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3537146.6666666665, ans=0.125 2023-11-28 14:07:31,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3537146.6666666665, ans=0.2 2023-11-28 14:07:37,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3537213.3333333335, ans=0.125 2023-11-28 14:07:47,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3537280.0, ans=0.0 2023-11-28 14:07:52,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3537280.0, ans=0.125 2023-11-28 14:07:53,601 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530600 2023-11-28 14:07:58,724 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1550, loss[loss=0.05187, simple_loss=0.07138, pruned_loss=0.008147, audio_tagging_loss=0.00803, over 14881.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08975, pruned_loss=0.01229, audio_tagging_loss=0.0087, over 3038385.53 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 8.0 2023-11-28 14:08:05,374 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2023-11-28 14:08:07,588 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2023-11-28 14:08:36,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3537546.6666666665, ans=0.5 2023-11-28 14:08:49,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3537613.3333333335, ans=0.125 2023-11-28 14:08:51,557 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530650 2023-11-28 14:08:55,986 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1600, loss[loss=0.05953, simple_loss=0.07621, pruned_loss=0.01023, audio_tagging_loss=0.0112, over 14768.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.08953, pruned_loss=0.01224, audio_tagging_loss=0.008762, over 3037397.83 frames. 
], batch size: 57, lr: 1.51e-03, grad_scale: 16.0
2023-11-28 14:08:56,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3537680.0, ans=0.0
2023-11-28 14:08:58,165 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.016e+01 9.104e+01 9.583e+01 1.052e+02 1.503e+02, threshold=1.917e+02, percent-clipped=0.0
2023-11-28 14:08:58,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3537680.0, ans=0.2
2023-11-28 14:09:03,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3537680.0, ans=0.2
2023-11-28 14:09:04,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3537680.0, ans=0.0
2023-11-28 14:09:05,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3537680.0, ans=0.125
2023-11-28 14:09:12,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3537746.6666666665, ans=0.07
2023-11-28 14:09:42,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3537946.6666666665, ans=0.1
2023-11-28 14:09:48,726 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530700
2023-11-28 14:09:53,812 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1650, loss[loss=0.06148, simple_loss=0.08143, pruned_loss=0.01128, audio_tagging_loss=0.009485, over 14507.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08987, pruned_loss=0.01221, audio_tagging_loss=0.008721, over 3034763.62 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0
2023-11-28 14:10:20,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3538146.6666666665, ans=0.5
2023-11-28 14:10:47,167 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530750
2023-11-28 14:10:48,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3538280.0, ans=0.125
2023-11-28 14:10:48,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3538280.0, ans=0.125
2023-11-28 14:10:51,980 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1700, loss[loss=0.06472, simple_loss=0.08525, pruned_loss=0.01137, audio_tagging_loss=0.01073, over 14908.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08966, pruned_loss=0.01221, audio_tagging_loss=0.008804, over 3042870.42 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0
2023-11-28 14:10:54,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 8.956e+01 9.565e+01 1.008e+02 1.733e+02, threshold=1.913e+02, percent-clipped=0.0
2023-11-28 14:10:56,066 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.43 vs. limit=15.0
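Each Whitening line above compares a per-module whitening metric against a limit: the metric measures how far the feature covariance of that module's output is from isotropic, and the module only applies gradient pressure when the metric exceeds its limit. A rough sketch of such a metric and hinge-style penalty, as a stand-in formulation rather than the actual scaling.py code:

import torch

def whitening_penalty(x: torch.Tensor, limit: float) -> torch.Tensor:
    # x: (..., num_channels) activations for one whitened group.
    x = x.reshape(-1, x.shape[-1]).float()
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    # metric == 1.0 for a perfectly white (isotropic) covariance,
    # and grows as the covariance becomes more anisotropic
    metric = d * (cov * cov).sum() / cov.trace() ** 2
    # no gradient pressure while the metric stays under the logged limit
    return torch.relu(metric - limit)

In the line directly above, metric=8.43 is still under limit=15.0, so that module contributes nothing at this step; the other Whitening entries in this stretch are likewise inside their per-module limits.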
2023-11-28 14:11:19,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3538480.0, ans=0.125
2023-11-28 14:11:27,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3538546.6666666665, ans=0.125
2023-11-28 14:11:30,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3538546.6666666665, ans=0.0
2023-11-28 14:11:35,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3538546.6666666665, ans=0.07
2023-11-28 14:11:45,479 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530800
2023-11-28 14:11:50,091 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1750, loss[loss=0.05869, simple_loss=0.08486, pruned_loss=0.008162, audio_tagging_loss=0.008095, over 15005.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08985, pruned_loss=0.01224, audio_tagging_loss=0.00872, over 3047206.13 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 16.0
2023-11-28 14:12:00,266 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 14:12:03,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3538746.6666666665, ans=0.2
2023-11-28 14:12:18,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3538813.3333333335, ans=0.2
2023-11-28 14:12:20,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3538813.3333333335, ans=0.2
2023-11-28 14:12:23,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3538880.0, ans=0.0
2023-11-28 14:12:28,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3538880.0, ans=0.0
2023-11-28 14:12:30,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3538880.0, ans=0.1
2023-11-28 14:12:41,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3538946.6666666665, ans=0.125
2023-11-28 14:12:43,154 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530850
2023-11-28 14:12:47,409 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1800, loss[loss=0.07559, simple_loss=0.1118, pruned_loss=0.01341, audio_tagging_loss=0.006275, over 14671.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.0903, pruned_loss=0.01199, audio_tagging_loss=0.008586, over 3050536.35 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 16.0
2023-11-28 14:12:50,236 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.905e+01 9.378e+01 9.880e+01 1.265e+02, threshold=1.876e+02, percent-clipped=0.0
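The optim.py lines track the distribution of recent gradient norms and derive the clipping threshold from it: the five quartile values appear to be min/25%/median/75%/max, and the logged threshold equals Clipping_scale times the middle value (2.0 * 9.378e+01 = 1.876e+02 in the line directly above). A sketch of that bookkeeping, reconstructed from the logged values rather than quoted from the optimizer code:

import torch

def clipping_stats(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # quartiles as logged: min, 25%, median, 75%, max
    q = torch.quantile(recent_grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # scale times the median
    percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped

With percent-clipped=0.0 on every line of this stretch, the batches here stay comfortably below twice the median gradient norm.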
2023-11-28 14:13:22,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3539213.3333333335, ans=0.125
2023-11-28 14:13:27,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3539213.3333333335, ans=0.125
2023-11-28 14:13:41,506 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530900
2023-11-28 14:13:46,507 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1850, loss[loss=0.05602, simple_loss=0.0744, pruned_loss=0.007652, audio_tagging_loss=0.01117, over 14976.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08858, pruned_loss=0.01183, audio_tagging_loss=0.008681, over 3044104.18 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 16.0
2023-11-28 14:14:01,418 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.70 vs. limit=6.0
2023-11-28 14:14:07,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3539413.3333333335, ans=0.125
2023-11-28 14:14:09,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3539480.0, ans=0.0
2023-11-28 14:14:27,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3539546.6666666665, ans=0.1
2023-11-28 14:14:40,727 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 530950
2023-11-28 14:14:45,117 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1900, loss[loss=0.05657, simple_loss=0.07807, pruned_loss=0.007721, audio_tagging_loss=0.009817, over 15843.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.0884, pruned_loss=0.01186, audio_tagging_loss=0.008627, over 3048973.68 frames. ], batch size: 59, lr: 1.51e-03, grad_scale: 16.0
2023-11-28 14:14:45,511 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0
2023-11-28 14:14:47,346 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 8.623e+01 9.343e+01 1.003e+02 1.342e+02, threshold=1.869e+02, percent-clipped=0.0
2023-11-28 14:15:03,998 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.21 vs. limit=15.0
2023-11-28 14:15:09,440 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0
2023-11-28 14:15:21,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3539880.0, ans=0.1
2023-11-28 14:15:24,427 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0
2023-11-28 14:15:26,417 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.91 vs.
limit=15.0 2023-11-28 14:15:37,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3539946.6666666665, ans=0.125 2023-11-28 14:15:38,292 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531000 2023-11-28 14:15:43,030 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 1950, loss[loss=0.06323, simple_loss=0.08815, pruned_loss=0.01141, audio_tagging_loss=0.007742, over 14396.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08856, pruned_loss=0.01194, audio_tagging_loss=0.00858, over 3044669.11 frames. ], batch size: 53, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:15:48,468 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-28 14:15:53,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3540080.0, ans=0.125 2023-11-28 14:15:58,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3540080.0, ans=0.1 2023-11-28 14:16:12,665 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.01 vs. limit=22.5 2023-11-28 14:16:35,974 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531050 2023-11-28 14:16:40,329 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2000, loss[loss=0.07973, simple_loss=0.1114, pruned_loss=0.01358, audio_tagging_loss=0.01046, over 16082.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08915, pruned_loss=0.01216, audio_tagging_loss=0.00856, over 3041458.45 frames. ], batch size: 57, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:16:42,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.094e+01 8.874e+01 9.480e+01 1.027e+02 1.449e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 14:17:11,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3540480.0, ans=0.125 2023-11-28 14:17:16,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3540546.6666666665, ans=0.125 2023-11-28 14:17:22,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3540546.6666666665, ans=0.0 2023-11-28 14:17:34,608 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531100 2023-11-28 14:17:35,062 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=12.0 2023-11-28 14:17:37,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3540613.3333333335, ans=0.125 2023-11-28 14:17:39,030 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2050, loss[loss=0.07874, simple_loss=0.1111, pruned_loss=0.01586, audio_tagging_loss=0.007354, over 15406.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08913, pruned_loss=0.01219, audio_tagging_loss=0.008579, over 3039414.77 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:17:41,649 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.77 vs. 
limit=15.0 2023-11-28 14:17:45,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3540680.0, ans=0.1 2023-11-28 14:18:01,129 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.66 vs. limit=12.0 2023-11-28 14:18:06,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3540813.3333333335, ans=0.125 2023-11-28 14:18:31,755 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531150 2023-11-28 14:18:36,097 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2100, loss[loss=0.06383, simple_loss=0.07886, pruned_loss=0.0149, audio_tagging_loss=0.009492, over 14506.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08971, pruned_loss=0.01215, audio_tagging_loss=0.008527, over 3048392.22 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:18:39,395 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 8.857e+01 9.324e+01 1.026e+02 1.303e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-28 14:18:53,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3541080.0, ans=0.0 2023-11-28 14:18:54,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3541080.0, ans=0.05 2023-11-28 14:18:55,041 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.83 vs. limit=15.0 2023-11-28 14:18:56,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3541080.0, ans=0.125 2023-11-28 14:19:08,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3541146.6666666665, ans=0.2 2023-11-28 14:19:10,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3541213.3333333335, ans=0.0 2023-11-28 14:19:14,120 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0 2023-11-28 14:19:16,319 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.75 vs. limit=15.0 2023-11-28 14:19:21,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3541280.0, ans=0.125 2023-11-28 14:19:29,730 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531200 2023-11-28 14:19:32,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3541280.0, ans=0.2 2023-11-28 14:19:34,423 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2150, loss[loss=0.0789, simple_loss=0.09257, pruned_loss=0.02274, audio_tagging_loss=0.009871, over 14863.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08935, pruned_loss=0.01214, audio_tagging_loss=0.008556, over 3038166.47 frames. 
], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:19:54,229 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0 2023-11-28 14:19:58,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3541480.0, ans=0.125 2023-11-28 14:20:11,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3541546.6666666665, ans=0.125 2023-11-28 14:20:12,114 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:20:12,580 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-11-28 14:20:13,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3541546.6666666665, ans=0.05 2023-11-28 14:20:18,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3541546.6666666665, ans=0.2 2023-11-28 14:20:23,439 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=15.0 2023-11-28 14:20:24,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3541613.3333333335, ans=0.0 2023-11-28 14:20:24,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3541613.3333333335, ans=0.125 2023-11-28 14:20:28,027 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531250 2023-11-28 14:20:30,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3541613.3333333335, ans=0.1 2023-11-28 14:20:32,777 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2200, loss[loss=0.07778, simple_loss=0.1177, pruned_loss=0.01346, audio_tagging_loss=0.005476, over 15234.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09001, pruned_loss=0.01234, audio_tagging_loss=0.008615, over 3039777.58 frames. 
], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:20:36,789 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.900e+01 9.511e+01 1.009e+02 1.221e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 14:20:48,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3541746.6666666665, ans=0.125 2023-11-28 14:21:25,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3541946.6666666665, ans=0.0 2023-11-28 14:21:26,758 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531300 2023-11-28 14:21:28,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3541946.6666666665, ans=0.09899494936611666 2023-11-28 14:21:31,076 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2250, loss[loss=0.06859, simple_loss=0.1011, pruned_loss=0.01162, audio_tagging_loss=0.006428, over 15100.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09036, pruned_loss=0.01242, audio_tagging_loss=0.008679, over 3034339.56 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:21:33,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3542013.3333333335, ans=0.125 2023-11-28 14:22:04,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2023-11-28 14:22:19,817 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0 2023-11-28 14:22:24,243 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531350 2023-11-28 14:22:26,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3542280.0, ans=0.1 2023-11-28 14:22:28,599 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2300, loss[loss=0.1086, simple_loss=0.1622, pruned_loss=0.01943, audio_tagging_loss=0.008082, over 15944.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.09085, pruned_loss=0.01234, audio_tagging_loss=0.008625, over 3039644.24 frames. 
], batch size: 54, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:22:31,825 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.858e+01 9.166e+01 9.728e+01 1.033e+02 1.405e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-28 14:22:34,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3542346.6666666665, ans=0.125 2023-11-28 14:22:55,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3542480.0, ans=0.0 2023-11-28 14:22:57,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3542480.0, ans=0.2 2023-11-28 14:23:03,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3542546.6666666665, ans=0.0 2023-11-28 14:23:04,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3542546.6666666665, ans=0.125 2023-11-28 14:23:07,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3542546.6666666665, ans=0.125 2023-11-28 14:23:12,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3542546.6666666665, ans=0.0 2023-11-28 14:23:12,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3542546.6666666665, ans=0.125 2023-11-28 14:23:17,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3542613.3333333335, ans=0.0 2023-11-28 14:23:21,468 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:23:22,103 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531400 2023-11-28 14:23:23,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.19 vs. limit=22.5 2023-11-28 14:23:26,691 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2350, loss[loss=0.05211, simple_loss=0.07164, pruned_loss=0.006773, audio_tagging_loss=0.009521, over 14943.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09064, pruned_loss=0.01215, audio_tagging_loss=0.00876, over 3043525.80 frames. 
], batch size: 57, lr: 1.51e-03, grad_scale: 16.0 2023-11-28 14:23:34,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3542680.0, ans=0.035 2023-11-28 14:23:53,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3542813.3333333335, ans=0.0 2023-11-28 14:24:04,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3542880.0, ans=0.0 2023-11-28 14:24:16,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3542946.6666666665, ans=0.125 2023-11-28 14:24:19,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3542946.6666666665, ans=0.125 2023-11-28 14:24:20,469 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531450 2023-11-28 14:24:25,472 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2400, loss[loss=0.07206, simple_loss=0.09643, pruned_loss=0.01395, audio_tagging_loss=0.009889, over 14428.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09023, pruned_loss=0.01208, audio_tagging_loss=0.008889, over 3041003.40 frames. ], batch size: 56, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:24:26,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3543013.3333333335, ans=0.025 2023-11-28 14:24:28,800 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.833e+01 8.802e+01 9.417e+01 9.979e+01 1.299e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 14:24:30,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3543013.3333333335, ans=0.125 2023-11-28 14:24:40,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3543080.0, ans=0.2 2023-11-28 14:24:50,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3543146.6666666665, ans=0.125 2023-11-28 14:25:18,427 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531500 2023-11-28 14:25:23,334 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2450, loss[loss=0.04786, simple_loss=0.05953, pruned_loss=0.006742, audio_tagging_loss=0.01136, over 14085.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09001, pruned_loss=0.01201, audio_tagging_loss=0.00886, over 3041029.66 frames. ], batch size: 55, lr: 1.51e-03, grad_scale: 32.0 2023-11-28 14:25:31,514 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.72 vs. limit=15.0 2023-11-28 14:25:42,696 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.57 vs. 
limit=15.0
2023-11-28 14:25:44,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3543413.3333333335, ans=0.09899494936611666
2023-11-28 14:25:45,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3543480.0, ans=0.1
2023-11-28 14:25:58,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3543546.6666666665, ans=0.125
2023-11-28 14:26:15,811 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.21 vs. limit=22.5
2023-11-28 14:26:16,405 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531550
2023-11-28 14:26:18,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3543613.3333333335, ans=0.125
2023-11-28 14:26:21,222 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2500, loss[loss=0.0593, simple_loss=0.07219, pruned_loss=0.01013, audio_tagging_loss=0.01307, over 15206.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08889, pruned_loss=0.01199, audio_tagging_loss=0.008971, over 3038545.93 frames. ], batch size: 58, lr: 1.51e-03, grad_scale: 32.0
2023-11-28 14:26:25,021 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.030e+01 9.693e+01 1.035e+02 1.388e+02, threshold=1.939e+02, percent-clipped=0.0
2023-11-28 14:26:29,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3543680.0, ans=0.125
2023-11-28 14:26:46,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3543813.3333333335, ans=0.125
2023-11-28 14:26:51,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3543813.3333333335, ans=0.125
2023-11-28 14:26:52,249 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0
2023-11-28 14:26:59,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3543880.0, ans=0.125
2023-11-28 14:27:00,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3543880.0, ans=0.125
2023-11-28 14:27:06,007 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.08 vs. limit=10.0
2023-11-28 14:27:14,735 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531600
2023-11-28 14:27:19,439 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2550, loss[loss=0.04604, simple_loss=0.05799, pruned_loss=0.008545, audio_tagging_loss=0.008504, over 13991.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08875, pruned_loss=0.01192, audio_tagging_loss=0.00891, over 3032946.46 frames. ], batch size: 54, lr: 1.51e-03, grad_scale: 16.0
2023-11-28 14:27:25,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3544013.3333333335, ans=0.125
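The scaling.py:213 lines report ScheduledFloat values: hyperparameters such as dropout probabilities, skip rates and bypass bounds that are functions of batch_count rather than constants, interpolated piecewise-linearly between schedule points. A minimal sketch of that behavior (the schedule points below are illustrative, not the recipe's actual numbers):

def scheduled_float(batch_count: float, points) -> float:
    # points: [(batch_count, value), ...] sorted by batch_count;
    # values are clamped at the ends and interpolated in between.
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return points[-1][1]

# e.g. a skip rate that decays early in training and then holds:
print(scheduled_float(3544013.0, [(0.0, 0.5), (4000.0, 0.025)]))  # -> 0.025

At batch_count around 3.54e6, any such schedule reached its final value long ago, which is why the logged ans= values stay constant across this section.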
2023-11-28 14:27:29,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3544013.3333333335, ans=0.1
2023-11-28 14:27:34,326 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=22.5
2023-11-28 14:27:39,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3544080.0, ans=0.125
2023-11-28 14:27:39,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3544080.0, ans=0.125
2023-11-28 14:27:54,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3544213.3333333335, ans=0.2
2023-11-28 14:27:57,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3544213.3333333335, ans=0.125
2023-11-28 14:27:59,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.09 vs. limit=22.5
2023-11-28 14:28:13,600 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531650
2023-11-28 14:28:18,580 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2600, loss[loss=0.0489, simple_loss=0.06737, pruned_loss=0.008577, audio_tagging_loss=0.006633, over 15083.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08824, pruned_loss=0.01197, audio_tagging_loss=0.00872, over 3031318.92 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0
2023-11-28 14:28:21,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3544346.6666666665, ans=0.2
2023-11-28 14:28:24,064 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 8.904e+01 9.542e+01 1.021e+02 1.415e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-28 14:28:37,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0
2023-11-28 14:28:48,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3544480.0, ans=0.125
2023-11-28 14:28:51,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.57 vs.
limit=15.0 2023-11-28 14:28:51,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3544546.6666666665, ans=0.1 2023-11-28 14:28:52,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3544546.6666666665, ans=0.125 2023-11-28 14:28:54,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3544546.6666666665, ans=0.1 2023-11-28 14:29:11,515 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531700 2023-11-28 14:29:16,006 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2650, loss[loss=0.06046, simple_loss=0.08166, pruned_loss=0.01098, audio_tagging_loss=0.008644, over 14857.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08876, pruned_loss=0.01202, audio_tagging_loss=0.008601, over 3035100.39 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 14:29:20,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3544680.0, ans=0.1 2023-11-28 14:29:26,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3544680.0, ans=0.1 2023-11-28 14:29:39,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3544813.3333333335, ans=0.125 2023-11-28 14:30:04,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3544946.6666666665, ans=0.1 2023-11-28 14:30:10,272 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531750 2023-11-28 14:30:10,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3544946.6666666665, ans=0.1 2023-11-28 14:30:12,852 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0 2023-11-28 14:30:14,511 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2700, loss[loss=0.06925, simple_loss=0.0915, pruned_loss=0.01461, audio_tagging_loss=0.008896, over 15178.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08776, pruned_loss=0.01185, audio_tagging_loss=0.008665, over 3043174.00 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 14:30:19,907 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.102e+01 8.994e+01 9.441e+01 1.024e+02 1.210e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 14:30:21,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3545013.3333333335, ans=0.125 2023-11-28 14:30:28,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3545080.0, ans=0.125 2023-11-28 14:30:35,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3545080.0, ans=0.2 2023-11-28 14:30:54,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3545213.3333333335, ans=0.125 2023-11-28 14:30:56,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3545213.3333333335, ans=0.2 2023-11-28 14:30:59,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3545280.0, ans=0.0 2023-11-28 14:31:03,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3545280.0, ans=0.125 2023-11-28 14:31:03,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3545280.0, ans=0.1 2023-11-28 14:31:07,523 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531800 2023-11-28 14:31:07,855 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.26 vs. limit=15.0 2023-11-28 14:31:12,760 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2750, loss[loss=0.07676, simple_loss=0.1065, pruned_loss=0.01389, audio_tagging_loss=0.009629, over 16189.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08796, pruned_loss=0.01185, audio_tagging_loss=0.008604, over 3048018.21 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 14:31:19,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3545346.6666666665, ans=0.125 2023-11-28 14:31:25,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3545413.3333333335, ans=0.04949747468305833 2023-11-28 14:32:02,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3545613.3333333335, ans=0.125 2023-11-28 14:32:05,262 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24
2023-11-28 14:32:06,436 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531850
2023-11-28 14:32:09,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3545680.0, ans=0.125
2023-11-28 14:32:09,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3545680.0, ans=0.1
2023-11-28 14:32:10,834 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2800, loss[loss=0.05557, simple_loss=0.08635, pruned_loss=0.005279, audio_tagging_loss=0.007112, over 16308.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08872, pruned_loss=0.01205, audio_tagging_loss=0.008539, over 3045491.22 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 14:32:14,814 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-28 14:32:16,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.903e+01 9.379e+01 1.017e+02 3.083e+02, threshold=1.876e+02, percent-clipped=1.0
2023-11-28 14:32:30,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3545746.6666666665, ans=0.02
2023-11-28 14:33:01,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3545946.6666666665, ans=0.2
2023-11-28 14:33:04,488 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531900
2023-11-28 14:33:08,784 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2850, loss[loss=0.07239, simple_loss=0.0986, pruned_loss=0.01418, audio_tagging_loss=0.008909, over 16207.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08879, pruned_loss=0.01201, audio_tagging_loss=0.00854, over 3044641.66 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 14:33:21,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3546080.0, ans=0.125
2023-11-28 14:33:27,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3546080.0, ans=0.2
2023-11-28 14:33:58,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3546280.0, ans=0.125
2023-11-28 14:34:01,803 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 531950
2023-11-28 14:34:06,185 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2900, loss[loss=0.08971, simple_loss=0.1207, pruned_loss=0.02088, audio_tagging_loss=0.00849, over 14778.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08931, pruned_loss=0.01221, audio_tagging_loss=0.00857, over 3044251.82 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
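Two things stand out in the block above: the optim.py line records a max gradient norm of 3.083e+02 against a threshold of 1.876e+02 (hence percent-clipped=1.0, the only clipped batch in this stretch), and the grad_scale field in the loss lines moves between 8.0, 16.0 and 32.0 across this section. The latter is standard fp16 dynamic loss scaling: halve the scale when a step overflows, try doubling again after a run of clean steps. A sketch of that update rule, mirroring torch.cuda.amp.GradScaler defaults rather than icefall-specific code:

def update_grad_scale(scale: float, found_inf: bool, clean_steps: int,
                      growth_interval: int = 2000):
    # returns (new_scale, new_clean_step_count)
    if found_inf:
        return scale * 0.5, 0        # back off after an overflowing step
    clean_steps += 1
    if clean_steps >= growth_interval:
        return scale * 2.0, 0        # probe a larger scale again
    return scale, clean_steps

assert update_grad_scale(32.0, True, 500) == (16.0, 0)
assert update_grad_scale(16.0, False, 1999) == (32.0, 0)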
2023-11-28 14:34:07,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3546346.6666666665, ans=0.0
2023-11-28 14:34:08,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3546346.6666666665, ans=0.125
2023-11-28 14:34:12,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.591e+01 8.752e+01 9.369e+01 1.016e+02 1.365e+02, threshold=1.874e+02, percent-clipped=0.0
2023-11-28 14:34:12,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3546346.6666666665, ans=0.125
2023-11-28 14:34:16,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3546346.6666666665, ans=0.125
2023-11-28 14:34:25,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3546413.3333333335, ans=0.125
2023-11-28 14:34:46,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3546546.6666666665, ans=0.2
2023-11-28 14:34:51,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3546613.3333333335, ans=0.125
2023-11-28 14:35:00,591 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532000
2023-11-28 14:35:07,401 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 2950, loss[loss=0.05983, simple_loss=0.08207, pruned_loss=0.008601, audio_tagging_loss=0.01019, over 15287.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08945, pruned_loss=0.01211, audio_tagging_loss=0.008592, over 3048044.52 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 14:35:40,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3546880.0, ans=0.0
2023-11-28 14:36:01,127 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532050
2023-11-28 14:36:05,951 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3000, loss[loss=0.07072, simple_loss=0.09483, pruned_loss=0.01411, audio_tagging_loss=0.009196, over 15017.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08973, pruned_loss=0.01207, audio_tagging_loss=0.008624, over 3047972.49 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 14:36:05,951 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-28 14:36:26,506 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1460, 2.4798, 4.9557, 3.0299], device='cuda:2')
2023-11-28 14:36:36,668 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6070, 3.7403, 3.9483, 3.4195], device='cuda:2')
2023-11-28 14:36:41,303 INFO [train_asr.py:1267] (2/4) Epoch 45, validation: loss=0.05774, simple_loss=0.05054, pruned_loss=0.005299, audio_tagging_loss=0.02717, over 4681554.00 frames.
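The validation line decomposes the same way as the training losses: the logged total is consistent with a 0.5 weight on simple_loss plus unit weights on the pruned and audio-tagging terms (an inference from the logged numbers, not a quote of the training code):

# components from the validation line above
simple_loss, pruned_loss, audio_tagging_loss = 0.05054, 0.005299, 0.02717
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(round(loss, 5))  # 0.05774, matching loss= in the log

The same relation holds for the per-batch training lines (e.g. batch 1350 above: 0.5 * 0.09572 + 0.01582 + 0.007417 = 0.07110, logged as 0.07109). The noticeably larger validation audio_tagging_loss (0.02717 vs. roughly 0.0086 during training) suggests the validation cuts weight audio tagging more heavily than the training mux does.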
2023-11-28 14:36:41,303 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 14:36:43,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3547013.3333333335, ans=0.125 2023-11-28 14:36:45,024 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=22.5 2023-11-28 14:36:46,868 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.901e+01 9.475e+01 1.021e+02 1.271e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 14:36:50,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3547013.3333333335, ans=0.0 2023-11-28 14:36:57,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3547080.0, ans=0.2 2023-11-28 14:37:05,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3547146.6666666665, ans=0.125 2023-11-28 14:37:12,117 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0 2023-11-28 14:37:13,360 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0 2023-11-28 14:37:34,360 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532100 2023-11-28 14:37:39,439 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3050, loss[loss=0.04478, simple_loss=0.0598, pruned_loss=0.007037, audio_tagging_loss=0.00784, over 14011.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09061, pruned_loss=0.0123, audio_tagging_loss=0.008546, over 3051690.06 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:37:40,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3547346.6666666665, ans=0.1 2023-11-28 14:37:55,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3547413.3333333335, ans=0.0 2023-11-28 14:37:55,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3547413.3333333335, ans=0.125 2023-11-28 14:38:06,310 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=12.0 2023-11-28 14:38:15,444 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:38:21,044 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. 
limit=15.0 2023-11-28 14:38:33,625 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532150 2023-11-28 14:38:38,087 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3100, loss[loss=0.07151, simple_loss=0.08354, pruned_loss=0.01826, audio_tagging_loss=0.01148, over 14858.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08987, pruned_loss=0.01209, audio_tagging_loss=0.008619, over 3049278.13 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:38:43,567 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.323e+01 8.815e+01 9.478e+01 1.004e+02 1.302e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 14:39:00,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3547813.3333333335, ans=0.125 2023-11-28 14:39:04,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3547813.3333333335, ans=0.07 2023-11-28 14:39:04,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3547813.3333333335, ans=0.125 2023-11-28 14:39:23,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=22.5 2023-11-28 14:39:28,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=3547946.6666666665, ans=0.2 2023-11-28 14:39:31,189 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532200 2023-11-28 14:39:35,924 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3150, loss[loss=0.06614, simple_loss=0.08591, pruned_loss=0.01172, audio_tagging_loss=0.01146, over 16246.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09011, pruned_loss=0.01218, audio_tagging_loss=0.008676, over 3044712.02 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:40:09,570 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=15.0 2023-11-28 14:40:40,685 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:40:49,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3548213.3333333335, ans=0.125 2023-11-28 14:41:16,658 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532250 2023-11-28 14:41:16,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3548280.0, ans=0.2 2023-11-28 14:41:23,667 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3200, loss[loss=0.06754, simple_loss=0.08642, pruned_loss=0.01143, audio_tagging_loss=0.01291, over 15599.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.09021, pruned_loss=0.01209, audio_tagging_loss=0.008752, over 3047565.61 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 14:41:33,621 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.062e+01 8.970e+01 9.466e+01 1.009e+02 1.247e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 14:41:45,888 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.65 vs. 
limit=15.0 2023-11-28 14:41:47,502 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.61 vs. limit=15.0 2023-11-28 14:42:35,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3548546.6666666665, ans=0.0 2023-11-28 14:42:53,463 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532300 2023-11-28 14:43:00,654 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3250, loss[loss=0.06886, simple_loss=0.1024, pruned_loss=0.01082, audio_tagging_loss=0.006835, over 15576.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09018, pruned_loss=0.01216, audio_tagging_loss=0.008738, over 3046172.42 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 14:43:06,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3548680.0, ans=0.2 2023-11-28 14:43:17,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3548746.6666666665, ans=0.125 2023-11-28 14:43:23,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3548746.6666666665, ans=0.125 2023-11-28 14:43:36,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3548813.3333333335, ans=0.0 2023-11-28 14:43:38,585 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0 2023-11-28 14:44:19,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3548946.6666666665, ans=0.125 2023-11-28 14:44:25,770 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 14:44:27,632 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532350 2023-11-28 14:44:35,246 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3300, loss[loss=0.08591, simple_loss=0.1255, pruned_loss=0.01783, audio_tagging_loss=0.005319, over 15617.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08982, pruned_loss=0.01231, audio_tagging_loss=0.008817, over 3047793.69 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:44:49,665 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 8.914e+01 9.371e+01 1.015e+02 1.378e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-28 14:45:04,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3549080.0, ans=0.125 2023-11-28 14:45:20,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3549146.6666666665, ans=0.125 2023-11-28 14:45:34,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3549213.3333333335, ans=0.0 2023-11-28 14:45:55,620 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532400 2023-11-28 14:45:56,117 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.61 vs. 
limit=15.0 2023-11-28 14:46:01,547 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2023-11-28 14:46:02,264 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3350, loss[loss=0.05801, simple_loss=0.07732, pruned_loss=0.009887, audio_tagging_loss=0.009464, over 14767.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09005, pruned_loss=0.01225, audio_tagging_loss=0.008814, over 3051352.30 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:46:03,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3549346.6666666665, ans=0.125 2023-11-28 14:46:05,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=22.5 2023-11-28 14:46:22,650 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2023-11-28 14:47:00,861 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2023-11-28 14:47:04,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3549546.6666666665, ans=0.125 2023-11-28 14:47:06,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3549613.3333333335, ans=0.2 2023-11-28 14:47:18,229 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532450 2023-11-28 14:47:25,130 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3400, loss[loss=0.06134, simple_loss=0.08275, pruned_loss=0.01146, audio_tagging_loss=0.0085, over 15552.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08964, pruned_loss=0.01222, audio_tagging_loss=0.008702, over 3052335.73 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:47:35,336 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.293e+01 8.841e+01 9.503e+01 1.045e+02 1.895e+02, threshold=1.901e+02, percent-clipped=1.0 2023-11-28 14:48:02,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3549813.3333333335, ans=0.125 2023-11-28 14:48:33,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3549946.6666666665, ans=0.0 2023-11-28 14:48:33,612 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2023-11-28 14:48:37,271 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532500 2023-11-28 14:48:42,905 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3450, loss[loss=0.06028, simple_loss=0.07484, pruned_loss=0.01438, audio_tagging_loss=0.008482, over 14394.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08913, pruned_loss=0.01215, audio_tagging_loss=0.008669, over 3045727.94 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:48:56,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3550013.3333333335, ans=0.2 2023-11-28 14:49:29,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3550213.3333333335, ans=0.125 2023-11-28 14:49:45,399 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.12 vs. limit=15.0 2023-11-28 14:49:46,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3550280.0, ans=0.125 2023-11-28 14:49:49,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3550280.0, ans=0.0 2023-11-28 14:49:53,337 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.72 vs. limit=8.0 2023-11-28 14:49:53,826 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532550 2023-11-28 14:49:59,759 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3500, loss[loss=0.05905, simple_loss=0.08325, pruned_loss=0.006223, audio_tagging_loss=0.0112, over 15589.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08966, pruned_loss=0.01218, audio_tagging_loss=0.008601, over 3050386.20 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:50:01,668 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2023-11-28 14:50:04,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3550346.6666666665, ans=0.125 2023-11-28 14:50:08,265 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.938e+01 9.500e+01 1.028e+02 1.250e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 14:50:20,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3550413.3333333335, ans=0.2 2023-11-28 14:50:25,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3550413.3333333335, ans=0.0 2023-11-28 14:50:39,244 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:50:51,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3550546.6666666665, ans=0.1 2023-11-28 14:51:06,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3550613.3333333335, ans=0.0 2023-11-28 14:51:07,207 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532600 2023-11-28 14:51:13,061 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3550, loss[loss=0.06192, simple_loss=0.08898, pruned_loss=0.0104, audio_tagging_loss=0.007029, over 14851.00 frames. 
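The train_asr.py:1235 lines above pair the current batch's loss[...] with a smoothed tot_loss[...]. Throughout this log the total is consistent with a weighted sum of the three components in which the simple (linear-joiner) transducer loss enters at half weight and the pruned and audio-tagging losses at full weight. A minimal sketch that reproduces the logged totals; the 0.5/1.0/1.0 weights are inferred from the numbers themselves, not quoted from the training script:

```python
# Minimal sketch: recombine the loss components printed by train_asr.py:1235.
# The weights below are inferred from the logged numbers and are assumptions,
# not a quote of the icefall training code.

def combined_loss(simple_loss: float,
                  pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_loss_scale: float = 0.5,
                  audio_tagging_loss_scale: float = 1.0) -> float:
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Epoch 45, batch 3500 from the log above:
# loss=0.05905, simple_loss=0.08325, pruned_loss=0.006223, audio_tagging_loss=0.0112
assert abs(combined_loss(0.08325, 0.006223, 0.0112) - 0.05905) < 1e-4
```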
], tot_loss[loss=0.06547, simple_loss=0.08958, pruned_loss=0.01216, audio_tagging_loss=0.008514, over 3040334.02 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:51:44,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3550813.3333333335, ans=0.125 2023-11-28 14:51:47,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3550813.3333333335, ans=0.0 2023-11-28 14:51:57,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3550880.0, ans=15.0 2023-11-28 14:52:00,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3550880.0, ans=0.0 2023-11-28 14:52:04,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=3550880.0, ans=0.2 2023-11-28 14:52:05,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3550880.0, ans=0.125 2023-11-28 14:52:11,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3550946.6666666665, ans=0.125 2023-11-28 14:52:18,627 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532650 2023-11-28 14:52:18,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3550946.6666666665, ans=0.0 2023-11-28 14:52:24,469 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3600, loss[loss=0.07777, simple_loss=0.1029, pruned_loss=0.01788, audio_tagging_loss=0.008414, over 14798.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.09004, pruned_loss=0.01203, audio_tagging_loss=0.00854, over 3041239.52 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 14:52:32,699 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 8.883e+01 9.624e+01 1.026e+02 1.265e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 14:52:51,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3551146.6666666665, ans=0.1 2023-11-28 14:52:54,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3551146.6666666665, ans=0.125 2023-11-28 14:52:55,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3551146.6666666665, ans=0.0 2023-11-28 14:53:07,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3551213.3333333335, ans=0.125 2023-11-28 14:53:20,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3551280.0, ans=0.2 2023-11-28 14:53:23,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3551280.0, ans=10.0 2023-11-28 14:53:29,886 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532700 2023-11-28 14:53:34,852 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3650, loss[loss=0.06195, simple_loss=0.08764, pruned_loss=0.01011, audio_tagging_loss=0.008019, over 14424.00 frames. 
], tot_loss[loss=0.066, simple_loss=0.0909, pruned_loss=0.01211, audio_tagging_loss=0.008437, over 3041547.86 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:53:35,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3551346.6666666665, ans=0.1 2023-11-28 14:53:52,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3551413.3333333335, ans=0.0 2023-11-28 14:54:03,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3551480.0, ans=0.125 2023-11-28 14:54:07,865 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0 2023-11-28 14:54:10,113 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.44 vs. limit=22.5 2023-11-28 14:54:15,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3551480.0, ans=0.0 2023-11-28 14:54:15,545 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2023-11-28 14:54:39,880 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532750 2023-11-28 14:54:45,607 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3700, loss[loss=0.07757, simple_loss=0.11, pruned_loss=0.01348, audio_tagging_loss=0.009079, over 16265.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09144, pruned_loss=0.01218, audio_tagging_loss=0.008433, over 3041357.58 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:54:55,111 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.500e+01 8.996e+01 9.675e+01 1.033e+02 1.277e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 14:55:16,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3551813.3333333335, ans=10.0 2023-11-28 14:55:36,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3551880.0, ans=0.1 2023-11-28 14:55:47,807 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532800 2023-11-28 14:55:53,197 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3750, loss[loss=0.06772, simple_loss=0.08745, pruned_loss=0.01675, audio_tagging_loss=0.007246, over 15039.00 frames. ], tot_loss[loss=0.0667, simple_loss=0.09184, pruned_loss=0.01242, audio_tagging_loss=0.008366, over 3044555.18 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:55:54,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3552013.3333333335, ans=0.125 2023-11-28 14:56:10,071 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.13 vs. limit=15.0 2023-11-28 14:56:20,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3552146.6666666665, ans=0.0 2023-11-28 14:56:40,593 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 14:56:42,267 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2023-11-28 14:56:51,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3552280.0, ans=0.0 2023-11-28 14:56:54,271 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532850 2023-11-28 14:56:58,838 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3800, loss[loss=0.05643, simple_loss=0.06907, pruned_loss=0.01014, audio_tagging_loss=0.01175, over 14119.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09102, pruned_loss=0.01235, audio_tagging_loss=0.008463, over 3046764.28 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:57:07,272 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.454e+01 9.078e+01 9.747e+01 1.050e+02 1.632e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-28 14:57:09,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3552346.6666666665, ans=15.0 2023-11-28 14:57:12,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3552413.3333333335, ans=0.125 2023-11-28 14:57:56,430 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532900 2023-11-28 14:57:59,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3552613.3333333335, ans=0.1 2023-11-28 14:58:01,896 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3850, loss[loss=0.05615, simple_loss=0.08343, pruned_loss=0.005868, audio_tagging_loss=0.008568, over 14567.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09048, pruned_loss=0.01222, audio_tagging_loss=0.008674, over 3048171.23 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:58:41,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3552880.0, ans=10.0 2023-11-28 14:58:59,513 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 532950 2023-11-28 14:59:04,012 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3900, loss[loss=0.07132, simple_loss=0.0931, pruned_loss=0.01452, audio_tagging_loss=0.01025, over 14885.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08881, pruned_loss=0.0122, audio_tagging_loss=0.008831, over 3042894.99 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 14:59:10,763 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. 
limit=6.0 2023-11-28 14:59:12,238 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.774e+01 9.422e+01 1.004e+02 1.392e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 14:59:24,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3553080.0, ans=0.0 2023-11-28 14:59:45,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3553213.3333333335, ans=0.0 2023-11-28 14:59:59,729 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533000 2023-11-28 15:00:01,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3553280.0, ans=0.125 2023-11-28 15:00:04,707 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:00:05,521 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 3950, loss[loss=0.07103, simple_loss=0.1004, pruned_loss=0.01141, audio_tagging_loss=0.009405, over 14907.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08937, pruned_loss=0.01218, audio_tagging_loss=0.008872, over 3046946.18 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:00:08,405 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.52 vs. limit=15.0 2023-11-28 15:00:09,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3553346.6666666665, ans=0.125 2023-11-28 15:00:12,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3553346.6666666665, ans=0.0 2023-11-28 15:00:29,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3553480.0, ans=0.1 2023-11-28 15:00:40,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3553546.6666666665, ans=0.125 2023-11-28 15:00:44,844 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.03 vs. limit=15.0 2023-11-28 15:00:45,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3553546.6666666665, ans=0.125 2023-11-28 15:01:00,220 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533050 2023-11-28 15:01:03,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3553680.0, ans=0.125 2023-11-28 15:01:04,948 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4000, loss[loss=0.06086, simple_loss=0.08571, pruned_loss=0.009901, audio_tagging_loss=0.008102, over 14327.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.08923, pruned_loss=0.0122, audio_tagging_loss=0.008985, over 3039141.86 frames. 
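The optim.py:476 lines report five quantiles (min, 25%, median, 75%, max) of recent gradient norms plus a clipping threshold. Throughout this log the threshold equals Clipping_scale times the logged median, e.g. 2.0 x 9.422e+01 = 1.884e+02 in the entry above, which suggests the optimizer clips against a scaled running median of the gradient norm. A hedged sketch of that bookkeeping; the class and method names here are illustrative, not icefall's:

```python
import torch

# Hedged sketch: track recent gradient norms, log their quartiles, and clip
# against clipping_scale * median, matching the relation seen in this log
# (threshold = 2.0 * median quartile). All names are illustrative assumptions.

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms: list[float] = []

    def observe(self, grad_norm: float) -> float:
        """Record one batch's gradient norm; return the clipping threshold."""
        self.norms.append(grad_norm)
        self.norms = self.norms[-self.window:]
        t = torch.tensor(self.norms)
        qs = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * qs[2].item()
        print(f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
              f"{' '.join(f'{q:.3e}' for q in qs.tolist())}, "
              f"threshold={threshold:.3e}")
        return threshold
```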
], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:01:13,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.093e+01 9.021e+01 9.610e+01 1.042e+02 1.658e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 15:01:30,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3553813.3333333335, ans=0.2 2023-11-28 15:01:41,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3553880.0, ans=0.125 2023-11-28 15:02:00,814 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533100 2023-11-28 15:02:05,802 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4050, loss[loss=0.05369, simple_loss=0.06789, pruned_loss=0.01062, audio_tagging_loss=0.009122, over 16209.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08867, pruned_loss=0.01214, audio_tagging_loss=0.008972, over 3038054.49 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:02:10,529 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:02:30,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3554146.6666666665, ans=0.125 2023-11-28 15:03:01,394 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533150 2023-11-28 15:03:06,487 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4100, loss[loss=0.08348, simple_loss=0.117, pruned_loss=0.02057, audio_tagging_loss=0.004396, over 15221.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08885, pruned_loss=0.01211, audio_tagging_loss=0.009007, over 3041422.48 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:03:16,442 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.818e+01 9.455e+01 1.012e+02 1.204e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 15:03:22,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3554413.3333333335, ans=0.0 2023-11-28 15:03:25,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3554413.3333333335, ans=0.1 2023-11-28 15:03:51,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3554546.6666666665, ans=0.125 2023-11-28 15:04:01,658 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533200 2023-11-28 15:04:06,993 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4150, loss[loss=0.0455, simple_loss=0.05144, pruned_loss=0.00712, audio_tagging_loss=0.01266, over 15759.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08915, pruned_loss=0.01204, audio_tagging_loss=0.0089, over 3044682.64 frames. 
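The recurring WARNING from train_asr.py:1481 drops AudioSet placeholder cuts whose dummy transcript is longer than the acoustic sequence: 100 input frames survive as only 23 frames after convolutional subsampling, fewer than the 24 BPE tokens, and a transducer cannot emit more symbols than it has frames. The 100 -> 23 mapping is consistent with T' = ((T - 7) // 2 + 1) // 2; both that formula and the filter below are inferences from the logged numbers, not icefall's exact code:

```python
# Hedged sketch of the cut filter implied by the WARNING lines above.
# subsampled_len reproduces the logged 100 -> 23 mapping; both functions
# are assumptions inferred from the log.

def subsampled_len(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least as many frames as output tokens.
    return subsampled_len(num_frames) >= num_tokens

assert subsampled_len(100) == 23
assert not keep_cut(100, 24)   # the excluded placeholder cuts above
```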
], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:04:36,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3554813.3333333335, ans=0.1 2023-11-28 15:04:44,570 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2023-11-28 15:04:47,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3554880.0, ans=0.0 2023-11-28 15:04:47,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0 2023-11-28 15:04:48,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3554880.0, ans=0.0 2023-11-28 15:04:52,368 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:04:58,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3554946.6666666665, ans=0.0 2023-11-28 15:05:01,832 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533250 2023-11-28 15:05:06,751 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4200, loss[loss=0.06415, simple_loss=0.08697, pruned_loss=0.01143, audio_tagging_loss=0.00923, over 15465.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08888, pruned_loss=0.01187, audio_tagging_loss=0.008763, over 3038744.30 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:05:06,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3555013.3333333335, ans=0.1 2023-11-28 15:05:08,699 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.82 vs. 
limit=10.0 2023-11-28 15:05:14,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3555013.3333333335, ans=0.1 2023-11-28 15:05:15,736 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.572e+01 9.503e+01 1.029e+02 1.274e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 15:05:20,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3555080.0, ans=0.0 2023-11-28 15:05:29,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3555146.6666666665, ans=0.2 2023-11-28 15:05:51,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3555213.3333333335, ans=0.125 2023-11-28 15:05:56,462 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:06:00,934 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533300 2023-11-28 15:06:04,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3555346.6666666665, ans=0.1 2023-11-28 15:06:05,306 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4250, loss[loss=0.08017, simple_loss=0.1152, pruned_loss=0.01522, audio_tagging_loss=0.007345, over 16339.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08942, pruned_loss=0.01201, audio_tagging_loss=0.00865, over 3044181.23 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:06:10,999 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:06:20,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3555413.3333333335, ans=0.125 2023-11-28 15:06:31,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3555480.0, ans=0.1 2023-11-28 15:06:51,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3555613.3333333335, ans=0.1 2023-11-28 15:06:54,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3555613.3333333335, ans=0.09899494936611666 2023-11-28 15:07:00,038 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533350 2023-11-28 15:07:04,437 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4300, loss[loss=0.06542, simple_loss=0.09197, pruned_loss=0.01032, audio_tagging_loss=0.009107, over 16044.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09116, pruned_loss=0.01224, audio_tagging_loss=0.008538, over 3055313.33 frames. 
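The frame count attached to tot_loss[...] hovers around 3.05e6 for the whole section, which is what an exponentially decayed accumulator produces: with a decay of 1 - 1/200 per batch and roughly 15.5k frames per batch, the steady-state frame count is about 200 * 15.5k ~ 3.1e6, matching the log. A sketch of that smoothing; the decay constant is an inference from the steady-state arithmetic, not a quote of train_asr.py:

```python
# Hedged sketch of the tot_loss smoothing: an exponentially decayed sum of
# (loss * frames, frames), reported as their ratio "over N frames". The
# 1 - 1/200 decay is assumed, chosen because it reproduces the observed
# steady-state frame count (~200 batches * ~15.5k frames ~ 3.1e6).

class RunningLoss:
    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum = self.loss_sum * self.decay + loss * num_frames
        self.frames = self.frames * self.decay + num_frames

    def report(self) -> str:
        return (f"tot_loss[loss={self.loss_sum / self.frames:.4}, "
                f"over {self.frames:.2f} frames.]")
```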
], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:07:05,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3555680.0, ans=0.0 2023-11-28 15:07:13,801 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 9.132e+01 9.735e+01 1.057e+02 1.337e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 15:07:45,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3555880.0, ans=0.125 2023-11-28 15:07:58,884 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533400 2023-11-28 15:08:03,642 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4350, loss[loss=0.06405, simple_loss=0.07695, pruned_loss=0.01253, audio_tagging_loss=0.01304, over 17099.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09031, pruned_loss=0.01225, audio_tagging_loss=0.008616, over 3051814.67 frames. ], batch size: 65, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:08:41,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3556213.3333333335, ans=0.1 2023-11-28 15:08:42,920 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2023-11-28 15:08:57,832 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533450 2023-11-28 15:09:02,393 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4400, loss[loss=0.06177, simple_loss=0.08366, pruned_loss=0.01122, audio_tagging_loss=0.008725, over 15688.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.08982, pruned_loss=0.01222, audio_tagging_loss=0.008684, over 3055326.51 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:09:12,175 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 8.943e+01 9.727e+01 1.045e+02 1.586e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-28 15:09:21,756 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2023-11-28 15:09:26,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3556480.0, ans=0.0 2023-11-28 15:09:31,777 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2023-11-28 15:09:42,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3556546.6666666665, ans=0.0 2023-11-28 15:09:55,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3556613.3333333335, ans=0.1 2023-11-28 15:10:01,443 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=12.0 2023-11-28 15:10:02,153 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533500 2023-11-28 15:10:06,969 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4450, loss[loss=0.07491, simple_loss=0.1005, pruned_loss=0.01614, audio_tagging_loss=0.008532, over 15259.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09035, pruned_loss=0.01218, audio_tagging_loss=0.008568, over 3057539.09 frames. 
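The scaling.py:213 lines dominate this log: each reports the current value (ans) of a ScheduledFloat, a hyperparameter (dropout probability, skip rate, balancer limit, ...) that varies as a piecewise-linear function of the global batch count. By this point in training (batch_count ~ 3.55e6) most have settled at their final values, e.g. 0.125 for balancer probs and 0.0 for skip rates. A minimal sketch of such a schedule; the breakpoints below are illustrative, since the real per-parameter schedules are defined in the model code and are not shown in this log:

```python
# Minimal sketch of a piecewise-linear float schedule, assuming linear
# interpolation between (batch_count, value) breakpoints and a constant
# value beyond the last one. Breakpoints here are illustrative assumptions.

class ScheduledFloatSketch:
    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)  # sorted by batch_count

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)
        return pts[-1][1]

# E.g. a skip rate annealed from 0.5 to 0.0 over the first 20k batches
# stays at 0.0 for the batch counts (~3.55e6) seen in this log:
skip_rate = ScheduledFloatSketch((0.0, 0.5), (20000.0, 0.0))
assert skip_rate.value(3547813.33) == 0.0
```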
], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:10:09,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3556680.0, ans=0.0 2023-11-28 15:10:10,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3556680.0, ans=0.125 2023-11-28 15:10:35,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3556813.3333333335, ans=0.2 2023-11-28 15:10:41,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3556813.3333333335, ans=0.2 2023-11-28 15:10:58,904 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0 2023-11-28 15:11:06,022 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533550 2023-11-28 15:11:10,934 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4500, loss[loss=0.06182, simple_loss=0.08661, pruned_loss=0.01166, audio_tagging_loss=0.006856, over 15488.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09111, pruned_loss=0.01231, audio_tagging_loss=0.008526, over 3058377.70 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:11:15,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3557013.3333333335, ans=0.2 2023-11-28 15:11:22,375 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.702e+01 8.792e+01 9.317e+01 1.000e+02 1.287e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-28 15:11:27,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3557080.0, ans=0.1 2023-11-28 15:11:28,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3557080.0, ans=0.1 2023-11-28 15:11:45,664 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=22.5 2023-11-28 15:12:10,867 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533600 2023-11-28 15:12:15,964 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4550, loss[loss=0.05803, simple_loss=0.08515, pruned_loss=0.00705, audio_tagging_loss=0.008409, over 14069.00 frames. ], tot_loss[loss=0.06679, simple_loss=0.092, pruned_loss=0.01234, audio_tagging_loss=0.008444, over 3055398.01 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:12:16,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.34 vs. limit=15.0 2023-11-28 15:13:01,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3557546.6666666665, ans=0.125 2023-11-28 15:13:04,820 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:13:06,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3557613.3333333335, ans=0.0 2023-11-28 15:13:06,949 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:13:14,903 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533650 2023-11-28 15:13:19,708 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4600, loss[loss=0.08619, simple_loss=0.1287, pruned_loss=0.01634, audio_tagging_loss=0.005509, over 15126.00 frames. ], tot_loss[loss=0.06731, simple_loss=0.0924, pruned_loss=0.01259, audio_tagging_loss=0.008519, over 3058158.90 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:13:26,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3557680.0, ans=0.125 2023-11-28 15:13:29,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.928e+01 9.397e+01 1.022e+02 1.415e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 15:14:04,820 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:14:05,262 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.83 vs. limit=10.0 2023-11-28 15:14:09,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3557946.6666666665, ans=0.0 2023-11-28 15:14:14,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3557946.6666666665, ans=0.125 2023-11-28 15:14:16,131 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.68 vs. limit=15.0 2023-11-28 15:14:17,654 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533700 2023-11-28 15:14:22,812 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4650, loss[loss=0.05106, simple_loss=0.06093, pruned_loss=0.01059, audio_tagging_loss=0.009998, over 15110.00 frames. ], tot_loss[loss=0.06684, simple_loss=0.09156, pruned_loss=0.01241, audio_tagging_loss=0.008649, over 3060183.94 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:14:55,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3558146.6666666665, ans=0.09899494936611666 2023-11-28 15:14:56,729 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:15:08,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3558213.3333333335, ans=0.0 2023-11-28 15:15:09,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3558213.3333333335, ans=0.1 2023-11-28 15:15:15,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3558280.0, ans=0.125 2023-11-28 15:15:21,841 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533750 2023-11-28 15:15:27,132 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4700, loss[loss=0.06553, simple_loss=0.08475, pruned_loss=0.0125, audio_tagging_loss=0.01065, over 14369.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08991, pruned_loss=0.0122, audio_tagging_loss=0.00879, over 3057196.24 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:15:30,340 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.64 vs. limit=15.0 2023-11-28 15:15:38,953 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 8.852e+01 9.419e+01 1.023e+02 1.642e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-28 15:15:44,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3558413.3333333335, ans=0.125 2023-11-28 15:16:11,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3558546.6666666665, ans=0.0 2023-11-28 15:16:26,082 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533800 2023-11-28 15:16:29,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3558613.3333333335, ans=0.125 2023-11-28 15:16:31,090 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4750, loss[loss=0.08138, simple_loss=0.1134, pruned_loss=0.01645, audio_tagging_loss=0.008211, over 15129.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08927, pruned_loss=0.01209, audio_tagging_loss=0.008944, over 3051943.45 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:16:42,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3558746.6666666665, ans=0.2 2023-11-28 15:17:01,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3558813.3333333335, ans=0.2 2023-11-28 15:17:02,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=3558813.3333333335, ans=0.025 2023-11-28 15:17:12,417 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.90 vs. 
limit=10.0 2023-11-28 15:17:21,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3558946.6666666665, ans=0.125 2023-11-28 15:17:26,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3558946.6666666665, ans=0.07 2023-11-28 15:17:28,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3558946.6666666665, ans=0.2 2023-11-28 15:17:28,533 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2023-11-28 15:17:29,115 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533850 2023-11-28 15:17:34,704 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4800, loss[loss=0.062, simple_loss=0.08147, pruned_loss=0.01123, audio_tagging_loss=0.01004, over 16038.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08857, pruned_loss=0.01203, audio_tagging_loss=0.009066, over 3052637.51 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:17:37,759 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.60 vs. limit=15.0 2023-11-28 15:17:43,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3559013.3333333335, ans=0.125 2023-11-28 15:17:45,866 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 9.221e+01 9.663e+01 1.040e+02 1.234e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 15:18:15,602 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2023-11-28 15:18:31,387 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533900 2023-11-28 15:18:36,116 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4850, loss[loss=0.05416, simple_loss=0.0793, pruned_loss=0.007487, audio_tagging_loss=0.007022, over 14660.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08876, pruned_loss=0.01195, audio_tagging_loss=0.00911, over 3048501.29 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:18:36,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3559346.6666666665, ans=0.125 2023-11-28 15:18:41,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3559346.6666666665, ans=0.1 2023-11-28 15:18:49,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3559413.3333333335, ans=0.0 2023-11-28 15:18:55,721 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.50 vs. limit=15.0 2023-11-28 15:19:20,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3559546.6666666665, ans=0.2 2023-11-28 15:19:33,308 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 533950 2023-11-28 15:19:38,538 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4900, loss[loss=0.08607, simple_loss=0.1202, pruned_loss=0.01887, audio_tagging_loss=0.007114, over 14494.00 frames. 
], tot_loss[loss=0.06513, simple_loss=0.08833, pruned_loss=0.01196, audio_tagging_loss=0.009009, over 3050010.51 frames. ], batch size: 52, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:19:41,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3559680.0, ans=0.2 2023-11-28 15:19:49,241 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.527e+01 9.026e+01 9.693e+01 1.038e+02 1.259e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 15:20:00,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3559746.6666666665, ans=0.0 2023-11-28 15:20:08,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3559813.3333333335, ans=0.1 2023-11-28 15:20:24,231 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=12.0 2023-11-28 15:20:35,421 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534000 2023-11-28 15:20:40,385 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 4950, loss[loss=0.0707, simple_loss=0.09487, pruned_loss=0.01477, audio_tagging_loss=0.008491, over 14862.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08882, pruned_loss=0.01206, audio_tagging_loss=0.008905, over 3054501.60 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:20:47,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3560013.3333333335, ans=0.125 2023-11-28 15:20:48,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3560013.3333333335, ans=0.1 2023-11-28 15:20:49,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3560013.3333333335, ans=0.125 2023-11-28 15:21:05,383 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-28 15:21:09,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3560146.6666666665, ans=0.0 2023-11-28 15:21:29,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3560280.0, ans=0.125 2023-11-28 15:21:37,570 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534050 2023-11-28 15:21:41,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3560346.6666666665, ans=0.0 2023-11-28 15:21:42,799 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5000, loss[loss=0.08977, simple_loss=0.1276, pruned_loss=0.02037, audio_tagging_loss=0.005593, over 15579.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09055, pruned_loss=0.01233, audio_tagging_loss=0.008633, over 3059422.41 frames. 
], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:21:53,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.494e+01 8.823e+01 9.586e+01 1.031e+02 1.320e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 15:22:13,058 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2023-11-28 15:22:16,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3560480.0, ans=0.1 2023-11-28 15:22:17,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3560480.0, ans=0.125 2023-11-28 15:22:18,647 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.33 vs. limit=22.5 2023-11-28 15:22:20,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3560546.6666666665, ans=0.1 2023-11-28 15:22:36,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3560613.3333333335, ans=0.1 2023-11-28 15:22:37,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3560613.3333333335, ans=0.0 2023-11-28 15:22:39,797 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534100 2023-11-28 15:22:45,171 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5050, loss[loss=0.06063, simple_loss=0.07685, pruned_loss=0.01161, audio_tagging_loss=0.0106, over 13616.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09017, pruned_loss=0.01224, audio_tagging_loss=0.008669, over 3050472.69 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:22:52,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3560680.0, ans=0.0 2023-11-28 15:22:55,347 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-28 15:23:02,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3560746.6666666665, ans=0.125 2023-11-28 15:23:22,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3560880.0, ans=0.0 2023-11-28 15:23:22,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3560880.0, ans=10.0 2023-11-28 15:23:26,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3560880.0, ans=0.125 2023-11-28 15:23:36,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3560946.6666666665, ans=0.125 2023-11-28 15:23:41,420 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534150 2023-11-28 15:23:46,152 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5100, loss[loss=0.06846, simple_loss=0.09228, pruned_loss=0.01319, audio_tagging_loss=0.009126, over 15574.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09051, pruned_loss=0.01242, audio_tagging_loss=0.008588, over 3058720.09 frames. 
], batch size: 59, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:23:58,682 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 9.022e+01 9.569e+01 1.030e+02 1.259e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 15:24:21,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3561146.6666666665, ans=0.0 2023-11-28 15:24:43,894 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534200 2023-11-28 15:24:48,746 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5150, loss[loss=0.04797, simple_loss=0.06055, pruned_loss=0.006913, audio_tagging_loss=0.01079, over 14259.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08893, pruned_loss=0.0121, audio_tagging_loss=0.008601, over 3057085.84 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:24:51,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3561346.6666666665, ans=0.1 2023-11-28 15:25:04,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3561413.3333333335, ans=0.125 2023-11-28 15:25:09,049 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:25:30,584 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2023-11-28 15:25:39,830 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=22.5 2023-11-28 15:25:46,185 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534250 2023-11-28 15:25:47,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3561613.3333333335, ans=0.2 2023-11-28 15:25:50,866 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5200, loss[loss=0.04861, simple_loss=0.06613, pruned_loss=0.008275, audio_tagging_loss=0.007268, over 14391.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.0892, pruned_loss=0.01212, audio_tagging_loss=0.008576, over 3057274.37 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:25:52,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3561680.0, ans=0.0 2023-11-28 15:25:56,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3561680.0, ans=0.125 2023-11-28 15:26:03,807 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 8.561e+01 9.249e+01 1.010e+02 1.176e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-28 15:26:06,848 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. 
limit=15.0 2023-11-28 15:26:08,778 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:26:12,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3561746.6666666665, ans=0.0 2023-11-28 15:26:16,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3561813.3333333335, ans=0.125 2023-11-28 15:26:43,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.98 vs. limit=15.0 2023-11-28 15:26:48,787 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534300 2023-11-28 15:26:53,352 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5250, loss[loss=0.06044, simple_loss=0.08264, pruned_loss=0.01041, audio_tagging_loss=0.008712, over 15833.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08972, pruned_loss=0.01227, audio_tagging_loss=0.008533, over 3055817.37 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:27:01,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3562013.3333333335, ans=0.04949747468305833 2023-11-28 15:27:08,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3562080.0, ans=0.125 2023-11-28 15:27:15,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3562080.0, ans=0.125 2023-11-28 15:27:31,181 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=12.0 2023-11-28 15:27:50,506 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534350 2023-11-28 15:27:55,105 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5300, loss[loss=0.04791, simple_loss=0.06929, pruned_loss=0.005166, audio_tagging_loss=0.008105, over 15922.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08979, pruned_loss=0.01224, audio_tagging_loss=0.008482, over 3057299.41 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:27:59,309 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2023-11-28 15:28:01,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.22 vs. 
limit=22.5 2023-11-28 15:28:06,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3562413.3333333335, ans=0.125 2023-11-28 15:28:07,529 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.489e+01 8.987e+01 9.599e+01 1.032e+02 1.281e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 15:28:24,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3562480.0, ans=0.1 2023-11-28 15:28:38,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3562546.6666666665, ans=0.125 2023-11-28 15:28:40,746 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0 2023-11-28 15:28:46,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3562613.3333333335, ans=0.0 2023-11-28 15:28:50,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3562613.3333333335, ans=0.0 2023-11-28 15:28:50,862 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2023-11-28 15:28:52,846 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534400 2023-11-28 15:28:55,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3562613.3333333335, ans=0.125 2023-11-28 15:28:56,112 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0 2023-11-28 15:28:57,843 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5350, loss[loss=0.07281, simple_loss=0.09918, pruned_loss=0.01363, audio_tagging_loss=0.009585, over 14781.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09059, pruned_loss=0.01239, audio_tagging_loss=0.008434, over 3049411.06 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:29:12,801 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:29:31,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3562813.3333333335, ans=0.0 2023-11-28 15:29:55,304 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534450 2023-11-28 15:30:00,008 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5400, loss[loss=0.07375, simple_loss=0.1063, pruned_loss=0.01387, audio_tagging_loss=0.006723, over 14556.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09056, pruned_loss=0.01244, audio_tagging_loss=0.008489, over 3046444.32 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:30:00,404 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:30:05,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3563013.3333333335, ans=0.125 2023-11-28 15:30:06,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3563013.3333333335, ans=0.04949747468305833 2023-11-28 15:30:14,159 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.799e+01 8.988e+01 9.532e+01 1.029e+02 1.170e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 15:30:24,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3563146.6666666665, ans=0.125 2023-11-28 15:30:25,939 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=15.0 2023-11-28 15:30:31,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3563146.6666666665, ans=0.2 2023-11-28 15:30:47,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3563213.3333333335, ans=0.2 2023-11-28 15:30:57,446 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534500 2023-11-28 15:31:02,811 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5450, loss[loss=0.06667, simple_loss=0.08848, pruned_loss=0.01247, audio_tagging_loss=0.009961, over 14904.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09066, pruned_loss=0.01232, audio_tagging_loss=0.008561, over 3041953.51 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:31:13,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3563346.6666666665, ans=0.125 2023-11-28 15:31:28,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3563480.0, ans=0.125 2023-11-28 15:31:30,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3563480.0, ans=0.0 2023-11-28 15:31:43,947 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=12.0 2023-11-28 15:32:00,224 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534550 2023-11-28 15:32:04,873 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5500, loss[loss=0.06814, simple_loss=0.09294, pruned_loss=0.0128, audio_tagging_loss=0.008865, over 14696.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08977, pruned_loss=0.01207, audio_tagging_loss=0.008589, over 3043428.83 frames. 
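[editor's note] Each train_asr.py:1235 record above reports a per-batch loss and a running tot_loss, both decomposed into simple_loss, pruned_loss and audio_tagging_loss and normalized by the number of acoustic frames (hence the fractional frame counts in tot_loss). Below is a minimal sketch of frame-weighted loss tracking consistent with these records; the class name and interface are assumptions, not the script's exact tracker.
```python
from collections import defaultdict

class FrameWeightedTracker(defaultdict):
    """Accumulate unnormalized loss components plus a frame count;
    report each component divided by total frames, which is how a
    'loss=..., over N frames' style record can be produced."""

    def __init__(self):
        super().__init__(float)

    def __add__(self, other):
        out = FrameWeightedTracker()
        for k in set(self) | set(other):
            out[k] = self.get(k, 0.0) + other.get(k, 0.0)
        return out

    def summary(self):
        frames = self["frames"]
        return {k: v / frames for k, v in self.items() if k != "frames"}

# one batch: store loss * frames so per-frame averages combine correctly
cur = FrameWeightedTracker()
cur["frames"] = 14904
cur["loss"] = 0.06667 * 14904  # numbers from the batch 5450 record above
print((cur + cur).summary())   # combining batches keeps the per-frame average
```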
], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:32:17,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3563746.6666666665, ans=0.0 2023-11-28 15:32:18,306 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.730e+01 8.919e+01 9.709e+01 1.036e+02 2.693e+02, threshold=1.942e+02, percent-clipped=1.0 2023-11-28 15:32:31,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3563813.3333333335, ans=10.0 2023-11-28 15:32:52,319 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.23 vs. limit=12.0 2023-11-28 15:33:01,924 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534600 2023-11-28 15:33:06,886 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5550, loss[loss=0.05144, simple_loss=0.07117, pruned_loss=0.00846, audio_tagging_loss=0.007394, over 17098.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08932, pruned_loss=0.012, audio_tagging_loss=0.008742, over 3046027.43 frames. ], batch size: 66, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:33:15,811 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.64 vs. limit=15.0 2023-11-28 15:33:37,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3564146.6666666665, ans=0.125 2023-11-28 15:33:41,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3564146.6666666665, ans=0.0 2023-11-28 15:34:00,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3564280.0, ans=0.0 2023-11-28 15:34:03,821 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534650 2023-11-28 15:34:06,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3564280.0, ans=0.125 2023-11-28 15:34:08,861 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.09 vs. limit=10.0 2023-11-28 15:34:09,203 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5600, loss[loss=0.08091, simple_loss=0.1122, pruned_loss=0.01452, audio_tagging_loss=0.01028, over 15469.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08949, pruned_loss=0.01193, audio_tagging_loss=0.008767, over 3046068.32 frames. 
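[editor's note] The scaling.py:213 ScheduledFloat records above track hyperparameters (dropout probabilities, skip rates, balancer probabilities, bypass scale minima) whose values are functions of the global batch count. A hedged sketch of a piecewise-linear schedule keyed on batch count follows; it illustrates the idea and is not the exact ScheduledFloat implementation.
```python
from bisect import bisect_right

class PiecewiseLinearSchedule:
    """Piecewise-linear value keyed on batch count: (batch_count, value)
    breakpoints, linear interpolation between them, clamped outside."""

    def __init__(self, *points):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect_right(self.xs, batch_count) - 1
        x0, x1 = self.xs[i], self.xs[i + 1]
        y0, y1 = self.ys[i], self.ys[i + 1]
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

# e.g. a skip rate that decays from 0.3 to 0.0 over the first 20k batches
skip_rate = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.0))
print(skip_rate(3561746.67))  # -> 0.0, past the last breakpoint, as logged
```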
], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:34:17,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3564346.6666666665, ans=0.125 2023-11-28 15:34:23,315 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.833e+01 8.970e+01 9.641e+01 1.032e+02 1.547e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-28 15:34:30,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3564413.3333333335, ans=0.05 2023-11-28 15:34:46,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3564546.6666666665, ans=0.125 2023-11-28 15:34:48,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3564546.6666666665, ans=0.125 2023-11-28 15:34:56,197 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:35:06,967 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534700 2023-11-28 15:35:09,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3564613.3333333335, ans=0.2 2023-11-28 15:35:11,603 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5650, loss[loss=0.0509, simple_loss=0.06059, pruned_loss=0.01193, audio_tagging_loss=0.008679, over 14336.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09001, pruned_loss=0.01212, audio_tagging_loss=0.00881, over 3050454.38 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:35:11,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3564680.0, ans=0.125 2023-11-28 15:35:24,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3564746.6666666665, ans=0.125 2023-11-28 15:35:37,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3564813.3333333335, ans=10.0 2023-11-28 15:35:48,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3564880.0, ans=0.0 2023-11-28 15:36:02,993 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.42 vs. limit=22.5 2023-11-28 15:36:09,247 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534750 2023-11-28 15:36:13,965 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5700, loss[loss=0.05599, simple_loss=0.07634, pruned_loss=0.01071, audio_tagging_loss=0.007107, over 15660.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08964, pruned_loss=0.01212, audio_tagging_loss=0.00876, over 3054605.05 frames. 
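[editor's note] The train_asr.py:1481 warning above drops an AudioSet cut whose transcript is placeholder text: after subsampling, the 100-frame (1-second) cut yields only 23 encoder frames, fewer than its 24 BPE tokens, and a transducer loss cannot emit more tokens than it has frames. A sketch of such a length filter follows; the exact subsampling arithmetic is an assumption chosen to reproduce the logged 100 -> 23 mapping.
```python
def keep_for_transducer(num_frames: int, num_tokens: int,
                        subsampling_factor: int = 4) -> bool:
    # rough model of the convolutional subsampling (assumption):
    # 100 input frames -> (100 - 7) // 4 = 23 encoder frames
    frames_after = (num_frames - 7) // subsampling_factor
    # the transducer needs at least one frame per output token
    return frames_after >= num_tokens

print(keep_for_transducer(100, 24))  # False -> the cut is excluded
```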
], batch size: 61, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:36:15,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3565013.3333333335, ans=0.125 2023-11-28 15:36:23,192 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2023-11-28 15:36:28,523 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.604e+01 8.736e+01 9.435e+01 1.001e+02 1.261e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 15:36:28,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3565080.0, ans=0.125 2023-11-28 15:36:29,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3565080.0, ans=0.2 2023-11-28 15:36:35,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3565080.0, ans=0.0 2023-11-28 15:37:08,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3565280.0, ans=0.0 2023-11-28 15:37:09,772 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.39 vs. limit=10.0 2023-11-28 15:37:11,426 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534800 2023-11-28 15:37:16,418 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5750, loss[loss=0.07079, simple_loss=0.09226, pruned_loss=0.01479, audio_tagging_loss=0.009869, over 15804.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08984, pruned_loss=0.01214, audio_tagging_loss=0.0087, over 3051459.86 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:37:20,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3565346.6666666665, ans=0.125 2023-11-28 15:37:23,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3565346.6666666665, ans=0.125 2023-11-28 15:37:27,548 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.89 vs. limit=12.0 2023-11-28 15:37:37,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3565413.3333333335, ans=0.07 2023-11-28 15:37:39,938 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2023-11-28 15:37:55,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3565546.6666666665, ans=0.0 2023-11-28 15:37:59,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3565546.6666666665, ans=0.0 2023-11-28 15:38:01,137 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.70 vs. 
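[editor's note] In the optim.py:476 records, the five numbers after "grad-norm quartiles" are the min/25%/median/75%/max of recently observed gradient norms, and the threshold is consistently twice the logged median (e.g. 2 x 9.435e+01 = 1.887e+02 just above), matching Clipping_scale=2.0; percent-clipped is the fraction of recent batches whose norm exceeded it. A hedged sketch of median-based clipping follows; this is not ScaledAdam's exact bookkeeping.
```python
import torch

def clip_to_median_multiple(params, recent_norms, clipping_scale=2.0):
    """Scale gradients down if the current global norm exceeds
    clipping_scale times the median of recent gradient norms."""
    threshold = clipping_scale * torch.tensor(recent_norms).median()
    grads = [p.grad.detach() for p in params if p.grad is not None]
    total = torch.norm(torch.stack([g.norm() for g in grads]))
    if total > threshold:
        for g in grads:          # in-place rescale of each gradient
            g.mul_(threshold / total)
    return float(total), float(threshold)
```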
limit=22.5 2023-11-28 15:38:01,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3565546.6666666665, ans=0.125 2023-11-28 15:38:03,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3565546.6666666665, ans=0.125 2023-11-28 15:38:14,182 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534850 2023-11-28 15:38:18,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3565613.3333333335, ans=0.1 2023-11-28 15:38:20,319 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5800, loss[loss=0.06572, simple_loss=0.09978, pruned_loss=0.009466, audio_tagging_loss=0.006361, over 14545.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08975, pruned_loss=0.01218, audio_tagging_loss=0.008568, over 3052842.34 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:38:35,156 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 8.604e+01 9.339e+01 1.000e+02 1.373e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 15:38:56,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3565880.0, ans=0.125 2023-11-28 15:38:57,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3565880.0, ans=0.2 2023-11-28 15:39:10,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3565946.6666666665, ans=0.04949747468305833 2023-11-28 15:39:13,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3565946.6666666665, ans=0.1 2023-11-28 15:39:14,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3565946.6666666665, ans=0.125 2023-11-28 15:39:17,047 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534900 2023-11-28 15:39:22,242 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5850, loss[loss=0.06803, simple_loss=0.09362, pruned_loss=0.01388, audio_tagging_loss=0.007339, over 15094.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08931, pruned_loss=0.0122, audio_tagging_loss=0.008474, over 3046928.02 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:39:48,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3566146.6666666665, ans=0.2 2023-11-28 15:39:52,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3566146.6666666665, ans=0.125 2023-11-28 15:39:54,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3566146.6666666665, ans=0.1 2023-11-28 15:40:12,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3566280.0, ans=0.0 2023-11-28 15:40:19,295 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 534950 2023-11-28 15:40:19,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3566280.0, ans=0.0 2023-11-28 15:40:20,645 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:40:24,062 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5900, loss[loss=0.04493, simple_loss=0.05647, pruned_loss=0.005752, audio_tagging_loss=0.01094, over 14228.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08886, pruned_loss=0.01215, audio_tagging_loss=0.008476, over 3041963.53 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:40:30,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3566346.6666666665, ans=0.0 2023-11-28 15:40:39,349 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.038e+01 9.170e+01 9.658e+01 1.028e+02 1.325e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 15:40:51,541 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.22 vs. limit=15.0 2023-11-28 15:41:17,509 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.97 vs. limit=15.0 2023-11-28 15:41:21,374 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535000 2023-11-28 15:41:26,456 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-28 15:41:26,966 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 5950, loss[loss=0.04627, simple_loss=0.05036, pruned_loss=0.009358, audio_tagging_loss=0.01173, over 14880.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09006, pruned_loss=0.01231, audio_tagging_loss=0.008437, over 3046297.07 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:41:27,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3566680.0, ans=0.125 2023-11-28 15:41:35,496 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.49 vs. 
limit=15.0 2023-11-28 15:41:37,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3566680.0, ans=0.1 2023-11-28 15:41:46,439 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=15.0 2023-11-28 15:42:06,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3566880.0, ans=0.0 2023-11-28 15:42:16,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3566946.6666666665, ans=0.1 2023-11-28 15:42:24,340 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535050 2023-11-28 15:42:24,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3566946.6666666665, ans=0.0 2023-11-28 15:42:29,563 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6000, loss[loss=0.06718, simple_loss=0.09383, pruned_loss=0.01206, audio_tagging_loss=0.008197, over 15424.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08969, pruned_loss=0.01224, audio_tagging_loss=0.008433, over 3048763.23 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:42:29,564 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 15:42:52,388 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3152, 4.2931, 4.4919, 4.4420], device='cuda:2') 2023-11-28 15:43:07,323 INFO [train_asr.py:1267] (2/4) Epoch 45, validation: loss=0.05761, simple_loss=0.05049, pruned_loss=0.005188, audio_tagging_loss=0.02718, over 4681554.00 frames. 2023-11-28 15:43:07,324 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 15:43:22,490 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.761e+01 9.402e+01 1.021e+02 1.330e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 15:43:22,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3567080.0, ans=0.0 2023-11-28 15:43:54,050 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 15:44:04,898 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535100 2023-11-28 15:44:09,582 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6050, loss[loss=0.0473, simple_loss=0.04864, pruned_loss=0.005049, audio_tagging_loss=0.01793, over 15748.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.08967, pruned_loss=0.01223, audio_tagging_loss=0.008471, over 3049016.60 frames. 
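[editor's note] At batch 6000 the loop above pauses training, computes a validation loss over the dev cuts, and logs the peak GPU memory. A generic sketch of that pattern follows; compute_loss is an assumed callable returning a per-frame loss and a frame count, not the script's actual interface.
```python
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_dl, compute_loss, device):
    """Switch to eval mode, accumulate frame-weighted losses over the
    dev set, then report peak GPU memory in MB."""
    was_training = model.training
    model.eval()
    tot_loss, tot_frames = 0.0, 0
    for batch in valid_dl:
        loss, num_frames = compute_loss(model, batch)
        tot_loss += float(loss) * num_frames
        tot_frames += num_frames
    if was_training:
        model.train()
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return tot_loss / max(tot_frames, 1), peak_mb
```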
], batch size: 62, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:44:38,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3567480.0, ans=0.0 2023-11-28 15:44:48,386 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:45:07,084 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535150 2023-11-28 15:45:12,330 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6100, loss[loss=0.04272, simple_loss=0.05512, pruned_loss=0.005179, audio_tagging_loss=0.009981, over 13602.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08964, pruned_loss=0.01229, audio_tagging_loss=0.008449, over 3046590.88 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:45:27,618 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.258e+01 8.963e+01 9.572e+01 1.025e+02 1.368e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 15:45:40,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-11-28 15:46:09,248 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535200 2023-11-28 15:46:14,253 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6150, loss[loss=0.07184, simple_loss=0.09704, pruned_loss=0.01601, audio_tagging_loss=0.007308, over 14322.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08905, pruned_loss=0.01215, audio_tagging_loss=0.008547, over 3042784.61 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:46:27,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3568080.0, ans=0.125 2023-11-28 15:46:34,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3568080.0, ans=0.125 2023-11-28 15:46:39,926 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.64 vs. limit=15.0 2023-11-28 15:46:47,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3568146.6666666665, ans=0.125 2023-11-28 15:47:01,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3568213.3333333335, ans=0.125 2023-11-28 15:47:08,722 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.23 vs. limit=22.5 2023-11-28 15:47:09,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3568280.0, ans=0.0 2023-11-28 15:47:11,697 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535250 2023-11-28 15:47:12,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3568280.0, ans=0.1 2023-11-28 15:47:17,026 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6200, loss[loss=0.07328, simple_loss=0.1094, pruned_loss=0.01097, audio_tagging_loss=0.007614, over 15290.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08897, pruned_loss=0.01216, audio_tagging_loss=0.008639, over 3038624.16 frames. 
], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:47:32,373 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2023-11-28 15:47:33,548 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.897e+01 8.984e+01 9.633e+01 1.042e+02 1.273e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 15:47:38,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3568413.3333333335, ans=0.125 2023-11-28 15:47:40,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3568480.0, ans=0.125 2023-11-28 15:47:42,581 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.33 vs. limit=22.5 2023-11-28 15:48:08,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3568613.3333333335, ans=0.125 2023-11-28 15:48:14,115 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535300 2023-11-28 15:48:14,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3568613.3333333335, ans=0.125 2023-11-28 15:48:19,449 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6250, loss[loss=0.0492, simple_loss=0.06632, pruned_loss=0.006121, audio_tagging_loss=0.009922, over 16954.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08934, pruned_loss=0.01229, audio_tagging_loss=0.008741, over 3034939.75 frames. ], batch size: 69, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:48:32,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3568746.6666666665, ans=0.1 2023-11-28 15:48:35,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3568746.6666666665, ans=0.2 2023-11-28 15:48:52,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3568813.3333333335, ans=0.2 2023-11-28 15:49:16,852 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535350 2023-11-28 15:49:21,437 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6300, loss[loss=0.04878, simple_loss=0.06983, pruned_loss=0.006465, audio_tagging_loss=0.007398, over 14891.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08975, pruned_loss=0.01224, audio_tagging_loss=0.008812, over 3032311.63 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:49:22,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3569013.3333333335, ans=0.1 2023-11-28 15:49:38,094 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 8.816e+01 9.438e+01 1.010e+02 1.307e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 15:49:40,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3569080.0, ans=0.1 2023-11-28 15:49:42,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3569080.0, ans=0.1 2023-11-28 15:49:51,296 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 15:50:19,334 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535400 2023-11-28 15:50:23,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3569346.6666666665, ans=0.95 2023-11-28 15:50:24,774 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6350, loss[loss=0.04973, simple_loss=0.06926, pruned_loss=0.007112, audio_tagging_loss=0.007984, over 15407.00 frames. ], tot_loss[loss=0.06626, simple_loss=0.09033, pruned_loss=0.01234, audio_tagging_loss=0.008751, over 3036698.77 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:50:42,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3569413.3333333335, ans=0.125 2023-11-28 15:50:59,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3569480.0, ans=0.1 2023-11-28 15:51:21,831 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535450 2023-11-28 15:51:23,602 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.33 vs. limit=15.0 2023-11-28 15:51:26,560 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6400, loss[loss=0.06346, simple_loss=0.0901, pruned_loss=0.009331, audio_tagging_loss=0.00908, over 14918.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08934, pruned_loss=0.01205, audio_tagging_loss=0.008927, over 3029758.46 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:51:43,573 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.073e+01 8.811e+01 9.428e+01 1.008e+02 1.163e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 15:52:17,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3569946.6666666665, ans=0.1 2023-11-28 15:52:22,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3569946.6666666665, ans=0.2 2023-11-28 15:52:24,890 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535500 2023-11-28 15:52:30,150 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6450, loss[loss=0.06276, simple_loss=0.07929, pruned_loss=0.013, audio_tagging_loss=0.01011, over 14493.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08885, pruned_loss=0.01194, audio_tagging_loss=0.009039, over 3036182.95 frames. 
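[editor's note] The scaling.py:1022 Whitening records compare a per-module metric against a limit; the metric grows as the channel covariance of a module's output departs from a scaled identity, and values over the limit presumably trigger the regularizer. Below is a sketch of one such diagnostic, the mean squared eigenvalue over the squared mean eigenvalue of the covariance (exactly 1.0 when the features are white); treat it as an illustration rather than the exact scaling.py computation.
```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (num_frames, num_channels). Returns the worst per-group
    ratio mean(eig^2) / mean(eig)^2 of the feature covariance,
    computed via traces so no eigendecomposition is needed."""
    n, c = x.shape
    g = c // num_groups
    worst = 0.0
    for i in range(num_groups):
        xg = x[:, i * g:(i + 1) * g]
        xg = xg - xg.mean(dim=0)
        cov = (xg.T @ xg) / n
        mean_eig_sq = (cov @ cov).diagonal().sum() / g   # mean of eig^2
        sq_mean_eig = (cov.diagonal().sum() / g) ** 2    # (mean eig)^2
        worst = max(worst, float(mean_eig_sq / (sq_mean_eig + 1e-20)))
    return worst

x = torch.randn(1000, 384)
print(whitening_metric(x))  # ~1 + 384/1000 for i.i.d. random features
```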
], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:52:30,699 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-11-28 15:52:42,377 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=22.5 2023-11-28 15:52:45,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3570080.0, ans=0.125 2023-11-28 15:52:46,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3570080.0, ans=0.0 2023-11-28 15:52:54,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3570146.6666666665, ans=0.1 2023-11-28 15:53:01,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3570146.6666666665, ans=0.125 2023-11-28 15:53:06,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3570213.3333333335, ans=0.125 2023-11-28 15:53:19,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3570280.0, ans=0.0 2023-11-28 15:53:28,088 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535550 2023-11-28 15:53:32,777 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6500, loss[loss=0.06671, simple_loss=0.08795, pruned_loss=0.01533, audio_tagging_loss=0.007402, over 16476.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08889, pruned_loss=0.01204, audio_tagging_loss=0.009101, over 3034508.15 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:53:33,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3570346.6666666665, ans=0.1 2023-11-28 15:53:48,889 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 8.856e+01 9.321e+01 9.995e+01 1.237e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-28 15:53:55,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3570413.3333333335, ans=0.0 2023-11-28 15:54:14,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3570546.6666666665, ans=0.0 2023-11-28 15:54:23,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3570613.3333333335, ans=0.125 2023-11-28 15:54:29,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3570613.3333333335, ans=0.125 2023-11-28 15:54:30,572 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535600 2023-11-28 15:54:35,598 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6550, loss[loss=0.06216, simple_loss=0.08954, pruned_loss=0.01071, audio_tagging_loss=0.00668, over 15143.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.08961, pruned_loss=0.0122, audio_tagging_loss=0.00893, over 3035268.66 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:55:20,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3570880.0, ans=0.0 2023-11-28 15:55:28,468 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=12.0 2023-11-28 15:55:33,435 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535650 2023-11-28 15:55:38,038 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6600, loss[loss=0.05138, simple_loss=0.07355, pruned_loss=0.006147, audio_tagging_loss=0.008451, over 15444.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09012, pruned_loss=0.01233, audio_tagging_loss=0.008804, over 3035243.11 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:55:43,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3571013.3333333335, ans=0.125 2023-11-28 15:55:55,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 8.913e+01 9.644e+01 1.048e+02 1.369e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-28 15:56:14,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3571213.3333333335, ans=10.0 2023-11-28 15:56:30,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3571280.0, ans=0.125 2023-11-28 15:56:34,888 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535700 2023-11-28 15:56:40,921 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6650, loss[loss=0.05786, simple_loss=0.08219, pruned_loss=0.008773, audio_tagging_loss=0.007996, over 14527.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.08997, pruned_loss=0.01233, audio_tagging_loss=0.008761, over 3032361.46 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:56:48,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3571346.6666666665, ans=0.125 2023-11-28 15:57:09,617 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.40 vs. limit=15.0 2023-11-28 15:57:16,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=12.0 2023-11-28 15:57:38,104 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535750 2023-11-28 15:57:38,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3571613.3333333335, ans=0.125 2023-11-28 15:57:40,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3571613.3333333335, ans=0.0 2023-11-28 15:57:42,789 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6700, loss[loss=0.04982, simple_loss=0.06469, pruned_loss=0.007513, audio_tagging_loss=0.009963, over 16093.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08969, pruned_loss=0.01234, audio_tagging_loss=0.008656, over 3028069.33 frames. 
], batch size: 62, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:57:44,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3571680.0, ans=0.125 2023-11-28 15:57:59,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.93 vs. limit=12.0 2023-11-28 15:57:59,371 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.48 vs. limit=15.0 2023-11-28 15:57:59,991 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 8.880e+01 9.649e+01 1.029e+02 1.368e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 15:58:31,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3571946.6666666665, ans=0.2 2023-11-28 15:58:39,974 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535800 2023-11-28 15:58:45,032 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6750, loss[loss=0.05468, simple_loss=0.07938, pruned_loss=0.006174, audio_tagging_loss=0.00882, over 14890.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08948, pruned_loss=0.01229, audio_tagging_loss=0.00859, over 3025861.84 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 15:58:48,223 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2023-11-28 15:59:03,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3572080.0, ans=0.0 2023-11-28 15:59:06,889 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2023-11-28 15:59:17,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.90 vs. limit=22.5 2023-11-28 15:59:38,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3572280.0, ans=0.125 2023-11-28 15:59:42,191 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535850 2023-11-28 15:59:47,481 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6800, loss[loss=0.06698, simple_loss=0.0873, pruned_loss=0.01334, audio_tagging_loss=0.009997, over 15777.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09037, pruned_loss=0.01233, audio_tagging_loss=0.008468, over 3034612.18 frames. 
], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 15:59:48,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3572346.6666666665, ans=0.125 2023-11-28 16:00:00,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3572413.3333333335, ans=0.0 2023-11-28 16:00:02,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3572413.3333333335, ans=0.035 2023-11-28 16:00:04,698 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 9.079e+01 9.608e+01 1.020e+02 1.284e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 16:00:39,602 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.27 vs. limit=22.5 2023-11-28 16:00:45,837 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535900 2023-11-28 16:00:50,603 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6850, loss[loss=0.06399, simple_loss=0.07969, pruned_loss=0.01273, audio_tagging_loss=0.01141, over 15079.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08949, pruned_loss=0.0122, audio_tagging_loss=0.008483, over 3025562.47 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:01:00,869 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=12.0 2023-11-28 16:01:06,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=22.5 2023-11-28 16:01:13,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3572746.6666666665, ans=15.0 2023-11-28 16:01:16,900 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=15.0 2023-11-28 16:01:27,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3572880.0, ans=0.1 2023-11-28 16:01:29,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3572880.0, ans=0.1 2023-11-28 16:01:46,842 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 535950 2023-11-28 16:01:52,234 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6900, loss[loss=0.06105, simple_loss=0.08317, pruned_loss=0.01097, audio_tagging_loss=0.008496, over 15300.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08968, pruned_loss=0.01223, audio_tagging_loss=0.008408, over 3029345.70 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:02:10,311 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.778e+01 9.481e+01 1.016e+02 1.477e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 16:02:25,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3573146.6666666665, ans=0.1 2023-11-28 16:02:25,436 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.92 vs. 
limit=15.0 2023-11-28 16:02:41,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=3573213.3333333335, ans=0.1 2023-11-28 16:02:42,306 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.09 vs. limit=15.0 2023-11-28 16:02:45,228 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 16:02:47,850 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:02:51,088 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536000 2023-11-28 16:02:58,695 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 6950, loss[loss=0.08383, simple_loss=0.1093, pruned_loss=0.01856, audio_tagging_loss=0.01063, over 16724.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08938, pruned_loss=0.01207, audio_tagging_loss=0.008601, over 3037329.06 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:03:04,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3573346.6666666665, ans=0.125 2023-11-28 16:03:11,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3573413.3333333335, ans=0.125 2023-11-28 16:03:16,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3573413.3333333335, ans=0.125 2023-11-28 16:03:20,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3573413.3333333335, ans=0.0 2023-11-28 16:03:26,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3573480.0, ans=0.2 2023-11-28 16:03:27,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3573480.0, ans=0.125 2023-11-28 16:03:34,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3573480.0, ans=0.1 2023-11-28 16:03:34,643 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=12.0 2023-11-28 16:03:56,598 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536050 2023-11-28 16:04:01,255 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7000, loss[loss=0.05069, simple_loss=0.06962, pruned_loss=0.005886, audio_tagging_loss=0.01, over 14464.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08935, pruned_loss=0.01198, audio_tagging_loss=0.008648, over 3040260.57 frames. 
], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:04:16,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3573746.6666666665, ans=0.125 2023-11-28 16:04:18,991 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 8.731e+01 9.457e+01 1.025e+02 1.203e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 16:04:26,766 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0 2023-11-28 16:04:41,724 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2023-11-28 16:04:55,655 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0 2023-11-28 16:04:58,733 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536100 2023-11-28 16:05:03,911 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7050, loss[loss=0.05889, simple_loss=0.08499, pruned_loss=0.008263, audio_tagging_loss=0.008133, over 16274.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08896, pruned_loss=0.01209, audio_tagging_loss=0.008752, over 3039784.65 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:05:19,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3574080.0, ans=0.125 2023-11-28 16:05:42,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3574213.3333333335, ans=0.125 2023-11-28 16:05:54,400 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=12.0 2023-11-28 16:06:00,967 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536150 2023-11-28 16:06:05,775 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7100, loss[loss=0.04991, simple_loss=0.0643, pruned_loss=0.008394, audio_tagging_loss=0.009362, over 14598.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08902, pruned_loss=0.01203, audio_tagging_loss=0.008747, over 3048763.04 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:06:23,792 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.873e+01 8.973e+01 9.710e+01 1.043e+02 1.342e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 16:06:41,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3574480.0, ans=0.125 2023-11-28 16:06:42,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3574480.0, ans=0.0 2023-11-28 16:06:47,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.11 vs. 
limit=15.0 2023-11-28 16:06:57,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3574613.3333333335, ans=0.0 2023-11-28 16:07:04,294 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536200 2023-11-28 16:07:08,937 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2023-11-28 16:07:09,625 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7150, loss[loss=0.07306, simple_loss=0.1027, pruned_loss=0.01571, audio_tagging_loss=0.006015, over 14516.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08932, pruned_loss=0.0121, audio_tagging_loss=0.00875, over 3049627.17 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:07:12,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3574680.0, ans=0.1 2023-11-28 16:07:12,662 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.61 vs. limit=10.0 2023-11-28 16:07:37,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=15.0 2023-11-28 16:07:51,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.32 vs. limit=15.0 2023-11-28 16:08:07,003 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536250 2023-11-28 16:08:12,284 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7200, loss[loss=0.06709, simple_loss=0.09651, pruned_loss=0.01234, audio_tagging_loss=0.006487, over 17497.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08977, pruned_loss=0.01221, audio_tagging_loss=0.008817, over 3057562.05 frames. ], batch size: 63, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:08:26,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3575080.0, ans=0.125 2023-11-28 16:08:29,536 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.588e+01 8.876e+01 9.486e+01 1.031e+02 1.370e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 16:08:38,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3575146.6666666665, ans=0.125 2023-11-28 16:09:10,334 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536300 2023-11-28 16:09:14,988 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7250, loss[loss=0.07032, simple_loss=0.09366, pruned_loss=0.01438, audio_tagging_loss=0.009106, over 14982.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.08968, pruned_loss=0.01213, audio_tagging_loss=0.008844, over 3056135.88 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:09:23,151 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=15.0 2023-11-28 16:09:24,335 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.54 vs. 
limit=22.5 2023-11-28 16:09:55,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3575546.6666666665, ans=0.125 2023-11-28 16:10:08,059 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:10:11,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3575613.3333333335, ans=0.0 2023-11-28 16:10:12,449 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536350 2023-11-28 16:10:17,743 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7300, loss[loss=0.07168, simple_loss=0.1002, pruned_loss=0.01501, audio_tagging_loss=0.006573, over 15723.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08978, pruned_loss=0.0122, audio_tagging_loss=0.008667, over 3052467.93 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0 2023-11-28 16:10:20,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3575680.0, ans=0.0 2023-11-28 16:10:21,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3575680.0, ans=0.125 2023-11-28 16:10:29,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3575746.6666666665, ans=0.1 2023-11-28 16:10:34,077 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.874e+01 9.514e+01 1.009e+02 1.390e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 16:10:40,185 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.25 vs. limit=6.0 2023-11-28 16:10:52,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3575880.0, ans=0.0 2023-11-28 16:11:00,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3575880.0, ans=0.0 2023-11-28 16:11:09,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3575946.6666666665, ans=0.0 2023-11-28 16:11:14,302 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536400 2023-11-28 16:11:18,708 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.44 vs. limit=22.5 2023-11-28 16:11:19,283 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7350, loss[loss=0.06356, simple_loss=0.08523, pruned_loss=0.01379, audio_tagging_loss=0.007153, over 14995.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08998, pruned_loss=0.01229, audio_tagging_loss=0.008623, over 3056032.42 frames. 
2023-11-28 16:11:30,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3576013.3333333335, ans=0.2
2023-11-28 16:11:31,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3576080.0, ans=0.1
2023-11-28 16:11:33,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3576080.0, ans=0.2
2023-11-28 16:11:38,628 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0
2023-11-28 16:12:09,254 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.46 vs. limit=22.5
2023-11-28 16:12:12,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3576280.0, ans=0.2
2023-11-28 16:12:17,258 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536450
2023-11-28 16:12:22,026 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7400, loss[loss=0.04616, simple_loss=0.06072, pruned_loss=0.005964, audio_tagging_loss=0.009836, over 15211.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.0896, pruned_loss=0.01221, audio_tagging_loss=0.008591, over 3051114.98 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:12:22,625 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.22 vs. limit=6.0
2023-11-28 16:12:23,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3576346.6666666665, ans=0.025
2023-11-28 16:12:41,957 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 8.849e+01 9.470e+01 1.002e+02 1.238e+02, threshold=1.894e+02, percent-clipped=0.0
2023-11-28 16:13:06,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3576546.6666666665, ans=0.125
2023-11-28 16:13:16,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3576613.3333333335, ans=0.125
2023-11-28 16:13:19,317 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536500
2023-11-28 16:13:23,938 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7450, loss[loss=0.06233, simple_loss=0.0881, pruned_loss=0.009085, audio_tagging_loss=0.009199, over 15353.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08892, pruned_loss=0.01205, audio_tagging_loss=0.008497, over 3048433.03 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0
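The `grad_scale` field halves from 32 to 16 to 8 across batches 7350 to 7450 and climbs back later (16 at batch 7600, 32 at batch 8000). That pattern is characteristic of dynamic fp16 loss scaling. A hedged sketch of the mechanism; the constructor arguments mirror `torch.cuda.amp.GradScaler` defaults except `init_scale`, which is chosen here only to match the values seen in this log:

```python
import torch

# Sketch of the loss-scaling behaviour behind the `grad_scale` field.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0,
                                   backoff_factor=0.5,
                                   growth_interval=2000)
# per training step:
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)   # skipped when inf/NaN grads are detected
#   scaler.update()          # halves the scale on overflow (32 -> 16 -> 8),
#                            # doubles it again after enough clean steps
```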
2023-11-28 16:13:29,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3576680.0, ans=0.125
2023-11-28 16:13:29,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3576680.0, ans=0.125
2023-11-28 16:13:46,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3576746.6666666665, ans=0.125
2023-11-28 16:13:48,564 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.47 vs. limit=10.0
2023-11-28 16:13:50,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3576813.3333333335, ans=0.2
2023-11-28 16:14:20,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3576946.6666666665, ans=0.1
2023-11-28 16:14:21,963 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536550
2023-11-28 16:14:26,694 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7500, loss[loss=0.06589, simple_loss=0.09415, pruned_loss=0.01413, audio_tagging_loss=0.004683, over 14283.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08861, pruned_loss=0.01205, audio_tagging_loss=0.008534, over 3046342.95 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0
2023-11-28 16:14:27,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3577013.3333333335, ans=0.0
2023-11-28 16:14:29,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3577013.3333333335, ans=0.2
2023-11-28 16:14:29,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3577013.3333333335, ans=0.125
2023-11-28 16:14:31,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3577013.3333333335, ans=0.125
2023-11-28 16:14:41,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=12.0
2023-11-28 16:14:47,281 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 8.969e+01 9.602e+01 1.048e+02 1.798e+02, threshold=1.920e+02, percent-clipped=0.0
2023-11-28 16:15:01,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3577146.6666666665, ans=0.0
2023-11-28 16:15:09,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3577213.3333333335, ans=0.0
2023-11-28 16:15:10,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.08 vs. limit=15.0
2023-11-28 16:15:24,940 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536600
2023-11-28 16:15:29,956 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7550, loss[loss=0.07452, simple_loss=0.1091, pruned_loss=0.01273, audio_tagging_loss=0.007228, over 15457.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08899, pruned_loss=0.01207, audio_tagging_loss=0.008402, over 3047762.64 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 8.0
2023-11-28 16:15:34,613 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0
2023-11-28 16:15:41,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3577346.6666666665, ans=0.1
2023-11-28 16:16:02,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3577480.0, ans=0.0
2023-11-28 16:16:07,634 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0
2023-11-28 16:16:21,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3577613.3333333335, ans=0.1
2023-11-28 16:16:27,788 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536650
2023-11-28 16:16:32,308 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7600, loss[loss=0.07034, simple_loss=0.1014, pruned_loss=0.01236, audio_tagging_loss=0.007285, over 14977.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08925, pruned_loss=0.01209, audio_tagging_loss=0.008342, over 3050874.62 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:16:38,297 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0
2023-11-28 16:16:46,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3577746.6666666665, ans=0.1
2023-11-28 16:16:51,861 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.772e+01 9.466e+01 1.004e+02 1.199e+02, threshold=1.893e+02, percent-clipped=0.0
2023-11-28 16:16:52,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3577746.6666666665, ans=0.1
2023-11-28 16:17:10,516 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.31 vs. limit=22.5
2023-11-28 16:17:29,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3577946.6666666665, ans=0.125
2023-11-28 16:17:30,281 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536700
2023-11-28 16:17:33,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3578013.3333333335, ans=0.0
2023-11-28 16:17:34,923 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7650, loss[loss=0.06293, simple_loss=0.0865, pruned_loss=0.01235, audio_tagging_loss=0.007338, over 14629.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08822, pruned_loss=0.01199, audio_tagging_loss=0.008419, over 3046100.43 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0
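The Whitening records compare a dispersion statistic of the feature covariance (`metric`) against a limit; the module only intervenes when the metric exceeds the limit, which is why most records here sit comfortably below it. A hedged sketch of one plausible such metric; the exact formula in scaling.py may differ:

```python
import torch

# Hedged sketch of a whitening diagnostic in the spirit of the
# "metric=... vs. limit=..." records: how uneven the eigenvalues of the
# per-group feature covariance are. Exactly white features give ~1.
def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    n, c = x.shape                                   # (frames, channels)
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                  # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)
    # mean squared eigenvalue over squared mean eigenvalue:
    return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

print(whitening_metric(torch.randn(10000, 384)))  # close to 1 for white noise
```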
2023-11-28 16:17:46,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=3578080.0, ans=0.2
2023-11-28 16:17:48,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3578080.0, ans=0.0
2023-11-28 16:17:54,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3578080.0, ans=0.125
2023-11-28 16:18:02,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3578146.6666666665, ans=0.125
2023-11-28 16:18:09,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3578146.6666666665, ans=0.2
2023-11-28 16:18:26,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3578280.0, ans=0.1
2023-11-28 16:18:32,211 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536750
2023-11-28 16:18:32,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3578280.0, ans=0.5
2023-11-28 16:18:37,465 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7700, loss[loss=0.06691, simple_loss=0.08614, pruned_loss=0.0138, audio_tagging_loss=0.01004, over 15332.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08892, pruned_loss=0.01203, audio_tagging_loss=0.008481, over 3054512.48 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:18:42,459 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 16:18:55,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3578413.3333333335, ans=0.1
2023-11-28 16:18:57,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 9.045e+01 9.681e+01 1.026e+02 1.409e+02, threshold=1.936e+02, percent-clipped=0.0
2023-11-28 16:19:12,625 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=12.0
2023-11-28 16:19:13,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3578546.6666666665, ans=0.125
2023-11-28 16:19:33,947 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536800
2023-11-28 16:19:39,985 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7750, loss[loss=0.04595, simple_loss=0.05989, pruned_loss=0.007757, audio_tagging_loss=0.008243, over 15275.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08966, pruned_loss=0.01208, audio_tagging_loss=0.008507, over 3048549.14 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:19:47,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3578680.0, ans=0.0
2023-11-28 16:19:50,531 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0
2023-11-28 16:20:15,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3578880.0, ans=0.95
2023-11-28 16:20:37,953 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536850
2023-11-28 16:20:39,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3578946.6666666665, ans=0.0
2023-11-28 16:20:41,749 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-28 16:20:42,583 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7800, loss[loss=0.06206, simple_loss=0.08302, pruned_loss=0.01107, audio_tagging_loss=0.009484, over 15533.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09, pruned_loss=0.01209, audio_tagging_loss=0.008616, over 3047974.26 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:20:42,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3579013.3333333335, ans=0.125
2023-11-28 16:21:00,434 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.66 vs. limit=15.0
2023-11-28 16:21:01,940 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.300e+01 9.025e+01 9.590e+01 1.021e+02 1.203e+02, threshold=1.918e+02, percent-clipped=0.0
2023-11-28 16:21:07,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3579146.6666666665, ans=0.05
2023-11-28 16:21:16,411 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.72 vs. limit=8.0
2023-11-28 16:21:27,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3579213.3333333335, ans=0.2
2023-11-28 16:21:30,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3579280.0, ans=0.2
2023-11-28 16:21:38,656 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536900
2023-11-28 16:21:40,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3579280.0, ans=0.125
2023-11-28 16:21:43,993 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7850, loss[loss=0.07961, simple_loss=0.1135, pruned_loss=0.0167, audio_tagging_loss=0.006153, over 15176.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.09003, pruned_loss=0.01202, audio_tagging_loss=0.008634, over 3052123.21 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:22:00,433 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=12.0
2023-11-28 16:22:24,839 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 16:22:40,583 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 536950
2023-11-28 16:22:45,250 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7900, loss[loss=0.0626, simple_loss=0.09512, pruned_loss=0.006779, audio_tagging_loss=0.008261, over 15748.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09019, pruned_loss=0.01213, audio_tagging_loss=0.00875, over 3047644.86 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
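The `lr` field holds steady at 1.50e-03 across all these batches: this deep into training, the learning-rate schedule changes very slowly. A hedged sketch of an Eden-style schedule (as used in icefall); the constants below are illustrative assumptions, not read from this section of the log:

```python
# Hedged sketch of an Eden-style LR schedule. All constants are assumed.
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    return (base_lr
            * ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
            * ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25)

# With an assumed base_lr of 0.045 this lands near the logged value at
# roughly this point in training:
print(eden_lr(0.045, 537000, 45))  # ~1.5e-03
```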
2023-11-28 16:23:03,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3579746.6666666665, ans=0.0
2023-11-28 16:23:05,968 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.056e+01 9.028e+01 9.515e+01 1.023e+02 1.434e+02, threshold=1.903e+02, percent-clipped=0.0
2023-11-28 16:23:22,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3579880.0, ans=0.125
2023-11-28 16:23:38,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3579946.6666666665, ans=0.125
2023-11-28 16:23:43,792 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537000
2023-11-28 16:23:48,691 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 7950, loss[loss=0.05947, simple_loss=0.07665, pruned_loss=0.01195, audio_tagging_loss=0.009197, over 14804.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.0895, pruned_loss=0.01216, audio_tagging_loss=0.008828, over 3046739.82 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:23:51,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3580013.3333333335, ans=0.0
2023-11-28 16:23:53,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3580013.3333333335, ans=0.0
2023-11-28 16:23:57,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3580013.3333333335, ans=0.0
2023-11-28 16:24:07,196 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 16:24:10,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3580080.0, ans=0.2
2023-11-28 16:24:13,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3580146.6666666665, ans=0.125
2023-11-28 16:24:22,376 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0
2023-11-28 16:24:22,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0
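The WARNING above shows why placeholder AudioSet cuts get dropped: a 1-second cut has 100 feature frames, only 23 frames survive the encoder's 4x subsampling, and 23 frames cannot be aligned against the 24 BPE tokens of the dummy transcript. A hedged sketch of that kind of filter; the subsampling arithmetic is an assumption chosen to match the logged 100 -> 23:

```python
# Hedged sketch of the predicate behind the "Exclude cut" warnings.
def frames_after_subsampling(t: int) -> int:
    # two stride-2 convolution stages (assumed), e.g. 100 -> 23
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # a transducer needs at least one encoder frame per output token
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100), keep_cut(100, 24))  # 23 False
```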
2023-11-28 16:24:32,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3580213.3333333335, ans=0.125
2023-11-28 16:24:45,598 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537050
2023-11-28 16:24:46,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3580280.0, ans=0.125
2023-11-28 16:24:50,109 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8000, loss[loss=0.06978, simple_loss=0.08318, pruned_loss=0.01714, audio_tagging_loss=0.01105, over 14694.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08826, pruned_loss=0.01191, audio_tagging_loss=0.008895, over 3044357.48 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 32.0
2023-11-28 16:24:55,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3580346.6666666665, ans=0.1
2023-11-28 16:25:11,268 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.288e+01 8.789e+01 9.588e+01 1.026e+02 1.289e+02, threshold=1.918e+02, percent-clipped=0.0
2023-11-28 16:25:33,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3580546.6666666665, ans=0.1
2023-11-28 16:25:46,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3580613.3333333335, ans=0.2
2023-11-28 16:25:47,481 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537100
2023-11-28 16:25:52,294 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8050, loss[loss=0.0558, simple_loss=0.06965, pruned_loss=0.01045, audio_tagging_loss=0.01053, over 15377.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08809, pruned_loss=0.01192, audio_tagging_loss=0.00894, over 3045991.03 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:26:12,710 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=22.5
2023-11-28 16:26:28,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3580813.3333333335, ans=10.0
2023-11-28 16:26:40,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3580946.6666666665, ans=0.125
2023-11-28 16:26:45,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3580946.6666666665, ans=0.125
2023-11-28 16:26:46,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3580946.6666666665, ans=0.0
2023-11-28 16:26:48,855 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537150
2023-11-28 16:26:52,192 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0
2023-11-28 16:26:54,677 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8100, loss[loss=0.04912, simple_loss=0.07018, pruned_loss=0.007737, audio_tagging_loss=0.006298, over 16014.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.0885, pruned_loss=0.012, audio_tagging_loss=0.00885, over 3046932.23 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 8.0
2023-11-28 16:27:04,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.12 vs. limit=10.0
2023-11-28 16:27:14,429 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0
2023-11-28 16:27:18,046 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.231e+01 8.911e+01 9.512e+01 1.026e+02 1.565e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-28 16:27:54,004 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537200
2023-11-28 16:27:58,918 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8150, loss[loss=0.06256, simple_loss=0.08677, pruned_loss=0.01156, audio_tagging_loss=0.007617, over 15905.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08913, pruned_loss=0.01215, audio_tagging_loss=0.008649, over 3041420.70 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 8.0
2023-11-28 16:28:08,391 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0
2023-11-28 16:28:40,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3581546.6666666665, ans=0.125
2023-11-28 16:28:44,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3581546.6666666665, ans=0.2
2023-11-28 16:28:52,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3581613.3333333335, ans=0.2
2023-11-28 16:28:56,574 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537250
2023-11-28 16:29:01,128 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8200, loss[loss=0.05207, simple_loss=0.07406, pruned_loss=0.008584, audio_tagging_loss=0.006455, over 15519.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08957, pruned_loss=0.01212, audio_tagging_loss=0.008535, over 3039029.28 frames. ], batch size: 59, lr: 1.50e-03, grad_scale: 8.0
2023-11-28 16:29:05,707 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 16:29:20,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3581746.6666666665, ans=0.125
2023-11-28 16:29:23,386 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 9.010e+01 9.552e+01 1.037e+02 1.390e+02, threshold=1.910e+02, percent-clipped=0.0
2023-11-28 16:29:42,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3581880.0, ans=0.0
2023-11-28 16:29:53,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3581946.6666666665, ans=0.0
2023-11-28 16:29:57,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3581946.6666666665, ans=0.2
2023-11-28 16:29:58,261 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537300
2023-11-28 16:30:01,787 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 16:30:02,798 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8250, loss[loss=0.03994, simple_loss=0.05058, pruned_loss=0.007452, audio_tagging_loss=0.0072, over 15056.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08856, pruned_loss=0.01199, audio_tagging_loss=0.008571, over 3041483.77 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 8.0
2023-11-28 16:30:05,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3582013.3333333335, ans=0.1
2023-11-28 16:30:36,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3582146.6666666665, ans=0.125
2023-11-28 16:31:00,836 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537350
2023-11-28 16:31:01,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3582280.0, ans=22.5
2023-11-28 16:31:06,255 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8300, loss[loss=0.06866, simple_loss=0.09171, pruned_loss=0.01241, audio_tagging_loss=0.01039, over 14293.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08895, pruned_loss=0.01189, audio_tagging_loss=0.008595, over 3040316.69 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0
2023-11-28 16:31:28,115 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.841e+01 9.438e+01 1.013e+02 1.279e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-28 16:31:29,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3582413.3333333335, ans=0.125
2023-11-28 16:31:47,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3582546.6666666665, ans=0.125
2023-11-28 16:31:58,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3582613.3333333335, ans=0.125
2023-11-28 16:32:03,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3582613.3333333335, ans=0.125
2023-11-28 16:32:03,972 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537400
2023-11-28 16:32:08,976 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8350, loss[loss=0.06813, simple_loss=0.1026, pruned_loss=0.01081, audio_tagging_loss=0.006011, over 15339.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08843, pruned_loss=0.0118, audio_tagging_loss=0.008529, over 3046224.14 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0
2023-11-28 16:32:09,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. limit=6.0
2023-11-28 16:32:34,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3582813.3333333335, ans=0.125
2023-11-28 16:32:34,769 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=22.5
2023-11-28 16:32:35,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3582813.3333333335, ans=0.125
2023-11-28 16:32:51,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3582880.0, ans=0.1
2023-11-28 16:33:05,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3582946.6666666665, ans=0.125
2023-11-28 16:33:06,826 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537450
2023-11-28 16:33:11,434 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8400, loss[loss=0.06537, simple_loss=0.09066, pruned_loss=0.01268, audio_tagging_loss=0.007364, over 15372.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08871, pruned_loss=0.01198, audio_tagging_loss=0.008467, over 3045005.87 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:33:22,208 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=12.0
2023-11-28 16:33:34,436 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.936e+01 9.656e+01 1.030e+02 3.353e+02, threshold=1.931e+02, percent-clipped=1.0
2023-11-28 16:33:35,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3583146.6666666665, ans=0.125
2023-11-28 16:33:48,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3583213.3333333335, ans=10.0
2023-11-28 16:33:53,322 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.06 vs. limit=10.0
2023-11-28 16:34:09,831 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537500
2023-11-28 16:34:14,419 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8450, loss[loss=0.04581, simple_loss=0.05807, pruned_loss=0.006615, audio_tagging_loss=0.01016, over 14559.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.0885, pruned_loss=0.01202, audio_tagging_loss=0.008439, over 3044829.08 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:34:25,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3583346.6666666665, ans=10.0
2023-11-28 16:34:56,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3583546.6666666665, ans=0.2
2023-11-28 16:35:09,440 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.73 vs. limit=15.0
2023-11-28 16:35:13,096 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537550
2023-11-28 16:35:17,699 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8500, loss[loss=0.07078, simple_loss=0.09435, pruned_loss=0.01496, audio_tagging_loss=0.008646, over 15371.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08899, pruned_loss=0.01202, audio_tagging_loss=0.008446, over 3048954.11 frames. ], batch size: 60, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:35:17,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3583680.0, ans=0.125
2023-11-28 16:35:24,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3583680.0, ans=0.2
2023-11-28 16:35:27,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3583680.0, ans=0.125
2023-11-28 16:35:36,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3583746.6666666665, ans=0.1
2023-11-28 16:35:40,020 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.468e+01 8.912e+01 9.575e+01 1.015e+02 1.303e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-28 16:35:42,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3583813.3333333335, ans=0.0
2023-11-28 16:35:44,128 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0
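The first Clipping record above is one of the few in this stretch where something was actually clipped: the max gradient norm (3.353e+02) exceeds the threshold (1.931e+02), hence percent-clipped=1.0. Note the threshold is about twice the logged median (2 x 9.656e+01), consistent with Clipping_scale=2.0. A hedged sketch of this style of median-based clipping; ScaledAdam's real bookkeeping differs in detail:

```python
import torch

# Hedged sketch of quartile logging plus median-based gradient clipping.
class GradNormClipper:
    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []

    def __call__(self, params):
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms = (self.norms + [norm])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2x the median
        if norm > threshold:  # rescale gradients down to the threshold
            for p in params:
                p.grad.mul_(threshold / norm)
        return q, threshold, norm > threshold
```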
2023-11-28 16:35:55,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3583880.0, ans=0.0
2023-11-28 16:36:09,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3583946.6666666665, ans=0.125
2023-11-28 16:36:13,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3583946.6666666665, ans=0.125
2023-11-28 16:36:14,179 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537600
2023-11-28 16:36:19,724 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8550, loss[loss=0.06919, simple_loss=0.08502, pruned_loss=0.01553, audio_tagging_loss=0.01115, over 15202.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.0891, pruned_loss=0.01208, audio_tagging_loss=0.008496, over 3048616.83 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:36:30,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3584013.3333333335, ans=0.125
2023-11-28 16:36:33,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3584080.0, ans=0.95
2023-11-28 16:36:43,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3584146.6666666665, ans=0.125
2023-11-28 16:36:48,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3584146.6666666665, ans=0.125
2023-11-28 16:36:53,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3584146.6666666665, ans=0.125
2023-11-28 16:37:06,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3584213.3333333335, ans=0.0
2023-11-28 16:37:16,991 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537650
2023-11-28 16:37:21,644 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8600, loss[loss=0.07066, simple_loss=0.1035, pruned_loss=0.01227, audio_tagging_loss=0.006651, over 16105.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08877, pruned_loss=0.01199, audio_tagging_loss=0.00858, over 3044114.70 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:37:29,100 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 16:37:30,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3584346.6666666665, ans=0.2
2023-11-28 16:37:32,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3584413.3333333335, ans=0.125
2023-11-28 16:37:40,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. limit=6.0
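The balancer fields that recur in these records (min_positive, max_positive, min_abs, max_abs, prob) describe constraints on per-channel activation statistics; with probability `prob` the module checks the stats and nudges channels that have drifted outside the bounds. A hedged sketch of the idea; the real Balancer in scaling.py acts on gradients in the backward pass and is considerably more careful:

```python
import torch

# Hedged sketch of a balancer-style soft constraint. We use a
# differentiable proxy for "fraction of positive values" so the penalty
# actually produces gradients.
def balancer_penalty(x: torch.Tensor,
                     min_positive=0.05, max_positive=0.95,
                     min_abs=0.2, max_abs=10.0) -> torch.Tensor:
    # x: (num_frames, num_channels)
    frac_pos = torch.sigmoid(x / 0.1).mean(dim=0)  # soft sign proxy
    mean_abs = x.abs().mean(dim=0)
    return ((min_positive - frac_pos).clamp(min=0).sum()
            + (frac_pos - max_positive).clamp(min=0).sum()
            + (min_abs - mean_abs).clamp(min=0).sum()
            + (mean_abs - max_abs).clamp(min=0).sum())
```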
2023-11-28 16:37:44,136 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.628e+01 8.790e+01 9.576e+01 1.011e+02 1.183e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-28 16:37:53,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3584480.0, ans=0.125
2023-11-28 16:37:54,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3584480.0, ans=0.125
2023-11-28 16:38:01,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=22.5
2023-11-28 16:38:16,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3584613.3333333335, ans=0.125
2023-11-28 16:38:18,645 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537700
2023-11-28 16:38:23,729 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8650, loss[loss=0.07403, simple_loss=0.1017, pruned_loss=0.01444, audio_tagging_loss=0.008723, over 16726.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.0902, pruned_loss=0.01224, audio_tagging_loss=0.008558, over 3040953.80 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:38:30,961 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.51 vs. limit=15.0
2023-11-28 16:38:49,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3584813.3333333335, ans=0.125
2023-11-28 16:38:54,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3584813.3333333335, ans=0.125
2023-11-28 16:38:56,534 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-28 16:39:21,422 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537750
2023-11-28 16:39:26,594 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8700, loss[loss=0.07244, simple_loss=0.1007, pruned_loss=0.01224, audio_tagging_loss=0.009839, over 15371.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09013, pruned_loss=0.01206, audio_tagging_loss=0.008619, over 3042950.57 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:39:48,535 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 9.145e+01 9.850e+01 1.054e+02 1.476e+02, threshold=1.970e+02, percent-clipped=0.0
2023-11-28 16:39:53,937 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0
2023-11-28 16:39:58,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3585146.6666666665, ans=0.1
2023-11-28 16:40:02,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3585213.3333333335, ans=0.0
2023-11-28 16:40:14,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3585213.3333333335, ans=0.2
2023-11-28 16:40:24,463 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537800
2023-11-28 16:40:25,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=15.0
2023-11-28 16:40:29,011 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0
2023-11-28 16:40:29,351 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8750, loss[loss=0.05244, simple_loss=0.06748, pruned_loss=0.008513, audio_tagging_loss=0.01018, over 14735.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.0902, pruned_loss=0.01203, audio_tagging_loss=0.008669, over 3048304.91 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:40:36,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3585346.6666666665, ans=0.125
2023-11-28 16:40:40,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3585413.3333333335, ans=0.125
2023-11-28 16:40:57,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3585480.0, ans=0.1
2023-11-28 16:40:58,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3585480.0, ans=0.0
2023-11-28 16:41:19,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3585613.3333333335, ans=0.125
2023-11-28 16:41:21,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3585613.3333333335, ans=0.2
2023-11-28 16:41:26,367 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537850
2023-11-28 16:41:30,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3585680.0, ans=0.0
2023-11-28 16:41:31,153 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8800, loss[loss=0.08351, simple_loss=0.1162, pruned_loss=0.01844, audio_tagging_loss=0.006955, over 14367.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.09159, pruned_loss=0.01222, audio_tagging_loss=0.008697, over 3046348.31 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 32.0
2023-11-28 16:41:34,408 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-28 16:41:39,219 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-28 16:41:46,188 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.84 vs. limit=12.0
2023-11-28 16:41:54,342 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 9.010e+01 9.598e+01 1.030e+02 1.176e+02, threshold=1.920e+02, percent-clipped=0.0
2023-11-28 16:41:56,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3585813.3333333335, ans=0.0
2023-11-28 16:42:14,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3585880.0, ans=0.0
2023-11-28 16:42:28,829 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537900
2023-11-28 16:42:30,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3585946.6666666665, ans=0.125
2023-11-28 16:42:34,083 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8850, loss[loss=0.06884, simple_loss=0.09834, pruned_loss=0.01065, audio_tagging_loss=0.009017, over 15880.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09116, pruned_loss=0.01224, audio_tagging_loss=0.008776, over 3043112.96 frames. ], batch size: 58, lr: 1.50e-03, grad_scale: 32.0
2023-11-28 16:42:44,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3586013.3333333335, ans=0.5
2023-11-28 16:42:48,978 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.78 vs. limit=15.0
2023-11-28 16:42:50,870 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-28 16:43:07,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3586146.6666666665, ans=0.1
2023-11-28 16:43:09,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3586146.6666666665, ans=0.125
2023-11-28 16:43:09,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3586146.6666666665, ans=0.07
2023-11-28 16:43:28,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3586280.0, ans=0.125
2023-11-28 16:43:31,366 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 537950
2023-11-28 16:43:36,692 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8900, loss[loss=0.07376, simple_loss=0.09841, pruned_loss=0.01611, audio_tagging_loss=0.008446, over 16022.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09056, pruned_loss=0.01221, audio_tagging_loss=0.008715, over 3038838.23 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0
2023-11-28 16:43:51,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3586413.3333333335, ans=0.125
2023-11-28 16:43:59,146 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.009e+01 9.604e+01 1.041e+02 1.260e+02, threshold=1.921e+02, percent-clipped=0.0
2023-11-28 16:44:01,996 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.89 vs. limit=15.0
2023-11-28 16:44:04,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3586480.0, ans=0.125
2023-11-28 16:44:17,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3586546.6666666665, ans=0.125
2023-11-28 16:44:18,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3586546.6666666665, ans=0.0
2023-11-28 16:44:28,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3586613.3333333335, ans=0.125
2023-11-28 16:44:28,668 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0
2023-11-28 16:44:29,882 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.48 vs. limit=22.5
2023-11-28 16:44:33,998 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538000
2023-11-28 16:44:37,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3586680.0, ans=0.125
2023-11-28 16:44:38,997 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 8950, loss[loss=0.0689, simple_loss=0.08902, pruned_loss=0.01324, audio_tagging_loss=0.01115, over 15578.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09167, pruned_loss=0.01224, audio_tagging_loss=0.008466, over 3039017.19 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:44:40,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3586680.0, ans=0.125
2023-11-28 16:44:55,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3586746.6666666665, ans=0.0
2023-11-28 16:44:57,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3586746.6666666665, ans=0.0
2023-11-28 16:44:58,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3586746.6666666665, ans=0.0
2023-11-28 16:45:27,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.85 vs. limit=15.0
2023-11-28 16:45:37,223 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538050
2023-11-28 16:45:41,936 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9000, loss[loss=0.09495, simple_loss=0.1329, pruned_loss=0.02112, audio_tagging_loss=0.007397, over 16320.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09124, pruned_loss=0.0122, audio_tagging_loss=0.008428, over 3036487.81 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:45:41,936 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-28 16:45:59,789 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.6016, 3.1074, 3.2101, 2.6827], device='cuda:2')
2023-11-28 16:46:23,745 INFO [train_asr.py:1267] (2/4) Epoch 45, validation: loss=0.05837, simple_loss=0.05051, pruned_loss=0.005241, audio_tagging_loss=0.02788, over 4681554.00 frames.
2023-11-28 16:46:23,745 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB
2023-11-28 16:46:34,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3587080.0, ans=0.0
2023-11-28 16:46:44,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3587080.0, ans=0.2
2023-11-28 16:46:44,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3587080.0, ans=0.0
2023-11-28 16:46:46,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.988e+01 9.549e+01 1.029e+02 1.340e+02, threshold=1.910e+02, percent-clipped=0.0
2023-11-28 16:46:51,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3587146.6666666665, ans=0.125
2023-11-28 16:47:05,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3587213.3333333335, ans=0.0
2023-11-28 16:47:05,278 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0
2023-11-28 16:47:14,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3587280.0, ans=0.125
2023-11-28 16:47:20,953 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538100
2023-11-28 16:47:26,271 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9050, loss[loss=0.04309, simple_loss=0.05739, pruned_loss=0.005144, audio_tagging_loss=0.009248, over 14481.00 frames. ], tot_loss[loss=0.06658, simple_loss=0.092, pruned_loss=0.01226, audio_tagging_loss=0.008326, over 3044126.25 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:47:42,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3587413.3333333335, ans=0.05
2023-11-28 16:47:47,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3587413.3333333335, ans=0.125
2023-11-28 16:48:23,510 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538150
2023-11-28 16:48:28,188 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9100, loss[loss=0.06701, simple_loss=0.09585, pruned_loss=0.01162, audio_tagging_loss=0.007464, over 15308.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09077, pruned_loss=0.01216, audio_tagging_loss=0.00833, over 3050077.35 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 16.0
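The validation pass at batch 9000 also logs a per-head entropy of the attention weights, a cheap health check for attention collapse (near-zero entropy would mean every query attends to a single key). A hedged sketch of that diagnostic; the tensor shapes are assumptions:

```python
import torch

# Hedged sketch of the "attn_weights_entropy = tensor([...])" diagnostic:
# entropy of each head's attention distribution, averaged over queries.
def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, num_queries, num_keys), rows sum to 1
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)                           # one value per head

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # four moderate values, like the log's
```

The "Maximum memory allocated" record that accompanies validation plausibly comes from torch.cuda.max_memory_allocated(), reported in MB.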
2023-11-28 16:48:30,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3587680.0, ans=0.0
2023-11-28 16:48:30,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3587680.0, ans=0.125
2023-11-28 16:48:52,573 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.406e+01 8.776e+01 9.450e+01 1.014e+02 1.425e+02, threshold=1.890e+02, percent-clipped=0.0
2023-11-28 16:49:26,166 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538200
2023-11-28 16:49:30,971 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9150, loss[loss=0.07178, simple_loss=0.1007, pruned_loss=0.01192, audio_tagging_loss=0.009489, over 16119.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.0905, pruned_loss=0.01208, audio_tagging_loss=0.008356, over 3050661.82 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:50:17,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3588213.3333333335, ans=0.125
2023-11-28 16:50:28,702 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538250
2023-11-28 16:50:30,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3588280.0, ans=0.125
2023-11-28 16:50:33,271 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9200, loss[loss=0.08115, simple_loss=0.1198, pruned_loss=0.01469, audio_tagging_loss=0.006583, over 15511.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.09003, pruned_loss=0.01205, audio_tagging_loss=0.008334, over 3046309.01 frames. ], batch size: 56, lr: 1.50e-03, grad_scale: 32.0
2023-11-28 16:50:48,619 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=15.0
2023-11-28 16:50:56,597 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 9.083e+01 9.538e+01 1.018e+02 1.192e+02, threshold=1.908e+02, percent-clipped=0.0
2023-11-28 16:51:00,833 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.42 vs. limit=15.0
2023-11-28 16:51:20,112 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.28 vs. limit=22.5
2023-11-28 16:51:29,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.57 vs. limit=15.0
2023-11-28 16:51:30,071 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538300
2023-11-28 16:51:35,107 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9250, loss[loss=0.06789, simple_loss=0.09704, pruned_loss=0.01179, audio_tagging_loss=0.007586, over 16730.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.0896, pruned_loss=0.01201, audio_tagging_loss=0.008379, over 3051206.30 frames. ], batch size: 61, lr: 1.50e-03, grad_scale: 32.0
2023-11-28 16:52:13,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3588880.0, ans=0.07
2023-11-28 16:52:25,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3588946.6666666665, ans=0.2
2023-11-28 16:52:34,260 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538350
2023-11-28 16:52:39,058 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9300, loss[loss=0.05557, simple_loss=0.0724, pruned_loss=0.01022, audio_tagging_loss=0.009142, over 15807.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08981, pruned_loss=0.01201, audio_tagging_loss=0.008393, over 3048249.97 frames. ], batch size: 62, lr: 1.50e-03, grad_scale: 32.0
2023-11-28 16:52:59,019 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.04 vs. limit=10.0
2023-11-28 16:53:03,656 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.801e+01 8.822e+01 9.388e+01 1.037e+02 1.623e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-28 16:53:09,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3589146.6666666665, ans=0.2
2023-11-28 16:53:13,749 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0
2023-11-28 16:53:23,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3589213.3333333335, ans=0.125
2023-11-28 16:53:30,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3589280.0, ans=0.2
2023-11-28 16:53:34,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3589280.0, ans=0.5
2023-11-28 16:53:37,171 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538400
2023-11-28 16:53:39,724 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.84 vs. limit=10.0
2023-11-28 16:53:40,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3589280.0, ans=0.125
2023-11-28 16:53:42,843 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9350, loss[loss=0.04289, simple_loss=0.04982, pruned_loss=0.007996, audio_tagging_loss=0.009979, over 13952.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08985, pruned_loss=0.0122, audio_tagging_loss=0.008432, over 3043611.17 frames. ], batch size: 54, lr: 1.50e-03, grad_scale: 16.0
2023-11-28 16:54:12,397 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=22.5
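Throughout this stretch the "batch size" field drifts between roughly 53 and 63 cuts. That is the signature of duration-based batching: batches are capped by total audio duration rather than by a fixed utterance count. A hedged sketch of the idea; `max_duration` is an assumed parameter name, and the real sampler also shuffles and buckets:

```python
# Hedged sketch of duration-capped batching. Assumes each cut exposes a
# `duration` attribute in seconds.
def batches_by_duration(cuts, max_duration=1000.0):
    batch, total = [], 0.0
    for cut in cuts:
        if batch and total + cut.duration > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(cut)
        total += cut.duration
    if batch:
        yield batch
```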
limit=22.5 2023-11-28 16:54:17,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3589480.0, ans=0.0 2023-11-28 16:54:32,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3589613.3333333335, ans=0.1 2023-11-28 16:54:32,214 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.93 vs. limit=10.0 2023-11-28 16:54:35,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0 2023-11-28 16:54:38,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3589613.3333333335, ans=0.125 2023-11-28 16:54:40,527 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538450 2023-11-28 16:54:45,248 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9400, loss[loss=0.06733, simple_loss=0.08528, pruned_loss=0.01447, audio_tagging_loss=0.01022, over 14497.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.09004, pruned_loss=0.01224, audio_tagging_loss=0.00856, over 3043922.51 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:54:57,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3589746.6666666665, ans=0.1 2023-11-28 16:55:03,336 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.18 vs. limit=10.0 2023-11-28 16:55:11,218 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.693e+01 8.781e+01 9.524e+01 1.025e+02 1.257e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-28 16:55:28,316 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=12.0 2023-11-28 16:55:42,507 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538500 2023-11-28 16:55:45,147 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 16:55:47,197 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9450, loss[loss=0.08127, simple_loss=0.1084, pruned_loss=0.01905, audio_tagging_loss=0.008015, over 14313.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09018, pruned_loss=0.01232, audio_tagging_loss=0.008663, over 3051848.71 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:55:49,523 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 16:55:57,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3590013.3333333335, ans=0.125 2023-11-28 16:56:06,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3590080.0, ans=0.0 2023-11-28 16:56:12,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3590146.6666666665, ans=0.2 2023-11-28 16:56:15,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3590146.6666666665, ans=0.125 2023-11-28 16:56:15,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3590146.6666666665, ans=0.1 2023-11-28 16:56:16,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3590146.6666666665, ans=0.2 2023-11-28 16:56:26,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3590213.3333333335, ans=0.0 2023-11-28 16:56:39,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.80 vs. limit=15.0 2023-11-28 16:56:45,133 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538550 2023-11-28 16:56:49,791 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9500, loss[loss=0.07294, simple_loss=0.09919, pruned_loss=0.01215, audio_tagging_loss=0.0112, over 14367.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.0905, pruned_loss=0.01227, audio_tagging_loss=0.00864, over 3053237.43 frames. ], batch size: 53, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:57:00,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3590346.6666666665, ans=0.1 2023-11-28 16:57:16,060 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.050e+01 9.072e+01 9.708e+01 1.059e+02 2.012e+02, threshold=1.942e+02, percent-clipped=1.0 2023-11-28 16:57:46,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3590613.3333333335, ans=0.1 2023-11-28 16:57:47,648 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538600 2023-11-28 16:57:50,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3590613.3333333335, ans=0.0 2023-11-28 16:57:51,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3590680.0, ans=0.125 2023-11-28 16:57:52,592 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9550, loss[loss=0.05382, simple_loss=0.07096, pruned_loss=0.0099, audio_tagging_loss=0.008438, over 15897.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.0906, pruned_loss=0.01231, audio_tagging_loss=0.008758, over 3051943.85 frames. 
], batch size: 59, lr: 1.50e-03, grad_scale: 8.0 2023-11-28 16:58:02,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3590680.0, ans=0.125 2023-11-28 16:58:10,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2023-11-28 16:58:22,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3590813.3333333335, ans=0.0 2023-11-28 16:58:38,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3590880.0, ans=0.2 2023-11-28 16:58:47,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3590946.6666666665, ans=0.125 2023-11-28 16:58:50,192 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538650 2023-11-28 16:58:55,021 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9600, loss[loss=0.08618, simple_loss=0.1203, pruned_loss=0.01862, audio_tagging_loss=0.007389, over 15087.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09119, pruned_loss=0.01252, audio_tagging_loss=0.008755, over 3055083.63 frames. ], batch size: 55, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 16:58:56,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3591013.3333333335, ans=0.125 2023-11-28 16:58:57,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3591013.3333333335, ans=0.125 2023-11-28 16:59:17,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3591080.0, ans=0.2 2023-11-28 16:59:21,716 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.897e+01 9.047e+01 9.589e+01 1.034e+02 1.302e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 16:59:22,403 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2023-11-28 16:59:34,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3591213.3333333335, ans=0.125 2023-11-28 16:59:36,184 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.79 vs. limit=15.0 2023-11-28 16:59:40,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3591213.3333333335, ans=0.035 2023-11-28 16:59:42,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3591213.3333333335, ans=0.0 2023-11-28 16:59:52,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3591280.0, ans=0.2 2023-11-28 16:59:53,242 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538700 2023-11-28 16:59:57,917 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9650, loss[loss=0.06515, simple_loss=0.08535, pruned_loss=0.0145, audio_tagging_loss=0.007969, over 14108.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09083, pruned_loss=0.01232, audio_tagging_loss=0.008758, over 3058823.16 frames. 
], batch size: 53, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 17:00:12,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3591413.3333333335, ans=0.2 2023-11-28 17:00:15,310 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.62 vs. limit=15.0 2023-11-28 17:00:28,794 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.31 vs. limit=15.0 2023-11-28 17:00:36,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3591546.6666666665, ans=0.1 2023-11-28 17:00:39,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3591546.6666666665, ans=0.0 2023-11-28 17:00:41,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3591546.6666666665, ans=0.07 2023-11-28 17:00:51,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3591613.3333333335, ans=0.125 2023-11-28 17:00:52,406 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:00:54,573 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538750 2023-11-28 17:00:58,064 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.08 vs. limit=15.0 2023-11-28 17:00:59,821 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9700, loss[loss=0.04718, simple_loss=0.05642, pruned_loss=0.007289, audio_tagging_loss=0.01168, over 14567.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08942, pruned_loss=0.01211, audio_tagging_loss=0.008719, over 3045064.72 frames. ], batch size: 57, lr: 1.50e-03, grad_scale: 16.0 2023-11-28 17:01:14,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3591746.6666666665, ans=0.125 2023-11-28 17:01:26,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.891e+01 8.979e+01 9.434e+01 1.003e+02 1.570e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 17:01:32,961 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.80 vs. limit=15.0 2023-11-28 17:01:48,001 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.67 vs. limit=12.0 2023-11-28 17:01:48,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3591946.6666666665, ans=0.125 2023-11-28 17:01:52,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3591946.6666666665, ans=0.2 2023-11-28 17:01:54,450 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.55 vs. 
limit=15.0 2023-11-28 17:01:57,468 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538800 2023-11-28 17:02:02,767 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9750, loss[loss=0.062, simple_loss=0.08902, pruned_loss=0.01056, audio_tagging_loss=0.006936, over 15672.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08943, pruned_loss=0.01211, audio_tagging_loss=0.008689, over 3040758.60 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:02:10,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3592013.3333333335, ans=0.0 2023-11-28 17:02:15,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3592080.0, ans=0.0 2023-11-28 17:02:20,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3592080.0, ans=0.125 2023-11-28 17:02:22,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.74 vs. limit=22.5 2023-11-28 17:02:23,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3592080.0, ans=0.125 2023-11-28 17:02:36,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3592146.6666666665, ans=0.125 2023-11-28 17:02:53,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3592280.0, ans=0.125 2023-11-28 17:02:59,583 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538850 2023-11-28 17:03:01,506 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.31 vs. limit=15.0 2023-11-28 17:03:04,984 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9800, loss[loss=0.07539, simple_loss=0.1101, pruned_loss=0.01316, audio_tagging_loss=0.007183, over 16433.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08977, pruned_loss=0.01211, audio_tagging_loss=0.00855, over 3044472.66 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:03:07,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3592346.6666666665, ans=0.125 2023-11-28 17:03:31,251 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.992e+01 9.640e+01 1.016e+02 2.169e+02, threshold=1.928e+02, percent-clipped=1.0 2023-11-28 17:03:43,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3592546.6666666665, ans=0.0 2023-11-28 17:03:55,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0 2023-11-28 17:04:02,263 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538900 2023-11-28 17:04:04,654 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:04:07,524 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9850, loss[loss=0.06365, simple_loss=0.09186, pruned_loss=0.01155, audio_tagging_loss=0.006171, over 15373.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08991, pruned_loss=0.01226, audio_tagging_loss=0.008488, over 3039583.70 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:04:18,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3592680.0, ans=0.125 2023-11-28 17:04:53,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3592880.0, ans=0.0 2023-11-28 17:05:05,082 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 538950 2023-11-28 17:05:10,984 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9900, loss[loss=0.07612, simple_loss=0.1153, pruned_loss=0.01208, audio_tagging_loss=0.006386, over 16069.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09021, pruned_loss=0.01225, audio_tagging_loss=0.008495, over 3040635.25 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:05:12,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3593013.3333333335, ans=0.0 2023-11-28 17:05:12,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3593013.3333333335, ans=0.0 2023-11-28 17:05:12,789 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.64 vs. limit=22.5 2023-11-28 17:05:13,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3593013.3333333335, ans=0.0 2023-11-28 17:05:19,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3593013.3333333335, ans=0.5 2023-11-28 17:05:23,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3593080.0, ans=0.2 2023-11-28 17:05:33,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3593080.0, ans=0.125 2023-11-28 17:05:36,809 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 9.160e+01 9.894e+01 1.082e+02 1.663e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-28 17:05:38,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3593146.6666666665, ans=0.0 2023-11-28 17:05:46,536 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.27 vs. limit=15.0 2023-11-28 17:06:08,653 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539000 2023-11-28 17:06:14,244 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 9950, loss[loss=0.05664, simple_loss=0.07255, pruned_loss=0.008939, audio_tagging_loss=0.01143, over 16916.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08967, pruned_loss=0.01223, audio_tagging_loss=0.008564, over 3039151.57 frames. 
], batch size: 64, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:06:44,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3593480.0, ans=0.0 2023-11-28 17:07:04,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3593613.3333333335, ans=0.0 2023-11-28 17:07:09,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3593613.3333333335, ans=0.2 2023-11-28 17:07:10,807 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539050 2023-11-28 17:07:15,454 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10000, loss[loss=0.05564, simple_loss=0.06864, pruned_loss=0.01047, audio_tagging_loss=0.01085, over 14833.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08876, pruned_loss=0.01226, audio_tagging_loss=0.008657, over 3040456.90 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:07:24,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3593680.0, ans=0.0 2023-11-28 17:07:36,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3593746.6666666665, ans=0.125 2023-11-28 17:07:38,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3593746.6666666665, ans=0.2 2023-11-28 17:07:42,049 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.583e+01 8.648e+01 9.149e+01 9.983e+01 1.212e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-28 17:08:13,298 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539100 2023-11-28 17:08:13,776 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2023-11-28 17:08:14,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3593946.6666666665, ans=0.1 2023-11-28 17:08:18,064 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10050, loss[loss=0.06941, simple_loss=0.08836, pruned_loss=0.01512, audio_tagging_loss=0.01011, over 15136.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08881, pruned_loss=0.01232, audio_tagging_loss=0.0086, over 3041360.94 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:08:20,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3594013.3333333335, ans=0.2 2023-11-28 17:08:27,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3594013.3333333335, ans=0.025 2023-11-28 17:08:31,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3594080.0, ans=0.1 2023-11-28 17:08:43,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3594146.6666666665, ans=0.0 2023-11-28 17:08:55,347 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.66 vs. 
limit=22.5 2023-11-28 17:09:16,319 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539150 2023-11-28 17:09:18,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3594280.0, ans=0.1 2023-11-28 17:09:21,030 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10100, loss[loss=0.06794, simple_loss=0.08564, pruned_loss=0.0157, audio_tagging_loss=0.009416, over 15358.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08901, pruned_loss=0.01239, audio_tagging_loss=0.00868, over 3044448.51 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:09:48,554 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.910e+01 9.610e+01 1.020e+02 1.223e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 17:10:04,206 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:10:08,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3594546.6666666665, ans=0.0 2023-11-28 17:10:16,220 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:10:18,766 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539200 2023-11-28 17:10:23,885 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10150, loss[loss=0.07671, simple_loss=0.1078, pruned_loss=0.01465, audio_tagging_loss=0.008143, over 15148.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08913, pruned_loss=0.01221, audio_tagging_loss=0.008646, over 3052058.69 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:10:32,060 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2023-11-28 17:10:34,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3594680.0, ans=0.125 2023-11-28 17:10:44,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3594746.6666666665, ans=0.125 2023-11-28 17:10:57,595 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 17:11:10,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3594880.0, ans=0.0 2023-11-28 17:11:15,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3594946.6666666665, ans=0.035 2023-11-28 17:11:18,747 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=12.0 2023-11-28 17:11:20,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3594946.6666666665, ans=0.125 2023-11-28 17:11:21,527 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539250 2023-11-28 17:11:26,357 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10200, loss[loss=0.07536, simple_loss=0.1029, pruned_loss=0.01527, audio_tagging_loss=0.008626, over 16518.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09031, pruned_loss=0.01247, audio_tagging_loss=0.008691, over 3058356.30 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:11:28,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.46 vs. limit=15.0 2023-11-28 17:11:54,331 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.974e+01 9.604e+01 1.042e+02 1.393e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 17:11:54,386 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:12:15,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3595280.0, ans=0.0 2023-11-28 17:12:24,017 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539300 2023-11-28 17:12:28,732 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10250, loss[loss=0.06042, simple_loss=0.07889, pruned_loss=0.01159, audio_tagging_loss=0.009379, over 14710.00 frames. ], tot_loss[loss=0.0666, simple_loss=0.09091, pruned_loss=0.01249, audio_tagging_loss=0.008658, over 3048727.64 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:12:33,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3595346.6666666665, ans=0.125 2023-11-28 17:12:33,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3595346.6666666665, ans=0.0 2023-11-28 17:12:53,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3595480.0, ans=0.125 2023-11-28 17:13:11,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.22 vs. 
limit=15.0 2023-11-28 17:13:27,173 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539350 2023-11-28 17:13:31,868 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10300, loss[loss=0.07096, simple_loss=0.1047, pruned_loss=0.01219, audio_tagging_loss=0.006402, over 16539.00 frames. ], tot_loss[loss=0.06672, simple_loss=0.09136, pruned_loss=0.01244, audio_tagging_loss=0.008603, over 3051017.45 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:13:41,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3595680.0, ans=0.125 2023-11-28 17:13:59,039 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 9.050e+01 9.766e+01 1.043e+02 1.224e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-28 17:14:29,264 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539400 2023-11-28 17:14:34,238 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10350, loss[loss=0.06814, simple_loss=0.09525, pruned_loss=0.0107, audio_tagging_loss=0.009813, over 15791.00 frames. ], tot_loss[loss=0.06681, simple_loss=0.09141, pruned_loss=0.01242, audio_tagging_loss=0.008686, over 3057502.11 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:14:38,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3596013.3333333335, ans=0.95 2023-11-28 17:15:08,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3596146.6666666665, ans=0.5 2023-11-28 17:15:22,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3596280.0, ans=0.125 2023-11-28 17:15:30,072 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539450 2023-11-28 17:15:34,669 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10400, loss[loss=0.05676, simple_loss=0.0754, pruned_loss=0.009692, audio_tagging_loss=0.009371, over 15521.00 frames. ], tot_loss[loss=0.06638, simple_loss=0.09045, pruned_loss=0.01234, audio_tagging_loss=0.008815, over 3049937.88 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:16:01,200 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.937e+01 9.027e+01 9.708e+01 1.021e+02 1.407e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 17:16:05,546 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.03 vs. limit=22.5 2023-11-28 17:16:14,854 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.27 vs. 
limit=15.0 2023-11-28 17:16:25,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3596613.3333333335, ans=0.0 2023-11-28 17:16:30,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3596613.3333333335, ans=0.2 2023-11-28 17:16:30,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3596613.3333333335, ans=0.1 2023-11-28 17:16:32,233 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539500 2023-11-28 17:16:33,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3596613.3333333335, ans=0.0 2023-11-28 17:16:36,730 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10450, loss[loss=0.06579, simple_loss=0.097, pruned_loss=0.008336, audio_tagging_loss=0.008952, over 15475.00 frames. ], tot_loss[loss=0.06615, simple_loss=0.09001, pruned_loss=0.01227, audio_tagging_loss=0.008878, over 3042798.68 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:16:51,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3596746.6666666665, ans=0.125 2023-11-28 17:17:02,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3596813.3333333335, ans=0.125 2023-11-28 17:17:05,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3596813.3333333335, ans=0.125 2023-11-28 17:17:09,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3596813.3333333335, ans=0.2 2023-11-28 17:17:15,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3596880.0, ans=0.2 2023-11-28 17:17:31,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3596946.6666666665, ans=0.125 2023-11-28 17:17:33,677 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539550 2023-11-28 17:17:38,926 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10500, loss[loss=0.0629, simple_loss=0.08084, pruned_loss=0.01461, audio_tagging_loss=0.007867, over 14422.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09022, pruned_loss=0.01234, audio_tagging_loss=0.008742, over 3053331.89 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:17:50,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. 
limit=6.0 2023-11-28 17:17:55,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3597080.0, ans=0.0 2023-11-28 17:17:58,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3597080.0, ans=0.04949747468305833 2023-11-28 17:18:06,789 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.931e+01 9.605e+01 1.033e+02 2.073e+02, threshold=1.921e+02, percent-clipped=1.0 2023-11-28 17:18:26,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3597213.3333333335, ans=0.0 2023-11-28 17:18:26,386 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2023-11-28 17:18:35,971 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539600 2023-11-28 17:18:40,856 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10550, loss[loss=0.06569, simple_loss=0.09034, pruned_loss=0.01139, audio_tagging_loss=0.009128, over 16212.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.0892, pruned_loss=0.01206, audio_tagging_loss=0.008649, over 3057303.11 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:18:45,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3597346.6666666665, ans=0.125 2023-11-28 17:18:52,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3597413.3333333335, ans=0.0 2023-11-28 17:19:03,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3597413.3333333335, ans=0.04949747468305833 2023-11-28 17:19:07,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3597480.0, ans=0.1 2023-11-28 17:19:16,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3597480.0, ans=10.0 2023-11-28 17:19:33,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3597613.3333333335, ans=0.125 2023-11-28 17:19:37,822 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539650 2023-11-28 17:19:42,577 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10600, loss[loss=0.08105, simple_loss=0.1084, pruned_loss=0.01907, audio_tagging_loss=0.007788, over 15590.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08856, pruned_loss=0.01203, audio_tagging_loss=0.008622, over 3056686.76 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:19:42,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3597680.0, ans=0.0 2023-11-28 17:19:50,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.98 vs. 
limit=15.0 2023-11-28 17:20:03,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3597746.6666666665, ans=0.0 2023-11-28 17:20:07,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3597813.3333333335, ans=0.125 2023-11-28 17:20:07,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3597813.3333333335, ans=0.0 2023-11-28 17:20:11,851 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.139e+01 8.954e+01 9.589e+01 1.025e+02 1.251e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 17:20:40,000 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539700 2023-11-28 17:20:45,322 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10650, loss[loss=0.06431, simple_loss=0.08285, pruned_loss=0.01233, audio_tagging_loss=0.01056, over 14626.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08849, pruned_loss=0.012, audio_tagging_loss=0.008676, over 3049225.14 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:20:54,064 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.37 vs. limit=22.5 2023-11-28 17:21:06,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=12.0 2023-11-28 17:21:17,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3598146.6666666665, ans=0.0 2023-11-28 17:21:24,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3598213.3333333335, ans=0.125 2023-11-28 17:21:29,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3598213.3333333335, ans=0.0 2023-11-28 17:21:41,938 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539750 2023-11-28 17:21:47,279 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10700, loss[loss=0.05177, simple_loss=0.06459, pruned_loss=0.01114, audio_tagging_loss=0.008331, over 15364.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08947, pruned_loss=0.01202, audio_tagging_loss=0.008632, over 3046503.21 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:22:04,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3598413.3333333335, ans=0.5 2023-11-28 17:22:15,499 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.694e+01 8.739e+01 9.237e+01 1.012e+02 1.304e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-28 17:22:32,429 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2023-11-28 17:22:43,824 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539800 2023-11-28 17:22:48,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3598680.0, ans=0.0 2023-11-28 17:22:49,046 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10750, loss[loss=0.06906, simple_loss=0.1025, pruned_loss=0.01148, audio_tagging_loss=0.006312, over 15755.00 frames. 
], tot_loss[loss=0.06545, simple_loss=0.08958, pruned_loss=0.01202, audio_tagging_loss=0.008643, over 3038246.35 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:22:49,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3598680.0, ans=0.125 2023-11-28 17:23:01,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3598746.6666666665, ans=0.125 2023-11-28 17:23:11,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=22.5 2023-11-28 17:23:13,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.77 vs. limit=10.0 2023-11-28 17:23:42,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3598946.6666666665, ans=0.1 2023-11-28 17:23:45,903 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539850 2023-11-28 17:23:51,145 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10800, loss[loss=0.05859, simple_loss=0.08532, pruned_loss=0.00824, audio_tagging_loss=0.00769, over 14469.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.0888, pruned_loss=0.01186, audio_tagging_loss=0.008574, over 3038896.03 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:24:19,459 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 8.985e+01 9.429e+01 1.046e+02 1.643e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 17:24:23,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=12.0 2023-11-28 17:24:27,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3599213.3333333335, ans=0.1 2023-11-28 17:24:48,007 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539900 2023-11-28 17:24:52,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3599346.6666666665, ans=0.125 2023-11-28 17:24:53,251 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10850, loss[loss=0.0358, simple_loss=0.04501, pruned_loss=0.003274, audio_tagging_loss=0.01002, over 15926.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08833, pruned_loss=0.01179, audio_tagging_loss=0.008567, over 3044363.26 frames. ], batch size: 61, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:24:56,363 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.44 vs. limit=15.0 2023-11-28 17:25:03,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3599413.3333333335, ans=0.0 2023-11-28 17:25:20,485 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-11-28 17:25:23,755 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.22 vs. 
limit=22.5 2023-11-28 17:25:37,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3599546.6666666665, ans=0.0 2023-11-28 17:25:49,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2023-11-28 17:25:49,847 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 539950 2023-11-28 17:25:54,391 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10900, loss[loss=0.07634, simple_loss=0.114, pruned_loss=0.01059, audio_tagging_loss=0.00874, over 15775.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08874, pruned_loss=0.01191, audio_tagging_loss=0.008623, over 3040196.44 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:25:54,433 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:26:17,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3599746.6666666665, ans=0.0 2023-11-28 17:26:23,249 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.592e+01 8.863e+01 9.525e+01 1.011e+02 1.256e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-28 17:26:28,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3599813.3333333335, ans=0.125 2023-11-28 17:26:42,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3599880.0, ans=0.125 2023-11-28 17:26:49,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3599946.6666666665, ans=0.125 2023-11-28 17:26:51,450 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540000 2023-11-28 17:26:58,850 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 10950, loss[loss=0.06983, simple_loss=0.1035, pruned_loss=0.01069, audio_tagging_loss=0.007384, over 14885.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08922, pruned_loss=0.0121, audio_tagging_loss=0.008605, over 3043558.60 frames. 
], batch size: 54, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:26:59,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3600013.3333333335, ans=0.0 2023-11-28 17:27:21,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3600080.0, ans=0.025 2023-11-28 17:27:39,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3600213.3333333335, ans=0.125 2023-11-28 17:27:43,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3600213.3333333335, ans=0.125 2023-11-28 17:27:48,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3600280.0, ans=0.125 2023-11-28 17:27:54,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3600280.0, ans=0.07 2023-11-28 17:27:56,518 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540050 2023-11-28 17:28:01,172 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11000, loss[loss=0.06757, simple_loss=0.09291, pruned_loss=0.01369, audio_tagging_loss=0.007426, over 15104.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08916, pruned_loss=0.01207, audio_tagging_loss=0.008594, over 3051832.88 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:28:02,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3600346.6666666665, ans=0.1 2023-11-28 17:28:13,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3600413.3333333335, ans=0.025 2023-11-28 17:28:15,638 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:28:23,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3600413.3333333335, ans=0.0 2023-11-28 17:28:26,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3600480.0, ans=0.5 2023-11-28 17:28:30,163 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.650e+01 8.952e+01 9.649e+01 1.058e+02 1.351e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 17:28:31,833 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. 
limit=15.0 2023-11-28 17:28:33,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3600480.0, ans=0.2 2023-11-28 17:28:33,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3600480.0, ans=0.2 2023-11-28 17:28:58,201 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540100 2023-11-28 17:29:02,659 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11050, loss[loss=0.07761, simple_loss=0.1151, pruned_loss=0.01288, audio_tagging_loss=0.007175, over 15028.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08928, pruned_loss=0.01204, audio_tagging_loss=0.008674, over 3046025.98 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:29:03,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3600680.0, ans=0.125 2023-11-28 17:29:08,704 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2023-11-28 17:29:50,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3600946.6666666665, ans=0.125 2023-11-28 17:29:50,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3600946.6666666665, ans=0.125 2023-11-28 17:29:59,781 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540150 2023-11-28 17:30:00,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3600946.6666666665, ans=0.125 2023-11-28 17:30:04,330 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11100, loss[loss=0.0556, simple_loss=0.07847, pruned_loss=0.0078, audio_tagging_loss=0.008565, over 15573.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.0897, pruned_loss=0.01216, audio_tagging_loss=0.008763, over 3049566.38 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:30:28,000 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2023-11-28 17:30:34,437 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 8.960e+01 9.547e+01 1.044e+02 1.303e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 17:30:37,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5 2023-11-28 17:30:47,417 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2023-11-28 17:30:48,409 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.21 vs. 
limit=15.0 2023-11-28 17:30:51,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3601213.3333333335, ans=0.2 2023-11-28 17:30:57,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3601280.0, ans=0.125 2023-11-28 17:31:01,476 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540200 2023-11-28 17:31:06,500 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11150, loss[loss=0.03939, simple_loss=0.04786, pruned_loss=0.005119, audio_tagging_loss=0.01034, over 14829.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.08988, pruned_loss=0.01227, audio_tagging_loss=0.008891, over 3047282.28 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:31:06,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3601346.6666666665, ans=0.125 2023-11-28 17:31:12,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3601346.6666666665, ans=0.0 2023-11-28 17:31:13,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3601346.6666666665, ans=0.125 2023-11-28 17:31:21,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3601413.3333333335, ans=0.2 2023-11-28 17:31:25,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3601413.3333333335, ans=0.2 2023-11-28 17:31:35,955 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-28 17:31:44,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3601546.6666666665, ans=0.1 2023-11-28 17:31:58,990 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.62 vs. limit=8.0 2023-11-28 17:32:04,028 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540250 2023-11-28 17:32:04,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3601613.3333333335, ans=0.125 2023-11-28 17:32:08,651 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11200, loss[loss=0.07306, simple_loss=0.1063, pruned_loss=0.01416, audio_tagging_loss=0.005745, over 16267.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.09089, pruned_loss=0.01248, audio_tagging_loss=0.008922, over 3052757.61 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:32:24,826 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:32:26,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3601746.6666666665, ans=0.125 2023-11-28 17:32:28,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.78 vs. 
limit=15.0 2023-11-28 17:32:38,186 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.987e+01 9.522e+01 1.045e+02 1.448e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 17:32:46,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3601880.0, ans=0.125 2023-11-28 17:32:55,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3601880.0, ans=0.0 2023-11-28 17:33:03,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=12.0 2023-11-28 17:33:05,113 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540300 2023-11-28 17:33:09,750 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11250, loss[loss=0.08647, simple_loss=0.1258, pruned_loss=0.01734, audio_tagging_loss=0.006227, over 16344.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09002, pruned_loss=0.01223, audio_tagging_loss=0.008931, over 3058577.18 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:33:10,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=22.5 2023-11-28 17:33:21,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3602080.0, ans=0.125 2023-11-28 17:33:44,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3602146.6666666665, ans=0.2 2023-11-28 17:33:45,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3602146.6666666665, ans=0.0 2023-11-28 17:33:54,411 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.86 vs. limit=10.0 2023-11-28 17:34:05,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3602280.0, ans=0.025 2023-11-28 17:34:07,200 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540350 2023-11-28 17:34:08,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3602280.0, ans=0.125 2023-11-28 17:34:08,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3602280.0, ans=0.125 2023-11-28 17:34:11,698 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11300, loss[loss=0.08348, simple_loss=0.1217, pruned_loss=0.01699, audio_tagging_loss=0.005622, over 15551.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0899, pruned_loss=0.01221, audio_tagging_loss=0.008821, over 3053270.08 frames. 
], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:34:12,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3602346.6666666665, ans=0.2 2023-11-28 17:34:16,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3602346.6666666665, ans=0.025 2023-11-28 17:34:32,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3602413.3333333335, ans=0.0 2023-11-28 17:34:41,446 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.983e+01 9.542e+01 1.053e+02 1.409e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 17:34:41,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3602480.0, ans=0.0 2023-11-28 17:34:58,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3602546.6666666665, ans=0.0 2023-11-28 17:35:08,567 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540400 2023-11-28 17:35:13,591 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2023-11-28 17:35:14,226 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11350, loss[loss=0.06623, simple_loss=0.09452, pruned_loss=0.01213, audio_tagging_loss=0.006842, over 15203.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.08998, pruned_loss=0.01226, audio_tagging_loss=0.008705, over 3047604.58 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:35:14,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3602680.0, ans=0.0 2023-11-28 17:35:41,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3602813.3333333335, ans=0.1 2023-11-28 17:35:42,671 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:36:11,251 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540450 2023-11-28 17:36:15,767 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11400, loss[loss=0.05706, simple_loss=0.07125, pruned_loss=0.01016, audio_tagging_loss=0.01127, over 15022.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09048, pruned_loss=0.01234, audio_tagging_loss=0.00858, over 3042645.65 frames. ], batch size: 57, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:36:17,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3603013.3333333335, ans=0.125 2023-11-28 17:36:21,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3603013.3333333335, ans=0.05 2023-11-28 17:36:29,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3603080.0, ans=0.5 2023-11-28 17:36:47,210 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 9.040e+01 9.661e+01 1.043e+02 1.391e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 17:36:47,888 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.88 vs. 
limit=15.0 2023-11-28 17:36:51,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3603146.6666666665, ans=0.125 2023-11-28 17:36:54,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3603213.3333333335, ans=0.125 2023-11-28 17:36:54,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3603213.3333333335, ans=0.125 2023-11-28 17:37:12,688 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540500 2023-11-28 17:37:18,026 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11450, loss[loss=0.05582, simple_loss=0.08486, pruned_loss=0.007224, audio_tagging_loss=0.006161, over 15661.00 frames. ], tot_loss[loss=0.06616, simple_loss=0.09064, pruned_loss=0.01236, audio_tagging_loss=0.00848, over 3039666.88 frames. ], batch size: 60, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:37:28,539 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.25 vs. limit=15.0 2023-11-28 17:37:29,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3603413.3333333335, ans=0.125 2023-11-28 17:37:38,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3603413.3333333335, ans=0.05 2023-11-28 17:37:47,461 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0 2023-11-28 17:37:48,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3603480.0, ans=0.0 2023-11-28 17:37:55,490 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2023-11-28 17:37:56,674 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.57 vs. limit=22.5 2023-11-28 17:38:15,181 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540550 2023-11-28 17:38:18,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3603680.0, ans=0.125 2023-11-28 17:38:19,838 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11500, loss[loss=0.0639, simple_loss=0.08951, pruned_loss=0.0114, audio_tagging_loss=0.007742, over 15502.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08957, pruned_loss=0.01211, audio_tagging_loss=0.008537, over 3042800.51 frames. 
], batch size: 57, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:38:35,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3603746.6666666665, ans=0.0 2023-11-28 17:38:50,463 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.641e+01 9.267e+01 1.033e+02 1.350e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-28 17:38:54,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3603813.3333333335, ans=0.0 2023-11-28 17:39:17,485 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540600 2023-11-28 17:39:22,433 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11550, loss[loss=0.07173, simple_loss=0.103, pruned_loss=0.01141, audio_tagging_loss=0.008835, over 14917.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08918, pruned_loss=0.01209, audio_tagging_loss=0.008545, over 3049975.80 frames. ], batch size: 56, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:39:36,015 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.27 vs. limit=10.0 2023-11-28 17:39:54,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3604146.6666666665, ans=10.0 2023-11-28 17:40:01,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3604213.3333333335, ans=0.125 2023-11-28 17:40:05,106 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 17:40:05,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3604213.3333333335, ans=0.125 2023-11-28 17:40:07,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3604213.3333333335, ans=0.125 2023-11-28 17:40:18,749 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540650 2023-11-28 17:40:23,063 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11600, loss[loss=0.06232, simple_loss=0.08543, pruned_loss=0.01113, audio_tagging_loss=0.008481, over 14322.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08874, pruned_loss=0.01204, audio_tagging_loss=0.008564, over 3052609.01 frames. 
], batch size: 53, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:40:37,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3604413.3333333335, ans=0.125 2023-11-28 17:40:55,428 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.847e+01 9.637e+01 1.017e+02 1.289e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 17:41:09,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3604546.6666666665, ans=0.0 2023-11-28 17:41:16,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3604613.3333333335, ans=0.125 2023-11-28 17:41:21,371 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540700 2023-11-28 17:41:25,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3604680.0, ans=0.2 2023-11-28 17:41:26,708 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11650, loss[loss=0.1009, simple_loss=0.1386, pruned_loss=0.0238, audio_tagging_loss=0.0078, over 15091.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.08966, pruned_loss=0.01214, audio_tagging_loss=0.00862, over 3050746.08 frames. ], batch size: 54, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:41:29,479 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=15.0 2023-11-28 17:41:45,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3604746.6666666665, ans=0.125 2023-11-28 17:41:52,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3604813.3333333335, ans=0.0 2023-11-28 17:41:55,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3604813.3333333335, ans=0.125 2023-11-28 17:42:01,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3604813.3333333335, ans=0.0 2023-11-28 17:42:02,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3604880.0, ans=0.125 2023-11-28 17:42:04,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3604880.0, ans=0.0 2023-11-28 17:42:19,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3604946.6666666665, ans=0.125 2023-11-28 17:42:22,900 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540750 2023-11-28 17:42:28,519 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11700, loss[loss=0.074, simple_loss=0.1085, pruned_loss=0.013, audio_tagging_loss=0.006772, over 14106.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08965, pruned_loss=0.01209, audio_tagging_loss=0.00863, over 3049802.17 frames. 
], batch size: 53, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:42:30,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3605013.3333333335, ans=0.015 2023-11-28 17:42:48,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3605080.0, ans=0.2 2023-11-28 17:42:52,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3605146.6666666665, ans=0.0 2023-11-28 17:42:57,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3605146.6666666665, ans=0.125 2023-11-28 17:42:58,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3605146.6666666665, ans=0.125 2023-11-28 17:42:58,966 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.776e+01 9.057e+01 9.679e+01 1.035e+02 1.386e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-28 17:43:24,702 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540800 2023-11-28 17:43:29,699 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11750, loss[loss=0.06416, simple_loss=0.0789, pruned_loss=0.01564, audio_tagging_loss=0.009069, over 15336.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09038, pruned_loss=0.01235, audio_tagging_loss=0.008661, over 3047639.32 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:43:36,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3605346.6666666665, ans=0.1 2023-11-28 17:43:40,387 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.43 vs. limit=15.0 2023-11-28 17:43:43,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3605413.3333333335, ans=0.1 2023-11-28 17:43:49,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3605413.3333333335, ans=0.0 2023-11-28 17:44:03,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3605480.0, ans=0.0 2023-11-28 17:44:10,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3605546.6666666665, ans=0.125 2023-11-28 17:44:12,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3605546.6666666665, ans=0.0 2023-11-28 17:44:27,124 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540850 2023-11-28 17:44:32,156 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11800, loss[loss=0.06988, simple_loss=0.09601, pruned_loss=0.01492, audio_tagging_loss=0.006956, over 14706.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.08976, pruned_loss=0.0123, audio_tagging_loss=0.008681, over 3044956.89 frames. 
], batch size: 54, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:44:49,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3605746.6666666665, ans=0.2 2023-11-28 17:44:58,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3605813.3333333335, ans=0.125 2023-11-28 17:45:00,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3605813.3333333335, ans=0.05 2023-11-28 17:45:02,551 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.699e+01 9.349e+01 9.967e+01 1.294e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 17:45:10,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3605880.0, ans=0.125 2023-11-28 17:45:28,729 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540900 2023-11-28 17:45:31,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3605946.6666666665, ans=0.0 2023-11-28 17:45:31,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3605946.6666666665, ans=0.1 2023-11-28 17:45:33,904 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11850, loss[loss=0.06393, simple_loss=0.08664, pruned_loss=0.01115, audio_tagging_loss=0.009456, over 15729.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09099, pruned_loss=0.01245, audio_tagging_loss=0.008668, over 3044131.50 frames. ], batch size: 58, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:45:45,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3606080.0, ans=0.125 2023-11-28 17:46:10,423 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.09 vs. limit=22.5 2023-11-28 17:46:18,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3606213.3333333335, ans=0.125 2023-11-28 17:46:18,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3606213.3333333335, ans=0.0 2023-11-28 17:46:23,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3606280.0, ans=0.125 2023-11-28 17:46:26,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3606280.0, ans=0.125 2023-11-28 17:46:30,827 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 540950 2023-11-28 17:46:34,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3606346.6666666665, ans=0.1 2023-11-28 17:46:35,528 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11900, loss[loss=0.06458, simple_loss=0.1013, pruned_loss=0.006972, audio_tagging_loss=0.006974, over 16158.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09093, pruned_loss=0.01238, audio_tagging_loss=0.0087, over 3045796.35 frames. 
], batch size: 58, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:47:06,552 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.830e+01 8.976e+01 9.494e+01 1.024e+02 1.214e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 17:47:20,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3606546.6666666665, ans=0.95 2023-11-28 17:47:22,705 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.54 vs. limit=5.0 2023-11-28 17:47:33,197 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541000 2023-11-28 17:47:38,210 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 11950, loss[loss=0.07546, simple_loss=0.1044, pruned_loss=0.01513, audio_tagging_loss=0.008132, over 14706.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09019, pruned_loss=0.01239, audio_tagging_loss=0.008869, over 3041688.01 frames. ], batch size: 55, lr: 1.49e-03, grad_scale: 16.0 2023-11-28 17:47:48,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3606680.0, ans=0.015 2023-11-28 17:47:58,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3606746.6666666665, ans=0.015 2023-11-28 17:47:58,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3606746.6666666665, ans=0.0 2023-11-28 17:48:29,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3606946.6666666665, ans=0.125 2023-11-28 17:48:33,719 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541050 2023-11-28 17:48:38,224 INFO [train_asr.py:1235] (2/4) Epoch 45, batch 12000, loss[loss=0.06118, simple_loss=0.08147, pruned_loss=0.01173, audio_tagging_loss=0.008712, over 15693.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.0898, pruned_loss=0.0123, audio_tagging_loss=0.008916, over 3042093.23 frames. ], batch size: 59, lr: 1.49e-03, grad_scale: 32.0 2023-11-28 17:48:38,225 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 17:48:56,858 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2197, 4.1072, 5.0855, 4.3745], device='cuda:2') 2023-11-28 17:49:02,139 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1508, 2.4671, 5.0483, 2.9332], device='cuda:2') 2023-11-28 17:49:16,768 INFO [train_asr.py:1267] (2/4) Epoch 45, validation: loss=0.05759, simple_loss=0.05051, pruned_loss=0.005251, audio_tagging_loss=0.02709, over 4681554.00 frames. 
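[Editor's note] The recurring `[scaling.py:213] ScheduledFloat: name=..., batch_count=..., ans=...` entries throughout this log report module hyper-parameters (balancer probabilities, skip rates, bypass scale floors) whose values are not constants but schedules evaluated at the current `batch_count`. As a rough illustration only — a minimal sketch of a piecewise-linear schedule clamped at its endpoints, not the actual `scaling.py` implementation, and with made-up breakpoints rather than the values used in this run:

```python
# Minimal sketch of a piecewise-linear float schedule, loosely modelled on the
# "ScheduledFloat ... batch_count=..., ans=..." log entries above.
# The breakpoints below are illustrative, NOT the values used in this run.
class PiecewiseSchedule:
    def __init__(self, *points):
        # points: (batch_count, value) pairs; kept sorted by batch_count
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]        # clamp below the first breakpoint
        if batch_count >= pts[-1][0]:
            return pts[-1][1]       # clamp above the last breakpoint
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)   # linear interpolation

# e.g. a rate that decays early in training and then stays flat:
sched = PiecewiseSchedule((0.0, 0.3), (20000.0, 0.1))
print(sched.value(3600480.0))  # -> 0.1: far past the last breakpoint, clamped
```

On this reading, the logged `ans` is simply the schedule's current output; by `batch_count` ≈ 3.6e6 every schedule in this run has long since reached its final plateau, which is why the same `ans` values (0.125, 0.2, 0.0, ...) repeat for each parameter name.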
2023-11-28 17:49:16,768 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 17:49:16,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3607013.3333333335, ans=0.125 2023-11-28 17:49:18,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3607013.3333333335, ans=0.1 2023-11-28 17:49:27,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3607080.0, ans=0.125 2023-11-28 17:49:34,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3607080.0, ans=0.2 2023-11-28 17:49:36,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3607080.0, ans=0.0 2023-11-28 17:49:37,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3607080.0, ans=0.125 2023-11-28 17:50:04,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3607186.6666666665, ans=0.0 2023-11-28 17:50:04,679 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2023-11-28 17:50:05,867 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 0, loss[loss=0.06812, simple_loss=0.06807, pruned_loss=0.009712, audio_tagging_loss=0.02438, over 15313.00 frames. ], tot_loss[loss=0.06812, simple_loss=0.06807, pruned_loss=0.009712, audio_tagging_loss=0.02438, over 15313.00 frames. ], batch size: 58, lr: 1.48e-03, grad_scale: 32.0 2023-11-28 17:50:05,868 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 17:50:19,512 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.4673, 6.1173, 6.3990, 5.8709], device='cuda:2') 2023-11-28 17:50:20,530 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.6067, 3.0716, 3.2216, 2.7369, 3.5026, 3.4769, 3.5504, 3.3902], device='cuda:2') 2023-11-28 17:50:41,983 INFO [train_asr.py:1267] (2/4) Epoch 46, validation: loss=0.05787, simple_loss=0.05054, pruned_loss=0.005286, audio_tagging_loss=0.02732, over 4681554.00 frames. 
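[Editor's note] The `[optim.py:476] Clipping_scale=2.0, grad-norm quartiles ... threshold=... percent-clipped=...` entries report five order statistics (min / 25% / median / 75% / max) of recent gradient norms plus the clipping threshold. The logged numbers are consistent with the threshold being `clipping_scale` times the median of the recent norms: in the first such entry of this section, 2.0 × 9.547e+01 = 1.909e+02, exactly the logged threshold. A hedged sketch of that bookkeeping, assuming this relationship holds (window size and method names here are invented for illustration):

```python
# Sketch of median-relative gradient clipping with quartile logging, assuming
# threshold = clipping_scale * median(recent grad norms). Window size and the
# class/method names are illustrative, not taken from optim.py.
from collections import deque
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)   # recent total grad norms
        self.num_clipped = 0
        self.num_steps = 0

    def clip_(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        median = torch.tensor(list(self.norms)).median().item()
        threshold = self.clipping_scale * median
        self.num_steps += 1
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)   # rescale to the threshold
        return norm

    def quartiles(self):
        t = torch.tensor(list(self.norms))
        qs = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        pct = 100.0 * self.num_clipped / max(1, self.num_steps)
        return qs.tolist(), pct   # the five logged values + percent-clipped
```

A median-relative threshold adapts automatically as gradient magnitudes drift (e.g. under fp16 loss scaling), which fits the pattern visible here: the threshold tracks the quartiles up and down across entries while `percent-clipped` stays at 0.0.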
2023-11-28 17:50:41,983 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 17:50:42,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3607186.6666666665, ans=0.125 2023-11-28 17:50:43,141 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.886e+01 9.608e+01 1.034e+02 1.479e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 17:50:44,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3607186.6666666665, ans=0.125 2023-11-28 17:50:44,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3607186.6666666665, ans=0.1 2023-11-28 17:51:01,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3607253.3333333335, ans=0.0 2023-11-28 17:51:01,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3607253.3333333335, ans=0.125 2023-11-28 17:51:06,760 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541100 2023-11-28 17:51:17,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3607386.6666666665, ans=0.0 2023-11-28 17:51:18,742 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:51:43,533 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 50, loss[loss=0.0754, simple_loss=0.09732, pruned_loss=0.01171, audio_tagging_loss=0.01504, over 14594.00 frames. ], tot_loss[loss=0.0708, simple_loss=0.08459, pruned_loss=0.01128, audio_tagging_loss=0.01723, over 680352.58 frames. ], batch size: 54, lr: 1.48e-03, grad_scale: 16.0 2023-11-28 17:51:45,294 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0 2023-11-28 17:51:46,549 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2023-11-28 17:52:07,722 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541150 2023-11-28 17:52:17,672 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=15.0 2023-11-28 17:52:19,521 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 17:52:22,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3607720.0, ans=0.07 2023-11-28 17:52:44,950 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 100, loss[loss=0.05073, simple_loss=0.05311, pruned_loss=0.005317, audio_tagging_loss=0.01886, over 14191.00 frames. ], tot_loss[loss=0.07081, simple_loss=0.08593, pruned_loss=0.01148, audio_tagging_loss=0.01637, over 1202929.26 frames. 
], batch size: 56, lr: 1.48e-03, grad_scale: 16.0 2023-11-28 17:52:47,326 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.554e+01 1.000e+02 1.063e+02 1.121e+02 1.597e+02, threshold=2.127e+02, percent-clipped=0.0 2023-11-28 17:53:10,013 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541200 2023-11-28 17:53:10,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.39 vs. limit=15.0 2023-11-28 17:53:35,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=3608120.0, ans=15.0 2023-11-28 17:53:38,744 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=22.5 2023-11-28 17:53:47,623 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 150, loss[loss=0.06659, simple_loss=0.08624, pruned_loss=0.01166, audio_tagging_loss=0.01181, over 14938.00 frames. ], tot_loss[loss=0.06972, simple_loss=0.08684, pruned_loss=0.01175, audio_tagging_loss=0.01455, over 1609429.52 frames. ], batch size: 56, lr: 1.48e-03, grad_scale: 16.0 2023-11-28 17:54:11,760 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541250 2023-11-28 17:54:17,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=12.0 2023-11-28 17:54:24,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3608386.6666666665, ans=0.125 2023-11-28 17:54:34,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3608386.6666666665, ans=0.09899494936611666 2023-11-28 17:54:34,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3608386.6666666665, ans=0.125 2023-11-28 17:54:42,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3608453.3333333335, ans=0.125 2023-11-28 17:54:49,305 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 200, loss[loss=0.05071, simple_loss=0.06879, pruned_loss=0.007387, audio_tagging_loss=0.008925, over 14352.00 frames. ], tot_loss[loss=0.06892, simple_loss=0.08823, pruned_loss=0.01195, audio_tagging_loss=0.01285, over 1924970.89 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:54:51,569 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.671e+01 9.120e+01 9.843e+01 1.065e+02 1.310e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-28 17:55:03,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3608586.6666666665, ans=0.1 2023-11-28 17:55:13,221 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541300 2023-11-28 17:55:17,298 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.86 vs. 
limit=10.0 2023-11-28 17:55:34,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3608720.0, ans=0.125 2023-11-28 17:55:45,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3608786.6666666665, ans=0.1 2023-11-28 17:55:50,909 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 250, loss[loss=0.08744, simple_loss=0.1217, pruned_loss=0.01837, audio_tagging_loss=0.008198, over 15488.00 frames. ], tot_loss[loss=0.06933, simple_loss=0.09112, pruned_loss=0.01227, audio_tagging_loss=0.0115, over 2186652.31 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:56:03,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3608920.0, ans=0.0 2023-11-28 17:56:16,211 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541350 2023-11-28 17:56:53,099 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 300, loss[loss=0.07888, simple_loss=0.1097, pruned_loss=0.01841, audio_tagging_loss=0.005632, over 14520.00 frames. ], tot_loss[loss=0.06838, simple_loss=0.09122, pruned_loss=0.0122, audio_tagging_loss=0.01057, over 2374211.08 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:56:55,356 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.170e+01 9.069e+01 9.733e+01 1.020e+02 1.805e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 17:57:06,703 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.90 vs. limit=22.5 2023-11-28 17:57:09,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3609253.3333333335, ans=0.2 2023-11-28 17:57:17,676 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541400 2023-11-28 17:57:22,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3609320.0, ans=0.125 2023-11-28 17:57:22,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3609320.0, ans=0.0 2023-11-28 17:57:31,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3609386.6666666665, ans=0.125 2023-11-28 17:57:34,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3609386.6666666665, ans=0.125 2023-11-28 17:57:37,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3609386.6666666665, ans=0.125 2023-11-28 17:57:38,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3609386.6666666665, ans=0.1 2023-11-28 17:57:40,111 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.66 vs. limit=15.0 2023-11-28 17:57:55,485 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 350, loss[loss=0.04388, simple_loss=0.05709, pruned_loss=0.005866, audio_tagging_loss=0.009468, over 14488.00 frames. ], tot_loss[loss=0.06772, simple_loss=0.09066, pruned_loss=0.01232, audio_tagging_loss=0.01008, over 2521636.49 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 17:58:06,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3609586.6666666665, ans=0.125 2023-11-28 17:58:07,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3609586.6666666665, ans=0.1 2023-11-28 17:58:19,612 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541450 2023-11-28 17:58:19,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3609653.3333333335, ans=0.1 2023-11-28 17:58:22,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3609653.3333333335, ans=0.125 2023-11-28 17:58:24,591 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=22.5 2023-11-28 17:58:47,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3609786.6666666665, ans=0.125 2023-11-28 17:58:57,261 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 400, loss[loss=0.08401, simple_loss=0.1254, pruned_loss=0.01496, audio_tagging_loss=0.006344, over 16734.00 frames. ], tot_loss[loss=0.06769, simple_loss=0.09126, pruned_loss=0.0125, audio_tagging_loss=0.009566, over 2639802.55 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 17:58:59,619 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.699e+01 9.057e+01 9.604e+01 1.022e+02 1.428e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 17:59:02,432 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2023-11-28 17:59:03,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3609853.3333333335, ans=0.125 2023-11-28 17:59:05,097 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2023-11-28 17:59:06,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3609853.3333333335, ans=0.125 2023-11-28 17:59:15,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3609920.0, ans=0.1 2023-11-28 17:59:21,504 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541500 2023-11-28 17:59:28,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3609986.6666666665, ans=0.125 2023-11-28 17:59:50,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3610120.0, ans=0.125 2023-11-28 17:59:52,730 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.47 vs. limit=15.0 2023-11-28 17:59:58,038 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 450, loss[loss=0.07042, simple_loss=0.1051, pruned_loss=0.01075, audio_tagging_loss=0.007134, over 15492.00 frames. 
], tot_loss[loss=0.06678, simple_loss=0.0905, pruned_loss=0.01219, audio_tagging_loss=0.009341, over 2736381.17 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:00:23,706 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541550 2023-11-28 18:00:59,953 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:01:00,950 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 500, loss[loss=0.06415, simple_loss=0.08697, pruned_loss=0.01189, audio_tagging_loss=0.008778, over 14885.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08997, pruned_loss=0.01205, audio_tagging_loss=0.009208, over 2809960.85 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:01:04,991 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.722e+01 9.408e+01 1.020e+02 1.286e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-28 18:01:25,486 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541600 2023-11-28 18:01:58,870 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.40 vs. limit=15.0 2023-11-28 18:02:02,613 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 550, loss[loss=0.06168, simple_loss=0.07486, pruned_loss=0.01289, audio_tagging_loss=0.01136, over 15322.00 frames. ], tot_loss[loss=0.066, simple_loss=0.08974, pruned_loss=0.012, audio_tagging_loss=0.009129, over 2861831.26 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:02:05,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3610853.3333333335, ans=0.2 2023-11-28 18:02:13,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3610920.0, ans=0.0 2023-11-28 18:02:16,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3610920.0, ans=0.125 2023-11-28 18:02:27,398 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541650 2023-11-28 18:02:36,727 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.89 vs. limit=22.5 2023-11-28 18:02:38,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3611053.3333333335, ans=0.0 2023-11-28 18:02:40,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3611053.3333333335, ans=0.125 2023-11-28 18:02:41,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3611053.3333333335, ans=0.0 2023-11-28 18:02:46,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3611053.3333333335, ans=0.125 2023-11-28 18:02:59,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3611120.0, ans=0.0 2023-11-28 18:03:04,254 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 600, loss[loss=0.06749, simple_loss=0.09588, pruned_loss=0.01155, audio_tagging_loss=0.007998, over 15881.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.08998, pruned_loss=0.01211, audio_tagging_loss=0.008988, over 2898415.84 frames. 
], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:03:04,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3611186.6666666665, ans=0.1 2023-11-28 18:03:06,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3611186.6666666665, ans=0.125 2023-11-28 18:03:07,703 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.115e+01 9.737e+01 1.046e+02 1.247e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 18:03:29,338 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541700 2023-11-28 18:03:38,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3611320.0, ans=0.125 2023-11-28 18:03:58,170 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=15.0 2023-11-28 18:04:05,980 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 650, loss[loss=0.08506, simple_loss=0.1179, pruned_loss=0.02023, audio_tagging_loss=0.005893, over 14358.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08891, pruned_loss=0.01192, audio_tagging_loss=0.008938, over 2930887.70 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:04:18,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3611586.6666666665, ans=0.2 2023-11-28 18:04:31,648 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541750 2023-11-28 18:04:33,269 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.08 vs. limit=6.0 2023-11-28 18:04:40,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3611653.3333333335, ans=0.2 2023-11-28 18:05:08,077 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 700, loss[loss=0.07532, simple_loss=0.109, pruned_loss=0.01258, audio_tagging_loss=0.008251, over 14334.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.08952, pruned_loss=0.01204, audio_tagging_loss=0.008891, over 2960012.75 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:05:09,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3611853.3333333335, ans=0.125 2023-11-28 18:05:12,356 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.880e+01 8.893e+01 9.585e+01 1.037e+02 1.398e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 18:05:26,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3611920.0, ans=0.0 2023-11-28 18:05:33,484 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541800 2023-11-28 18:05:41,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3611986.6666666665, ans=0.0 2023-11-28 18:06:06,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3612120.0, ans=0.125 2023-11-28 18:06:07,626 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.04 vs. 
limit=22.5 2023-11-28 18:06:10,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3612186.6666666665, ans=0.125 2023-11-28 18:06:11,791 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 750, loss[loss=0.05595, simple_loss=0.07404, pruned_loss=0.009051, audio_tagging_loss=0.009879, over 15568.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.08979, pruned_loss=0.01207, audio_tagging_loss=0.008945, over 2978020.38 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:06:22,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3612186.6666666665, ans=0.2 2023-11-28 18:06:33,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3612253.3333333335, ans=0.125 2023-11-28 18:06:35,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3612320.0, ans=0.125 2023-11-28 18:06:36,835 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541850 2023-11-28 18:06:38,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3612320.0, ans=0.0 2023-11-28 18:06:41,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3612320.0, ans=0.125 2023-11-28 18:06:55,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3612386.6666666665, ans=0.125 2023-11-28 18:06:58,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3612386.6666666665, ans=0.0 2023-11-28 18:06:59,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3612386.6666666665, ans=0.1 2023-11-28 18:07:04,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3612453.3333333335, ans=0.07 2023-11-28 18:07:14,127 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 800, loss[loss=0.07041, simple_loss=0.1044, pruned_loss=0.01213, audio_tagging_loss=0.006063, over 16118.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.08996, pruned_loss=0.01217, audio_tagging_loss=0.008874, over 2999694.93 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:07:15,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3612520.0, ans=0.1 2023-11-28 18:07:17,632 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 9.017e+01 9.748e+01 1.044e+02 1.462e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-28 18:07:24,407 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:07:39,734 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541900 2023-11-28 18:07:40,078 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.42 vs. 
limit=15.0 2023-11-28 18:07:43,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3612653.3333333335, ans=0.1 2023-11-28 18:07:44,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3612653.3333333335, ans=0.1 2023-11-28 18:07:52,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3612720.0, ans=0.125 2023-11-28 18:07:58,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3612720.0, ans=0.125 2023-11-28 18:08:07,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.42 vs. limit=15.0 2023-11-28 18:08:08,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3612786.6666666665, ans=0.125 2023-11-28 18:08:16,474 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 850, loss[loss=0.0731, simple_loss=0.1037, pruned_loss=0.01313, audio_tagging_loss=0.008125, over 15587.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08952, pruned_loss=0.01201, audio_tagging_loss=0.008917, over 3008594.78 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:08:41,248 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 541950 2023-11-28 18:09:03,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3613053.3333333335, ans=0.1 2023-11-28 18:09:18,556 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 900, loss[loss=0.07956, simple_loss=0.111, pruned_loss=0.01804, audio_tagging_loss=0.00603, over 15410.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08902, pruned_loss=0.01192, audio_tagging_loss=0.008989, over 3020350.19 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:09:18,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3613186.6666666665, ans=0.09899494936611666 2023-11-28 18:09:24,238 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.864e+01 9.446e+01 1.016e+02 1.435e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 18:09:26,112 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.32 vs. 
limit=15.0 2023-11-28 18:09:38,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3613253.3333333335, ans=0.015 2023-11-28 18:09:43,176 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542000 2023-11-28 18:09:44,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3613320.0, ans=0.2 2023-11-28 18:10:07,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3613453.3333333335, ans=0.125 2023-11-28 18:10:14,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3613453.3333333335, ans=0.2 2023-11-28 18:10:18,973 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.36 vs. limit=22.5 2023-11-28 18:10:20,691 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 950, loss[loss=0.07249, simple_loss=0.09455, pruned_loss=0.01669, audio_tagging_loss=0.008521, over 14220.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08897, pruned_loss=0.01193, audio_tagging_loss=0.008947, over 3021101.11 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:10:45,402 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542050 2023-11-28 18:11:01,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3613720.0, ans=0.2 2023-11-28 18:11:06,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3613720.0, ans=0.0 2023-11-28 18:11:09,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2023-11-28 18:11:16,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3613786.6666666665, ans=0.125 2023-11-28 18:11:21,862 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1000, loss[loss=0.0597, simple_loss=0.08184, pruned_loss=0.0102, audio_tagging_loss=0.008587, over 15817.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08942, pruned_loss=0.01193, audio_tagging_loss=0.00875, over 3026338.86 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:11:27,663 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 8.919e+01 9.596e+01 1.036e+02 1.232e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-28 18:11:30,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3613853.3333333335, ans=0.1 2023-11-28 18:11:32,627 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.11 vs. limit=15.0 2023-11-28 18:11:40,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0 2023-11-28 18:11:46,606 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542100 2023-11-28 18:11:50,858 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:11:55,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3613986.6666666665, ans=0.0 2023-11-28 18:11:55,926 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:12:08,813 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=22.5 2023-11-28 18:12:14,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3614120.0, ans=0.125 2023-11-28 18:12:24,615 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1050, loss[loss=0.07251, simple_loss=0.1012, pruned_loss=0.01236, audio_tagging_loss=0.009553, over 14944.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08966, pruned_loss=0.01214, audio_tagging_loss=0.0087, over 3026264.04 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:12:31,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3614186.6666666665, ans=0.2 2023-11-28 18:12:39,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3614253.3333333335, ans=0.5 2023-11-28 18:12:40,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3614253.3333333335, ans=0.125 2023-11-28 18:12:49,404 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542150 2023-11-28 18:12:57,068 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=12.0 2023-11-28 18:13:14,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3614453.3333333335, ans=0.125 2023-11-28 18:13:22,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3614453.3333333335, ans=0.2 2023-11-28 18:13:24,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2023-11-28 18:13:26,605 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1100, loss[loss=0.06024, simple_loss=0.08522, pruned_loss=0.01254, audio_tagging_loss=0.005096, over 14508.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.08988, pruned_loss=0.01221, audio_tagging_loss=0.008572, over 3037129.18 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:13:29,544 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2023-11-28 18:13:29,553 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.62 vs. limit=10.0 2023-11-28 18:13:31,289 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:13:32,349 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 8.948e+01 9.564e+01 1.065e+02 1.707e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-28 18:13:33,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3614520.0, ans=0.125 2023-11-28 18:13:44,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3614586.6666666665, ans=0.125 2023-11-28 18:13:47,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3614586.6666666665, ans=0.1 2023-11-28 18:13:50,845 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542200 2023-11-28 18:13:50,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3614653.3333333335, ans=0.0 2023-11-28 18:14:07,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3614720.0, ans=0.07 2023-11-28 18:14:09,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3614720.0, ans=0.125 2023-11-28 18:14:26,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3614786.6666666665, ans=0.125 2023-11-28 18:14:28,994 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1150, loss[loss=0.06381, simple_loss=0.08145, pruned_loss=0.01529, audio_tagging_loss=0.007797, over 16297.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08979, pruned_loss=0.01221, audio_tagging_loss=0.008593, over 3041742.83 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:14:36,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3614853.3333333335, ans=0.125 2023-11-28 18:14:37,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3614853.3333333335, ans=0.1 2023-11-28 18:14:53,884 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542250 2023-11-28 18:14:59,221 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0 2023-11-28 18:15:10,792 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:15:15,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3615053.3333333335, ans=0.0 2023-11-28 18:15:23,829 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.21 vs. limit=22.5 2023-11-28 18:15:31,080 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1200, loss[loss=0.0469, simple_loss=0.05999, pruned_loss=0.006456, audio_tagging_loss=0.01045, over 16236.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08933, pruned_loss=0.01218, audio_tagging_loss=0.008578, over 3044478.63 frames. 
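The loss[...] tuples printed by train_asr.py break each batch's total down into simple_loss, pruned_loss and audio_tagging_loss. The logged numbers are consistent with a weighted sum loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (e.g. for batch 1200 just above: 0.5 * 0.05999 + 0.006456 + 0.01045 ~= 0.0469). A minimal sketch of that combination; the 0.5 and 1.0 scales are read off the logged arithmetic, not quoted from the recipe:

```python
def combine_losses(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    """Reproduce the logged total loss from its parts (sketch, not icefall code)."""
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch 1200 above reports loss=0.0469:
print(combine_losses(0.05999, 0.006456, 0.01045))  # ~0.04690
```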
], batch size: 64, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:15:35,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3615186.6666666665, ans=0.1 2023-11-28 18:15:36,979 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.628e+01 8.860e+01 9.476e+01 1.010e+02 2.147e+02, threshold=1.895e+02, percent-clipped=1.0 2023-11-28 18:15:55,708 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542300 2023-11-28 18:16:07,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3615386.6666666665, ans=0.2 2023-11-28 18:16:33,379 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1250, loss[loss=0.07384, simple_loss=0.09878, pruned_loss=0.01761, audio_tagging_loss=0.006841, over 15990.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.0899, pruned_loss=0.01217, audio_tagging_loss=0.008551, over 3041825.43 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:16:39,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3615520.0, ans=0.0 2023-11-28 18:16:57,673 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542350 2023-11-28 18:17:05,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3615653.3333333335, ans=0.125 2023-11-28 18:17:13,665 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=15.0 2023-11-28 18:17:19,049 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=12.0 2023-11-28 18:17:20,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3615720.0, ans=0.125 2023-11-28 18:17:30,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3615786.6666666665, ans=0.125 2023-11-28 18:17:35,333 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1300, loss[loss=0.06507, simple_loss=0.09466, pruned_loss=0.009174, audio_tagging_loss=0.008563, over 14770.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09043, pruned_loss=0.0123, audio_tagging_loss=0.008563, over 3038278.39 frames. 
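The optim.py:476 lines report grad-norm quartiles (min / 25% / median / 75% / max) together with Clipping_scale=2.0, and throughout this section the reported threshold tracks 2.0 times the median (e.g. 2.0 * 9.476e+01 ~= 1.895e+02 just above, where the max norm of 2.147e+02 exceeded it and percent-clipped=1.0). Below is a hedged sketch of such median-based clipping; the real ScaledAdam optimizer differs in detail (per-parameter scaling, warm-up handling):

```python
from collections import deque

import torch


class MedianGradClipper:
    """Sketch: clip against clipping_scale * running-median gradient norm."""

    def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=history)

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(p.grad) for p in params])
        ).item()
        self.norms.append(norm)
        s = sorted(self.norms)
        # min / 25% / median / 75% / max, as printed in the log
        quartiles = [s[int(q * (len(s) - 1))] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.clipping_scale * quartiles[2]  # 2.0 * median
        if norm > threshold:  # such batches count toward percent-clipped
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold
```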
], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:17:35,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3615853.3333333335, ans=0.125 2023-11-28 18:17:41,173 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.626e+01 8.784e+01 9.305e+01 1.002e+02 1.226e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-28 18:17:41,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3615853.3333333335, ans=10.0 2023-11-28 18:17:55,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3615920.0, ans=0.1 2023-11-28 18:17:59,140 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542400 2023-11-28 18:17:59,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3615986.6666666665, ans=0.125 2023-11-28 18:18:00,480 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:18:02,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3615986.6666666665, ans=0.2 2023-11-28 18:18:02,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3615986.6666666665, ans=0.0 2023-11-28 18:18:09,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3615986.6666666665, ans=0.125 2023-11-28 18:18:20,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3616053.3333333335, ans=0.125 2023-11-28 18:18:29,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.05 vs. limit=8.0 2023-11-28 18:18:37,339 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1350, loss[loss=0.07037, simple_loss=0.09949, pruned_loss=0.01089, audio_tagging_loss=0.00974, over 14836.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09048, pruned_loss=0.01222, audio_tagging_loss=0.008598, over 3037759.77 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:19:02,222 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542450 2023-11-28 18:19:05,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3616320.0, ans=0.125 2023-11-28 18:19:23,265 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:19:31,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.17 vs. 
limit=22.5 2023-11-28 18:19:36,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3616453.3333333335, ans=0.0 2023-11-28 18:19:38,410 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1400, loss[loss=0.06258, simple_loss=0.08082, pruned_loss=0.01197, audio_tagging_loss=0.0102, over 16630.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.09002, pruned_loss=0.01228, audio_tagging_loss=0.008638, over 3045437.19 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:19:38,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3616520.0, ans=0.0 2023-11-28 18:19:39,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3616520.0, ans=0.125 2023-11-28 18:19:45,122 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.015e+01 9.002e+01 9.471e+01 1.001e+02 1.235e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 18:20:03,121 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.86 vs. limit=22.5 2023-11-28 18:20:03,656 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542500 2023-11-28 18:20:19,980 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.92 vs. limit=22.5 2023-11-28 18:20:25,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3616720.0, ans=10.0 2023-11-28 18:20:29,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3616786.6666666665, ans=0.1 2023-11-28 18:20:39,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3616853.3333333335, ans=0.1 2023-11-28 18:20:40,612 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1450, loss[loss=0.07893, simple_loss=0.1108, pruned_loss=0.0145, audio_tagging_loss=0.009023, over 16353.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09027, pruned_loss=0.01228, audio_tagging_loss=0.008656, over 3046235.32 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:20:43,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3616853.3333333335, ans=0.125 2023-11-28 18:20:54,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3616920.0, ans=0.0 2023-11-28 18:21:05,399 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542550 2023-11-28 18:21:11,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3616986.6666666665, ans=0.04949747468305833 2023-11-28 18:21:40,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3617120.0, ans=0.125 2023-11-28 18:21:42,930 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1500, loss[loss=0.03649, simple_loss=0.04799, pruned_loss=0.002389, audio_tagging_loss=0.0101, over 13708.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08929, pruned_loss=0.01201, audio_tagging_loss=0.008756, over 3041422.43 frames. 
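Every scaling.py:213 line prints a ScheduledFloat: a module hyperparameter (dropout_p, *_skip_rate, scale_min, balancer prob, ...) whose current value (ans=...) is a function of the global batch_count shared by all modules. A minimal sketch of piecewise-linear scheduling over batch_count; the breakpoints below are made-up examples, not the recipe's actual schedules:

```python
import bisect


class ScheduledFloatSketch:
    """A float that interpolates piecewise-linearly over a global batch count."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.1)
        self.xs, self.ys = zip(*sorted(points))

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)


# Hypothetical dropout schedule: past the last breakpoint the value is flat,
# which is why this late in training the log keeps printing ans=0.1.
dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(3613186.67))  # 0.1
```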
], batch size: 55, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:21:50,638 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 9.243e+01 1.008e+02 1.066e+02 1.395e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-28 18:22:07,883 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542600 2023-11-28 18:22:37,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3617453.3333333335, ans=0.04949747468305833 2023-11-28 18:22:45,108 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1550, loss[loss=0.06607, simple_loss=0.0875, pruned_loss=0.01295, audio_tagging_loss=0.009367, over 15365.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09016, pruned_loss=0.01203, audio_tagging_loss=0.008786, over 3042813.80 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 18:22:46,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3617520.0, ans=0.1 2023-11-28 18:22:59,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3617586.6666666665, ans=0.125 2023-11-28 18:23:06,085 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.30 vs. limit=10.0 2023-11-28 18:23:10,259 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542650 2023-11-28 18:23:12,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3617653.3333333335, ans=0.125 2023-11-28 18:23:17,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3617653.3333333335, ans=15.0 2023-11-28 18:23:38,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3617786.6666666665, ans=0.125 2023-11-28 18:23:47,206 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1600, loss[loss=0.09695, simple_loss=0.128, pruned_loss=0.02544, audio_tagging_loss=0.007499, over 15276.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08944, pruned_loss=0.01198, audio_tagging_loss=0.008903, over 3043512.69 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:23:54,776 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.889e+01 9.133e+01 9.762e+01 1.043e+02 1.262e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-28 18:24:08,987 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=12.0 2023-11-28 18:24:11,805 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542700 2023-11-28 18:24:20,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3617986.6666666665, ans=0.125 2023-11-28 18:24:26,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3618053.3333333335, ans=0.125 2023-11-28 18:24:38,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3618120.0, ans=0.0 2023-11-28 18:24:48,498 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1650, loss[loss=0.05846, simple_loss=0.08245, pruned_loss=0.0096, audio_tagging_loss=0.007635, over 14719.00 frames. 
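The scaling.py:1022 Whitening lines compare a per-module metric against a limit (metric=6.89 vs. limit=12.0 and similar above); while the metric stays under the limit no correction is needed. One plausible reading, sketched below purely as an assumption: the metric measures how far the feature covariance is from a scaled identity, equal to 1.0 for perfectly "white" features and growing as a few directions dominate:

```python
import torch


def whitening_metric_sketch(x: torch.Tensor) -> float:
    """Ratio of mean squared eigenvalue to squared mean eigenvalue of the
    feature covariance: 1.0 when white, larger when ill-conditioned."""
    x = x.reshape(-1, x.shape[-1])
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # covariance is symmetric PSD
    return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()


feats = torch.randn(4000, 192)        # already-white features -> metric near 1
print(whitening_metric_sketch(feats))
feats[:, 0] *= 20.0                   # one dominant channel -> metric grows
print(whitening_metric_sketch(feats))
```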
], tot_loss[loss=0.06585, simple_loss=0.08987, pruned_loss=0.01207, audio_tagging_loss=0.008852, over 3046024.38 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:24:54,716 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.97 vs. limit=15.0 2023-11-28 18:25:05,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3618253.3333333335, ans=0.125 2023-11-28 18:25:13,467 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542750 2023-11-28 18:25:17,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3618320.0, ans=0.2 2023-11-28 18:25:20,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3618320.0, ans=0.125 2023-11-28 18:25:29,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3618386.6666666665, ans=22.5 2023-11-28 18:25:33,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3618386.6666666665, ans=0.1 2023-11-28 18:25:38,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2023-11-28 18:25:49,849 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1700, loss[loss=0.07372, simple_loss=0.1071, pruned_loss=0.01578, audio_tagging_loss=0.00439, over 15278.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08949, pruned_loss=0.01202, audio_tagging_loss=0.008897, over 3044368.18 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:25:57,432 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.816e+01 8.880e+01 9.352e+01 1.002e+02 1.354e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-28 18:26:15,584 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542800 2023-11-28 18:26:20,658 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:26:23,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3618653.3333333335, ans=0.0 2023-11-28 18:26:27,331 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:26:48,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3618786.6666666665, ans=0.0 2023-11-28 18:26:52,312 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1750, loss[loss=0.06576, simple_loss=0.0837, pruned_loss=0.01266, audio_tagging_loss=0.01124, over 15027.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.08975, pruned_loss=0.01213, audio_tagging_loss=0.008874, over 3042581.68 frames. 
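The many *_skip_rate schedules in these lines (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate, ...) suggest stochastic-depth-style regularization: during training, a sub-module's contribution is randomly dropped with the scheduled probability. A minimal sketch under that assumption:

```python
import torch
import torch.nn as nn


class SkippableResidual(nn.Module):
    """Residual wrapper that skips its sub-module with probability skip_rate."""

    def __init__(self, module: nn.Module, skip_rate: float = 0.0):
        super().__init__()
        self.module = module
        self.skip_rate = skip_rate  # in practice a scheduled value, see above

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.skip_rate:
            return x              # the whole batch bypasses the module
        return x + self.module(x)


layer = SkippableResidual(nn.Linear(256, 256), skip_rate=0.1)
print(layer(torch.randn(8, 256)).shape)  # torch.Size([8, 256])
```

Late in training the logged skip rates are mostly ans=0.0, i.e. the modules are effectively always active.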
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:26:52,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3618853.3333333335, ans=0.125 2023-11-28 18:27:09,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3618920.0, ans=0.125 2023-11-28 18:27:11,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3618920.0, ans=0.015 2023-11-28 18:27:17,727 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542850 2023-11-28 18:27:32,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3619053.3333333335, ans=0.125 2023-11-28 18:27:35,309 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=15.0 2023-11-28 18:27:35,546 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.61 vs. limit=22.5 2023-11-28 18:27:41,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2023-11-28 18:27:54,814 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1800, loss[loss=0.05727, simple_loss=0.08248, pruned_loss=0.01051, audio_tagging_loss=0.005518, over 14739.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08966, pruned_loss=0.01201, audio_tagging_loss=0.008678, over 3045621.67 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:28:02,578 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 8.817e+01 9.553e+01 1.013e+02 1.527e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 18:28:07,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3619253.3333333335, ans=0.125 2023-11-28 18:28:10,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3619253.3333333335, ans=10.0 2023-11-28 18:28:14,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3619253.3333333335, ans=0.125 2023-11-28 18:28:19,571 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542900 2023-11-28 18:28:30,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3619386.6666666665, ans=0.1 2023-11-28 18:28:44,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3619453.3333333335, ans=0.125 2023-11-28 18:28:56,465 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1850, loss[loss=0.06827, simple_loss=0.0964, pruned_loss=0.01295, audio_tagging_loss=0.007119, over 14943.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08969, pruned_loss=0.01206, audio_tagging_loss=0.008637, over 3046169.97 frames. 
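Alongside the skip rates, balancer parameters are scheduled too: prob, max_abs (ans=10.0 above), min_positive and min_abs elsewhere in this section. A plausible reading, offered only as an assumption: with probability prob a batch receives an auxiliary constraint that pushes per-channel activation statistics back into the configured range. The real balancer acts on gradients in the backward pass; the forward-penalty version below is only illustrative (note the positive-fraction term is non-differentiable as written):

```python
import torch


def balancer_penalty(x: torch.Tensor,
                     min_positive: float = 0.05,
                     max_abs: float = 10.0) -> torch.Tensor:
    """Auxiliary penalty (sketch): zero while per-channel stats are in range."""
    pos_frac = (x > 0).float().mean(dim=0)  # fraction positive per channel
    mean_abs = x.abs().mean(dim=0)          # mean magnitude per channel
    return ((min_positive - pos_frac).clamp(min=0.0).sum()
            + (mean_abs - max_abs).clamp(min=0.0).sum())


x = torch.randn(1024, 256)
print(balancer_penalty(x))  # ~0 for healthy activations
```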
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:28:56,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3619520.0, ans=0.125 2023-11-28 18:29:07,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3619586.6666666665, ans=0.1 2023-11-28 18:29:14,209 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.19 vs. limit=10.0 2023-11-28 18:29:20,985 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 542950 2023-11-28 18:29:21,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3619653.3333333335, ans=0.125 2023-11-28 18:29:34,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3619720.0, ans=0.125 2023-11-28 18:29:47,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=3619786.6666666665, ans=12.0 2023-11-28 18:29:58,050 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1900, loss[loss=0.06522, simple_loss=0.08722, pruned_loss=0.01294, audio_tagging_loss=0.008672, over 16412.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09027, pruned_loss=0.01228, audio_tagging_loss=0.008564, over 3052213.64 frames. ], batch size: 63, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:29:58,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=22.5 2023-11-28 18:30:04,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3619853.3333333335, ans=0.125 2023-11-28 18:30:06,343 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.846e+01 9.695e+01 1.030e+02 1.290e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 18:30:12,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3619920.0, ans=0.09899494936611666 2023-11-28 18:30:24,913 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543000 2023-11-28 18:30:40,037 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.19 vs. limit=10.0 2023-11-28 18:30:46,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3620053.3333333335, ans=0.1 2023-11-28 18:30:48,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3620120.0, ans=10.0 2023-11-28 18:30:55,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3620120.0, ans=0.125 2023-11-28 18:31:02,012 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 1950, loss[loss=0.05956, simple_loss=0.08403, pruned_loss=0.009993, audio_tagging_loss=0.007551, over 15785.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08911, pruned_loss=0.01215, audio_tagging_loss=0.008653, over 3039990.70 frames. 
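Each training line prints lr: 1.47e-03, flat within this section. The value is consistent with icefall's Eden schedule, lr = base_lr * ((step/lr_batches)^2 + 1)^(-1/4) * ((epoch/lr_epochs)^2 + 1)^(-1/4): with the assumed recipe settings base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, step ~543000 and epoch 46 give ~1.46e-03, within rounding of the logged value. A sketch:

```python
def eden_lr(step: int, epoch: float,
            base_lr: float = 0.045,       # assumed, not read from this log
            lr_batches: float = 7500.0,   # assumed
            lr_epochs: float = 3.5) -> float:
    """Eden-style learning rate (the schedule this log is consistent with)."""
    return (base_lr
            * ((step / lr_batches) ** 2 + 1) ** -0.25
            * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)


print(f"{eden_lr(543000, 46):.2e}")  # ~1.46e-03, cf. lr: 1.47e-03 above
```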
], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:31:02,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3620186.6666666665, ans=0.025 2023-11-28 18:31:07,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3620186.6666666665, ans=0.1 2023-11-28 18:31:11,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3620186.6666666665, ans=0.1 2023-11-28 18:31:27,041 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543050 2023-11-28 18:31:41,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3620386.6666666665, ans=0.1 2023-11-28 18:31:42,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3620386.6666666665, ans=0.1 2023-11-28 18:32:05,226 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2000, loss[loss=0.05834, simple_loss=0.08271, pruned_loss=0.008139, audio_tagging_loss=0.008847, over 15405.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08898, pruned_loss=0.01217, audio_tagging_loss=0.008607, over 3039965.89 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:32:12,237 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 8.843e+01 9.517e+01 1.017e+02 1.675e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 18:32:23,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3620586.6666666665, ans=0.07 2023-11-28 18:32:29,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2023-11-28 18:32:30,315 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543100 2023-11-28 18:33:07,958 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2050, loss[loss=0.06713, simple_loss=0.08675, pruned_loss=0.01203, audio_tagging_loss=0.01172, over 14838.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08931, pruned_loss=0.01217, audio_tagging_loss=0.008584, over 3042944.04 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:33:16,279 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:33:18,593 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=6.0 2023-11-28 18:33:21,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3620920.0, ans=0.0 2023-11-28 18:33:32,703 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543150 2023-11-28 18:33:33,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.63 vs. limit=15.0 2023-11-28 18:33:42,050 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2023-11-28 18:33:55,539 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.45 vs. 
limit=12.0 2023-11-28 18:33:56,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3621120.0, ans=0.0 2023-11-28 18:34:09,465 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2100, loss[loss=0.07602, simple_loss=0.1057, pruned_loss=0.01592, audio_tagging_loss=0.007253, over 15519.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08942, pruned_loss=0.01218, audio_tagging_loss=0.008499, over 3048144.04 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:34:17,668 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.878e+01 9.444e+01 1.002e+02 1.258e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 18:34:22,858 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.28 vs. limit=10.0 2023-11-28 18:34:27,496 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2023-11-28 18:34:34,162 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543200 2023-11-28 18:34:37,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3621320.0, ans=0.125 2023-11-28 18:34:47,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3621386.6666666665, ans=0.1 2023-11-28 18:34:49,727 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2023-11-28 18:34:51,788 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:34:58,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3621453.3333333335, ans=0.125 2023-11-28 18:35:12,355 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2150, loss[loss=0.06244, simple_loss=0.0743, pruned_loss=0.01481, audio_tagging_loss=0.01049, over 15327.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08923, pruned_loss=0.01205, audio_tagging_loss=0.008492, over 3047300.42 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:35:18,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3621520.0, ans=0.05 2023-11-28 18:35:28,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3621586.6666666665, ans=0.0 2023-11-28 18:35:32,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3621586.6666666665, ans=0.0 2023-11-28 18:35:35,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3621653.3333333335, ans=0.2 2023-11-28 18:35:36,805 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543250 2023-11-28 18:35:46,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3621653.3333333335, ans=0.0 2023-11-28 18:35:50,175 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:35:55,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3621720.0, ans=0.125 2023-11-28 18:36:10,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3621786.6666666665, ans=0.125 2023-11-28 18:36:14,605 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2200, loss[loss=0.07898, simple_loss=0.1113, pruned_loss=0.01543, audio_tagging_loss=0.00788, over 14947.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09043, pruned_loss=0.01216, audio_tagging_loss=0.008485, over 3049322.73 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:36:16,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3621853.3333333335, ans=0.0 2023-11-28 18:36:22,865 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.392e+01 9.070e+01 9.676e+01 1.027e+02 1.399e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 18:36:38,703 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543300 2023-11-28 18:36:45,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3621986.6666666665, ans=0.1 2023-11-28 18:36:46,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3621986.6666666665, ans=0.125 2023-11-28 18:36:56,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3622053.3333333335, ans=0.125 2023-11-28 18:37:02,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3622053.3333333335, ans=0.0 2023-11-28 18:37:03,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0 2023-11-28 18:37:12,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3622120.0, ans=0.0 2023-11-28 18:37:16,416 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2250, loss[loss=0.0544, simple_loss=0.07247, pruned_loss=0.008808, audio_tagging_loss=0.009351, over 14899.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08998, pruned_loss=0.01217, audio_tagging_loss=0.00857, over 3040978.31 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:37:25,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3622186.6666666665, ans=0.2 2023-11-28 18:37:26,365 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.84 vs. 
limit=15.0 2023-11-28 18:37:28,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3622253.3333333335, ans=0.125 2023-11-28 18:37:31,376 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:37:41,262 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543350 2023-11-28 18:37:42,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3622320.0, ans=0.125 2023-11-28 18:38:06,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3622453.3333333335, ans=0.1 2023-11-28 18:38:17,959 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2300, loss[loss=0.07226, simple_loss=0.09792, pruned_loss=0.01266, audio_tagging_loss=0.01065, over 14110.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.0893, pruned_loss=0.01209, audio_tagging_loss=0.008654, over 3042411.07 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:38:20,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3622520.0, ans=0.0 2023-11-28 18:38:26,658 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.756e+01 9.268e+01 1.034e+02 1.497e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-28 18:38:30,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3622586.6666666665, ans=0.125 2023-11-28 18:38:31,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3622586.6666666665, ans=0.125 2023-11-28 18:38:38,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3622586.6666666665, ans=0.125 2023-11-28 18:38:42,551 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543400 2023-11-28 18:38:57,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3622720.0, ans=0.1 2023-11-28 18:39:04,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3622720.0, ans=0.125 2023-11-28 18:39:14,311 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:39:20,158 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2350, loss[loss=0.06786, simple_loss=0.08442, pruned_loss=0.01553, audio_tagging_loss=0.01012, over 15076.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09062, pruned_loss=0.01242, audio_tagging_loss=0.008714, over 3043271.32 frames. 
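The WARNING lines in this section all drop the same kind of AudioSet placeholder cut: 100 input frames shrink to 23 after the encoder's roughly 4x subsampling, which is fewer than the 24 BPE tokens, so the transducer loss would be undefined. A sketch of that validity check; the post-subsampling length formula below reproduces the logged 100 -> 23 but is an assumption about the convolutional front-end:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed Conv2d front-end arithmetic; reproduces the logged 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2


def is_valid_cut(num_frames: int, num_tokens: int) -> bool:
    """Keep a cut only if it has at least one frame per token after subsampling."""
    return frames_after_subsampling(num_frames) >= num_tokens


print(frames_after_subsampling(100))  # 23
print(is_valid_cut(100, 24))          # False -> "Exclude cut ..." as logged
```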
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:39:23,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3622853.3333333335, ans=0.125 2023-11-28 18:39:27,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3622853.3333333335, ans=0.125 2023-11-28 18:39:33,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3622920.0, ans=0.1 2023-11-28 18:39:45,253 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543450 2023-11-28 18:39:52,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3622986.6666666665, ans=0.1 2023-11-28 18:40:04,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3623053.3333333335, ans=0.1 2023-11-28 18:40:22,005 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2400, loss[loss=0.07352, simple_loss=0.1086, pruned_loss=0.01023, audio_tagging_loss=0.008985, over 15170.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.09172, pruned_loss=0.01259, audio_tagging_loss=0.008707, over 3042375.75 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:40:30,762 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.938e+01 9.455e+01 1.032e+02 1.610e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-28 18:40:32,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3623186.6666666665, ans=0.125 2023-11-28 18:40:45,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3623320.0, ans=0.1 2023-11-28 18:40:46,759 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543500 2023-11-28 18:41:06,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3623386.6666666665, ans=0.125 2023-11-28 18:41:07,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0 2023-11-28 18:41:21,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.55 vs. limit=15.0 2023-11-28 18:41:23,591 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2450, loss[loss=0.04152, simple_loss=0.05161, pruned_loss=0.005218, audio_tagging_loss=0.0105, over 15625.00 frames. ], tot_loss[loss=0.06662, simple_loss=0.09104, pruned_loss=0.01236, audio_tagging_loss=0.008741, over 3039784.05 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:41:26,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3623520.0, ans=0.125 2023-11-28 18:41:35,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.24 vs. 
limit=22.5 2023-11-28 18:41:39,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3623586.6666666665, ans=0.0 2023-11-28 18:41:39,576 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=15.0 2023-11-28 18:41:40,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3623586.6666666665, ans=0.0 2023-11-28 18:41:40,642 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.82 vs. limit=15.0 2023-11-28 18:41:43,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3623586.6666666665, ans=0.125 2023-11-28 18:41:49,397 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543550 2023-11-28 18:41:55,900 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.43 vs. limit=12.0 2023-11-28 18:42:12,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3623786.6666666665, ans=0.125 2023-11-28 18:42:25,820 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2500, loss[loss=0.071, simple_loss=0.09923, pruned_loss=0.01316, audio_tagging_loss=0.008225, over 15325.00 frames. ], tot_loss[loss=0.06678, simple_loss=0.09095, pruned_loss=0.01243, audio_tagging_loss=0.008872, over 3042500.56 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:42:35,309 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.618e+01 8.803e+01 9.255e+01 1.000e+02 1.311e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 18:42:51,444 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543600 2023-11-28 18:42:55,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3623986.6666666665, ans=0.125 2023-11-28 18:43:28,619 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2550, loss[loss=0.05586, simple_loss=0.0733, pruned_loss=0.009406, audio_tagging_loss=0.009806, over 14949.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09012, pruned_loss=0.01221, audio_tagging_loss=0.008758, over 3043878.01 frames. 
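tot_loss[...] is reported "over" a frame count that grows slowly and is fractional (e.g. over 3043878.01 frames above), which points to a decayed, frame-weighted running average rather than a plain sum over the epoch. A sketch under that assumption; the decay constant is illustrative:

```python
class RunningLoss:
    """Frame-weighted running average with exponential forgetting (sketch)."""

    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0  # decayed counts go fractional, as in the log

    def update(self, batch_loss: float, batch_frames: int) -> float:
        self.weighted_loss = (self.decay * self.weighted_loss
                              + batch_loss * batch_frames)
        self.frames = self.decay * self.frames + batch_frames
        return self.weighted_loss / self.frames


tracker = RunningLoss()
for _ in range(1000):
    avg = tracker.update(0.065, 15000)
print(round(avg, 4), round(tracker.frames, 2))  # ~0.065 over ~9.5e6 frames
```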
], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:43:31,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3624186.6666666665, ans=0.125 2023-11-28 18:43:53,707 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543650 2023-11-28 18:43:53,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3624320.0, ans=0.125 2023-11-28 18:43:59,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3624320.0, ans=0.0 2023-11-28 18:44:13,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3624386.6666666665, ans=0.015 2023-11-28 18:44:17,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3624453.3333333335, ans=0.125 2023-11-28 18:44:20,531 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-28 18:44:24,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3624453.3333333335, ans=0.2 2023-11-28 18:44:28,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3624453.3333333335, ans=0.0 2023-11-28 18:44:30,714 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2600, loss[loss=0.0656, simple_loss=0.08895, pruned_loss=0.01381, audio_tagging_loss=0.007315, over 15310.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.09018, pruned_loss=0.01214, audio_tagging_loss=0.008663, over 3044849.44 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:44:37,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3624520.0, ans=0.035 2023-11-28 18:44:39,475 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.728e+01 8.738e+01 9.385e+01 1.004e+02 1.373e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 18:44:56,230 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543700 2023-11-28 18:44:56,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3624653.3333333335, ans=0.2 2023-11-28 18:45:03,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3624653.3333333335, ans=0.0 2023-11-28 18:45:19,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3624786.6666666665, ans=0.0 2023-11-28 18:45:24,214 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2023-11-28 18:45:32,088 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2023-11-28 18:45:32,155 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.42 vs. 
limit=15.0 2023-11-28 18:45:32,752 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2650, loss[loss=0.06539, simple_loss=0.0924, pruned_loss=0.0118, audio_tagging_loss=0.007384, over 16186.00 frames. ], tot_loss[loss=0.06613, simple_loss=0.09052, pruned_loss=0.01222, audio_tagging_loss=0.008643, over 3046092.50 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:45:33,471 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0 2023-11-28 18:45:58,422 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543750 2023-11-28 18:46:22,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3625120.0, ans=0.125 2023-11-28 18:46:35,468 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2700, loss[loss=0.05261, simple_loss=0.07017, pruned_loss=0.009248, audio_tagging_loss=0.008275, over 14754.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08981, pruned_loss=0.01204, audio_tagging_loss=0.008541, over 3046916.13 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:46:36,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.09 vs. limit=10.0 2023-11-28 18:46:44,264 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.009e+01 9.559e+01 1.022e+02 1.303e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 18:46:51,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3625253.3333333335, ans=0.125 2023-11-28 18:46:58,044 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:47:00,353 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543800 2023-11-28 18:47:19,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3625386.6666666665, ans=0.2 2023-11-28 18:47:28,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3625453.3333333335, ans=0.2 2023-11-28 18:47:33,836 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.38 vs. limit=15.0 2023-11-28 18:47:37,924 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2750, loss[loss=0.05936, simple_loss=0.08514, pruned_loss=0.009509, audio_tagging_loss=0.007286, over 14896.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08938, pruned_loss=0.0121, audio_tagging_loss=0.008517, over 3046004.12 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:47:43,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3625520.0, ans=0.0 2023-11-28 18:47:50,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3625586.6666666665, ans=0.0 2023-11-28 18:48:02,716 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543850 2023-11-28 18:48:03,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.00 vs. 
limit=15.0 2023-11-28 18:48:06,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3625653.3333333335, ans=0.0 2023-11-28 18:48:19,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3625720.0, ans=0.125 2023-11-28 18:48:22,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3625720.0, ans=0.0 2023-11-28 18:48:23,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3625720.0, ans=0.0 2023-11-28 18:48:26,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=3625786.6666666665, ans=12.0 2023-11-28 18:48:32,321 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:48:36,014 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:48:39,506 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2800, loss[loss=0.05516, simple_loss=0.07829, pruned_loss=0.007209, audio_tagging_loss=0.008805, over 14450.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08885, pruned_loss=0.01206, audio_tagging_loss=0.008539, over 3039258.32 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:48:42,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3625853.3333333335, ans=0.125 2023-11-28 18:48:49,555 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.340e+01 8.943e+01 9.576e+01 1.040e+02 1.629e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 18:48:55,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3625920.0, ans=0.1 2023-11-28 18:48:57,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3625920.0, ans=0.2 2023-11-28 18:49:05,246 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543900 2023-11-28 18:49:05,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3625986.6666666665, ans=0.0 2023-11-28 18:49:30,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3626120.0, ans=0.1 2023-11-28 18:49:35,833 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 18:49:41,561 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2850, loss[loss=0.0746, simple_loss=0.1084, pruned_loss=0.01311, audio_tagging_loss=0.007284, over 15338.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08937, pruned_loss=0.01208, audio_tagging_loss=0.008547, over 3037305.58 frames. 
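The scaling.py:1118 WithLoss lines attach an auxiliary loss to intermediate tensors such as self_attn_weights and report its sum; loss-sum=0.000e+00 throughout this section means the attached penalty is currently inactive. A minimal sketch of the attach-and-report pattern, assuming the caller folds the penalty into the total loss while the forward value passes through unchanged; the collapse-style penalty is a hypothetical example:

```python
import torch
import torch.nn as nn


class WithLossSketch(nn.Module):
    """Attach a penalty to an intermediate tensor and log its sum (sketch)."""

    def __init__(self, name: str, penalty_fn):
        super().__init__()
        self.name = name
        self.penalty_fn = penalty_fn

    def forward(self, x: torch.Tensor):
        penalty = self.penalty_fn(x) if self.training else x.new_zeros(())
        print(f"WithLoss: name={self.name}, loss-sum={penalty.sum():.3e}")
        return x, penalty  # caller adds the penalty into the total loss


# Hypothetical penalty: fires only if attention weights collapse onto one key.
attn = WithLossSketch("self_attn_weights",
                      lambda w: (w - 0.99).clamp(min=0.0).sum())
weights = torch.softmax(torch.randn(4, 8, 8), dim=-1)
out, aux = attn(weights)  # typically prints loss-sum=0.000e+00, as in the log
```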
2023-11-28 18:49:41,561 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2850, loss[loss=0.0746, simple_loss=0.1084, pruned_loss=0.01311, audio_tagging_loss=0.007284, over 15338.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08937, pruned_loss=0.01208, audio_tagging_loss=0.008547, over 3037305.58 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:49:48,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3626186.6666666665, ans=0.0 2023-11-28 18:49:59,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3626253.3333333335, ans=0.05 2023-11-28 18:50:06,808 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 543950 2023-11-28 18:50:43,898 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2900, loss[loss=0.07377, simple_loss=0.1088, pruned_loss=0.01366, audio_tagging_loss=0.005698, over 15845.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08948, pruned_loss=0.01218, audio_tagging_loss=0.008608, over 3038558.12 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:50:55,067 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.774e+01 8.790e+01 9.510e+01 1.033e+02 1.199e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 18:50:57,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3626586.6666666665, ans=0.125 2023-11-28 18:50:59,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3626586.6666666665, ans=0.1 2023-11-28 18:51:08,226 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544000 2023-11-28 18:51:14,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3626653.3333333335, ans=0.125 2023-11-28 18:51:33,433 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.58 vs. limit=15.0 2023-11-28 18:51:40,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3626786.6666666665, ans=0.0 2023-11-28 18:51:45,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3626786.6666666665, ans=0.125 2023-11-28 18:51:48,519 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 2950, loss[loss=0.06275, simple_loss=0.08853, pruned_loss=0.01061, audio_tagging_loss=0.007865, over 15844.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08957, pruned_loss=0.01193, audio_tagging_loss=0.00865, over 3036984.46 frames.
], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:52:13,292 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544050 2023-11-28 18:52:28,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3627053.3333333335, ans=0.0 2023-11-28 18:52:31,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3627053.3333333335, ans=0.2 2023-11-28 18:52:37,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3627120.0, ans=0.0 2023-11-28 18:52:44,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3627120.0, ans=0.125 2023-11-28 18:52:46,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3627120.0, ans=0.125 2023-11-28 18:52:50,282 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3000, loss[loss=0.06371, simple_loss=0.08593, pruned_loss=0.01004, audio_tagging_loss=0.0107, over 15736.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09031, pruned_loss=0.01202, audio_tagging_loss=0.008611, over 3042139.98 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:52:50,283 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 18:53:33,230 INFO [train_asr.py:1267] (2/4) Epoch 46, validation: loss=0.05731, simple_loss=0.05055, pruned_loss=0.005328, audio_tagging_loss=0.02671, over 4681554.00 frames. 2023-11-28 18:53:33,231 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 18:53:41,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3627186.6666666665, ans=0.0 2023-11-28 18:53:44,163 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 9.011e+01 9.606e+01 1.015e+02 1.587e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 18:53:45,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3627253.3333333335, ans=0.05 2023-11-28 18:53:49,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3627253.3333333335, ans=0.125 2023-11-28 18:53:57,526 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544100 2023-11-28 18:53:57,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3627320.0, ans=0.125 2023-11-28 18:54:18,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3627386.6666666665, ans=0.125 2023-11-28 18:54:22,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3627453.3333333335, ans=0.2 2023-11-28 18:54:34,904 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3050, loss[loss=0.06497, simple_loss=0.09489, pruned_loss=0.01076, audio_tagging_loss=0.006766, over 16655.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09118, pruned_loss=0.0121, audio_tagging_loss=0.008628, over 3044188.17 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0
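Batch 3000 above triggers the periodic validation pass, after which training resumes; the "Maximum memory allocated" line reads back the CUDA allocator's high-water mark. A sketch of that bookkeeping with a hypothetical model/batch interface, not the literal loop around train_asr.py:1258-1268:

```python
import torch

def compute_validation_loss(model, valid_dl, device: str = "cuda"):
    # Hypothetical interface: each batch yields a scalar loss and a frame count.
    model.eval()
    loss_sum, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = model(batch)
            loss_sum += loss.item() * num_frames
            frames += num_frames
    model.train()
    # Bytes -> MB, matching "Maximum memory allocated so far is 26096MB"
    max_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return loss_sum / frames, max_mb
```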
2023-11-28 18:54:36,610 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=15.0 2023-11-28 18:54:59,366 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544150 2023-11-28 18:55:13,357 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 18:55:19,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3627720.0, ans=0.125 2023-11-28 18:55:24,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3627786.6666666665, ans=0.125 2023-11-28 18:55:37,538 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3100, loss[loss=0.07335, simple_loss=0.098, pruned_loss=0.01612, audio_tagging_loss=0.008229, over 15196.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09086, pruned_loss=0.0121, audio_tagging_loss=0.008697, over 3041741.03 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:55:48,881 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 9.039e+01 9.695e+01 1.074e+02 1.445e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 18:55:54,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3627920.0, ans=0.125 2023-11-28 18:55:55,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3627920.0, ans=0.125 2023-11-28 18:56:03,221 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544200 2023-11-28 18:56:03,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3627986.6666666665, ans=0.2 2023-11-28 18:56:15,909 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2023-11-28 18:56:39,958 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3150, loss[loss=0.06495, simple_loss=0.09049, pruned_loss=0.01204, audio_tagging_loss=0.007665, over 15375.00 frames. ], tot_loss[loss=0.06648, simple_loss=0.09092, pruned_loss=0.01227, audio_tagging_loss=0.008756, over 3041066.49 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 18:56:45,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3628186.6666666665, ans=0.0 2023-11-28 18:56:45,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3628186.6666666665, ans=0.125 2023-11-28 18:56:59,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3628253.3333333335, ans=0.125 2023-11-28 18:57:05,155 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544250
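The Whitening lines print a per-module statistic against its limit; nothing is penalised unless the metric exceeds the limit, which is why most entries sit comfortably below it (the 16.90 vs. 22.5 self_attn2 entry just below is one of the closer calls). One metric with the right behaviour, at least 1 with equality exactly when the group covariance is proportional to the identity, is D * trace(C^2) / trace(C)^2; this is an assumed stand-in for scaling.py, not a copy of it:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) for a single whitening group,
    # assumed zero-mean for this sketch.
    n, d = x.shape
    c = (x.t() @ x) / n                              # covariance estimate
    metric = d * torch.trace(c @ c) / torch.trace(c) ** 2
    return metric.item()                             # 1.0 when perfectly white

# A training-time penalty would then punish only the excess, e.g.
# penalty = max(whitening_metric(x) - limit, 0.0)
```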
2023-11-28 18:57:19,270 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.90 vs. limit=22.5 2023-11-28 18:57:25,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3628386.6666666665, ans=0.125 2023-11-28 18:57:42,603 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3200, loss[loss=0.08118, simple_loss=0.1162, pruned_loss=0.01378, audio_tagging_loss=0.009307, over 14842.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08993, pruned_loss=0.01202, audio_tagging_loss=0.0088, over 3042685.32 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:57:46,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3628520.0, ans=0.125 2023-11-28 18:57:52,874 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.859e+01 9.188e+01 9.825e+01 1.034e+02 1.228e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-28 18:57:54,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3628586.6666666665, ans=0.125 2023-11-28 18:57:57,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3628586.6666666665, ans=0.125 2023-11-28 18:58:06,981 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544300 2023-11-28 18:58:17,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3628653.3333333335, ans=0.05 2023-11-28 18:58:44,569 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3250, loss[loss=0.04028, simple_loss=0.04992, pruned_loss=0.004877, audio_tagging_loss=0.01044, over 15858.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08899, pruned_loss=0.01193, audio_tagging_loss=0.008863, over 3050240.81 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:58:44,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3628853.3333333335, ans=0.0 2023-11-28 18:58:46,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3628853.3333333335, ans=0.125 2023-11-28 18:58:52,214 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0 2023-11-28 18:59:03,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3628920.0, ans=0.125 2023-11-28 18:59:09,809 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544350 2023-11-28 18:59:11,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3628986.6666666665, ans=0.1 2023-11-28 18:59:46,043 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3300, loss[loss=0.08968, simple_loss=0.1217, pruned_loss=0.0217, audio_tagging_loss=0.007118, over 14596.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08928, pruned_loss=0.01199, audio_tagging_loss=0.008933, over 3047424.89 frames.
], batch size: 54, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 18:59:47,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3629186.6666666665, ans=0.125 2023-11-28 18:59:57,861 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 9.009e+01 9.919e+01 1.085e+02 1.499e+02, threshold=1.984e+02, percent-clipped=0.0 2023-11-28 18:59:58,079 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:00:10,742 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544400 2023-11-28 19:00:21,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3629320.0, ans=0.1 2023-11-28 19:00:26,450 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.64 vs. limit=15.0 2023-11-28 19:00:28,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3629386.6666666665, ans=0.125 2023-11-28 19:00:28,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3629386.6666666665, ans=0.0 2023-11-28 19:00:32,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3629386.6666666665, ans=0.0 2023-11-28 19:00:40,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3629453.3333333335, ans=0.0 2023-11-28 19:00:48,536 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3350, loss[loss=0.07001, simple_loss=0.1041, pruned_loss=0.01275, audio_tagging_loss=0.005194, over 14886.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.0893, pruned_loss=0.01197, audio_tagging_loss=0.008882, over 3056374.97 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:00:51,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3629520.0, ans=0.035 2023-11-28 19:01:12,640 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544450 2023-11-28 19:01:19,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3629653.3333333335, ans=0.125 2023-11-28 19:01:26,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3629720.0, ans=0.0 2023-11-28 19:01:41,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3629786.6666666665, ans=0.0 2023-11-28 19:01:42,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3629786.6666666665, ans=0.2 2023-11-28 19:01:49,487 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3400, loss[loss=0.06129, simple_loss=0.08032, pruned_loss=0.01344, audio_tagging_loss=0.007684, over 14971.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08885, pruned_loss=0.01192, audio_tagging_loss=0.008772, over 3058846.55 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:01:49,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3629853.3333333335, ans=0.1 2023-11-28 19:01:55,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3629853.3333333335, ans=0.09899494936611666 2023-11-28 19:01:57,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3629853.3333333335, ans=0.0 2023-11-28 19:01:58,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3629853.3333333335, ans=0.5 2023-11-28 19:02:01,874 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.836e+01 9.096e+01 9.800e+01 1.047e+02 1.329e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-28 19:02:02,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3629920.0, ans=0.2 2023-11-28 19:02:14,088 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544500 2023-11-28 19:02:17,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3629986.6666666665, ans=0.0 2023-11-28 19:02:20,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3629986.6666666665, ans=0.0 2023-11-28 19:02:51,158 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3450, loss[loss=0.06898, simple_loss=0.0857, pruned_loss=0.01326, audio_tagging_loss=0.01287, over 14868.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08909, pruned_loss=0.01188, audio_tagging_loss=0.008684, over 3054506.74 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:02:53,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3630186.6666666665, ans=0.125 2023-11-28 19:02:56,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3630186.6666666665, ans=0.09899494936611666 2023-11-28 19:03:10,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3630253.3333333335, ans=0.0 2023-11-28 19:03:16,952 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544550 2023-11-28 19:03:41,111 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.02 vs. limit=15.0 2023-11-28 19:03:53,797 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3500, loss[loss=0.04312, simple_loss=0.05193, pruned_loss=0.006532, audio_tagging_loss=0.01062, over 14328.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08912, pruned_loss=0.01191, audio_tagging_loss=0.008667, over 3049127.80 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:04:03,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3630520.0, ans=0.0 2023-11-28 19:04:06,259 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 8.809e+01 9.584e+01 1.024e+02 1.310e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 19:04:08,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3630586.6666666665, ans=0.125 2023-11-28 19:04:11,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3630586.6666666665, ans=0.125 2023-11-28 19:04:16,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3630586.6666666665, ans=15.0 2023-11-28 19:04:18,798 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544600 2023-11-28 19:04:23,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3630653.3333333335, ans=0.0 2023-11-28 19:04:27,703 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:04:39,496 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.84 vs. limit=10.0 2023-11-28 19:04:56,003 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3550, loss[loss=0.04333, simple_loss=0.05434, pruned_loss=0.007858, audio_tagging_loss=0.008301, over 14797.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08898, pruned_loss=0.01204, audio_tagging_loss=0.008593, over 3044567.12 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:05:20,975 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544650 2023-11-28 19:05:33,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3631053.3333333335, ans=0.2 2023-11-28 19:05:51,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3631120.0, ans=0.0 2023-11-28 19:05:58,328 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3600, loss[loss=0.05432, simple_loss=0.07601, pruned_loss=0.01117, audio_tagging_loss=0.005146, over 13789.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08952, pruned_loss=0.01215, audio_tagging_loss=0.008527, over 3042287.51 frames. 
], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:06:00,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3631186.6666666665, ans=0.125 2023-11-28 19:06:11,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3631253.3333333335, ans=0.125 2023-11-28 19:06:12,392 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.905e+01 9.661e+01 1.038e+02 1.227e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 19:06:19,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3631253.3333333335, ans=0.1 2023-11-28 19:06:23,002 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544700 2023-11-28 19:06:46,131 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0 2023-11-28 19:06:58,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3631453.3333333335, ans=0.0 2023-11-28 19:07:00,634 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3650, loss[loss=0.06056, simple_loss=0.08306, pruned_loss=0.01043, audio_tagging_loss=0.008605, over 14938.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08899, pruned_loss=0.0121, audio_tagging_loss=0.008518, over 3044827.76 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:07:00,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3631520.0, ans=0.2 2023-11-28 19:07:02,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3631520.0, ans=0.05 2023-11-28 19:07:05,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3631520.0, ans=0.125 2023-11-28 19:07:19,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3631586.6666666665, ans=0.0 2023-11-28 19:07:25,296 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544750 2023-11-28 19:07:26,086 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.83 vs. limit=10.0 2023-11-28 19:07:31,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3631653.3333333335, ans=0.125 2023-11-28 19:07:41,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3631720.0, ans=0.1 2023-11-28 19:07:48,166 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=12.0 2023-11-28 19:08:01,867 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3700, loss[loss=0.07779, simple_loss=0.1083, pruned_loss=0.01553, audio_tagging_loss=0.008104, over 15515.00 frames. ], tot_loss[loss=0.06599, simple_loss=0.09019, pruned_loss=0.01233, audio_tagging_loss=0.008568, over 3049917.36 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:08:02,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3631853.3333333335, ans=0.125 2023-11-28 19:08:05,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3631853.3333333335, ans=0.125 2023-11-28 19:08:07,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3631853.3333333335, ans=0.1 2023-11-28 19:08:07,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3631853.3333333335, ans=0.125 2023-11-28 19:08:15,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.815e+01 9.135e+01 9.668e+01 1.042e+02 1.211e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 19:08:27,256 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544800 2023-11-28 19:08:55,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3632120.0, ans=0.0 2023-11-28 19:09:05,289 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3750, loss[loss=0.09319, simple_loss=0.1331, pruned_loss=0.01958, audio_tagging_loss=0.007074, over 15450.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09036, pruned_loss=0.01226, audio_tagging_loss=0.008561, over 3048568.20 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:09:15,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3632186.6666666665, ans=0.125 2023-11-28 19:09:30,351 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544850 2023-11-28 19:09:36,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3632320.0, ans=0.0 2023-11-28 19:09:36,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3632320.0, ans=0.125 2023-11-28 19:09:50,371 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:10:03,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3632453.3333333335, ans=0.125 2023-11-28 19:10:08,029 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3800, loss[loss=0.06246, simple_loss=0.07796, pruned_loss=0.01031, audio_tagging_loss=0.01317, over 15067.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09058, pruned_loss=0.0122, audio_tagging_loss=0.008616, over 3050760.69 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:10:20,983 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.825e+01 8.975e+01 9.556e+01 1.041e+02 1.200e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 19:10:24,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3632586.6666666665, ans=0.0 2023-11-28 19:10:32,435 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544900 2023-11-28 19:10:49,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3632720.0, ans=0.0 2023-11-28 19:10:57,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3632786.6666666665, ans=0.125 2023-11-28 19:11:06,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3632786.6666666665, ans=0.1 2023-11-28 19:11:08,706 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3850, loss[loss=0.07089, simple_loss=0.08895, pruned_loss=0.01644, audio_tagging_loss=0.009978, over 15136.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09125, pruned_loss=0.01233, audio_tagging_loss=0.008596, over 3057026.17 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:11:10,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3632853.3333333335, ans=0.0 2023-11-28 19:11:34,469 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 544950 2023-11-28 19:11:35,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3632986.6666666665, ans=0.0 2023-11-28 19:11:43,481 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:11:45,700 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:11:53,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3633053.3333333335, ans=0.125 2023-11-28 19:11:53,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3633053.3333333335, ans=0.1 2023-11-28 19:12:03,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3633120.0, ans=0.125 2023-11-28 19:12:03,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3633120.0, ans=0.0
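The scaling.py ScheduledFloat entries track hyperparameters (dropout_p, skip rates, balancer probabilities, bypass scale minima) that are deterministic functions of the global batch count rather than trained weights; by batch_count ~3.6e6 they have all reached their final values, which is why ans= keeps printing constants like 0.1, 0.125 and 0.2. A minimal piecewise-linear version of the idea; the schedule points below are illustrative, and the real class in scaling.py has more machinery:

```python
class ScheduledFloatSketch:
    """Piecewise-linear float schedule over batch_count (illustrative only)."""
    def __init__(self, *points: tuple) -> None:
        self.points = sorted(points)   # e.g. (0.0, 0.3), (20000.0, 0.125)
        self.batch_count = 0.0
    def __float__(self) -> float:
        pts = self.points
        if self.batch_count <= pts[0][0]:
            return float(pts[0][1])
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if self.batch_count <= x1:
                t = (self.batch_count - x0) / (x1 - x0)
                return float(y0 + t * (y1 - y0))
        return float(pts[-1][1])       # past the last point: constant

sf = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.125))
sf.batch_count = 3633120.0             # as in the entries above
assert float(sf) == 0.125              # schedule long since flat
```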
2023-11-28 19:12:11,430 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3900, loss[loss=0.07058, simple_loss=0.1041, pruned_loss=0.01144, audio_tagging_loss=0.007087, over 15929.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09064, pruned_loss=0.01215, audio_tagging_loss=0.008632, over 3052023.13 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:12:26,149 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.937e+01 9.555e+01 1.040e+02 1.282e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-28 19:12:36,335 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545000 2023-11-28 19:12:48,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3633386.6666666665, ans=0.125 2023-11-28 19:13:10,871 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=12.0 2023-11-28 19:13:11,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3633453.3333333335, ans=0.125 2023-11-28 19:13:11,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3633453.3333333335, ans=0.125 2023-11-28 19:13:13,851 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 3950, loss[loss=0.05768, simple_loss=0.07768, pruned_loss=0.0101, audio_tagging_loss=0.008741, over 14456.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.091, pruned_loss=0.01242, audio_tagging_loss=0.008566, over 3050702.51 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 8.0 2023-11-28 19:13:29,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3633586.6666666665, ans=0.125 2023-11-28 19:13:38,146 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545050 2023-11-28 19:14:01,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3633720.0, ans=0.1 2023-11-28 19:14:11,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3633786.6666666665, ans=0.125 2023-11-28 19:14:15,545 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4000, loss[loss=0.05664, simple_loss=0.08066, pruned_loss=0.008037, audio_tagging_loss=0.008271, over 15786.00 frames. ], tot_loss[loss=0.06686, simple_loss=0.09179, pruned_loss=0.01241, audio_tagging_loss=0.008557, over 3056706.55 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:14:23,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3633853.3333333335, ans=0.0 2023-11-28 19:14:30,282 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 9.097e+01 9.894e+01 1.091e+02 1.423e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-28 19:14:40,515 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545100 2023-11-28 19:14:54,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3634053.3333333335, ans=0.2 2023-11-28 19:15:06,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3634120.0, ans=0.1
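The learning rate printed with every loss line has been pinned at 1.47e-03 for this whole stretch. That is consistent with an Eden-style schedule that decays in both batch index and epoch; the closed form below is an assumption about optim.py, but evaluated with this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 at roughly batch 544,000 with 45 completed epochs it lands on the logged value:

```python
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Assumed Eden form: smooth inverse-quartic decay in batch and epoch.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 544000, 45):.2e}")   # -> 1.47e-03, matching the log
```

This deep into training the curve is so flat that thousands of batches move the rate by less than the two printed digits, hence the constant 1.47e-03.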
2023-11-28 19:15:17,445 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4050, loss[loss=0.07212, simple_loss=0.09711, pruned_loss=0.01371, audio_tagging_loss=0.009861, over 14867.00 frames. ], tot_loss[loss=0.06689, simple_loss=0.0915, pruned_loss=0.0125, audio_tagging_loss=0.008639, over 3054945.75 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:15:22,263 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:15:25,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3634186.6666666665, ans=0.125 2023-11-28 19:15:43,003 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545150 2023-11-28 19:15:58,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3634386.6666666665, ans=0.2 2023-11-28 19:16:09,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3634453.3333333335, ans=0.125 2023-11-28 19:16:10,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3634453.3333333335, ans=0.125 2023-11-28 19:16:13,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3634453.3333333335, ans=0.05 2023-11-28 19:16:19,687 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4100, loss[loss=0.06946, simple_loss=0.08966, pruned_loss=0.01395, audio_tagging_loss=0.01068, over 15535.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09104, pruned_loss=0.01241, audio_tagging_loss=0.008677, over 3055488.02 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:16:34,094 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.259e+01 9.021e+01 9.586e+01 1.040e+02 1.361e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 19:16:43,521 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545200 2023-11-28 19:16:43,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3634653.3333333335, ans=0.125 2023-11-28 19:17:01,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3634720.0, ans=0.125 2023-11-28 19:17:07,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3634786.6666666665, ans=0.0 2023-11-28 19:17:21,151 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4150, loss[loss=0.06764, simple_loss=0.09188, pruned_loss=0.01282, audio_tagging_loss=0.008878, over 14178.00 frames. ], tot_loss[loss=0.06701, simple_loss=0.09171, pruned_loss=0.01254, audio_tagging_loss=0.008616, over 3054762.97 frames.
], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:17:33,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3634920.0, ans=0.1 2023-11-28 19:17:43,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3634920.0, ans=10.0 2023-11-28 19:17:45,621 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545250 2023-11-28 19:17:49,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3634986.6666666665, ans=0.0 2023-11-28 19:18:00,977 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=15.0 2023-11-28 19:18:04,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3635053.3333333335, ans=0.125 2023-11-28 19:18:08,580 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:18:14,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3635120.0, ans=0.2 2023-11-28 19:18:22,684 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4200, loss[loss=0.07457, simple_loss=0.1051, pruned_loss=0.01377, audio_tagging_loss=0.008265, over 16358.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09069, pruned_loss=0.01247, audio_tagging_loss=0.008466, over 3054344.89 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:18:36,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3635253.3333333335, ans=0.125 2023-11-28 19:18:37,258 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.060e+01 9.058e+01 9.549e+01 9.941e+01 2.004e+02, threshold=1.910e+02, percent-clipped=1.0 2023-11-28 19:18:47,566 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.86 vs. limit=10.0 2023-11-28 19:18:48,316 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545300 2023-11-28 19:19:01,590 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:19:02,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3635386.6666666665, ans=0.125 2023-11-28 19:19:03,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3635386.6666666665, ans=0.2 2023-11-28 19:19:10,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3635386.6666666665, ans=0.1
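Each optim.py:476 line prints the min/25%/50%/75%/max of recent gradient norms; throughout this log the threshold tracks twice the median, matching Clipping_scale=2.0, and the entry above where the max (2.004e+02) crossed the threshold (1.910e+02) is the one reporting percent-clipped=1.0. A hedged reconstruction of that report (the window size behind "recent" is not shown in the log):

```python
import torch

def clipping_report(recent_norms: list, clipping_scale: float = 2.0):
    # recent_norms: gradient norms from the last N optimizer steps.
    norms = torch.tensor(recent_norms)
    quartiles = torch.quantile(norms,
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]    # 2.0 * median, as in the log
    percent_clipped = 100.0 * (norms > threshold).float().mean()
    return quartiles.tolist(), threshold.item(), percent_clipped.item()
```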
2023-11-28 19:19:10,365 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.49 vs. limit=15.0 2023-11-28 19:19:25,336 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4250, loss[loss=0.0466, simple_loss=0.06529, pruned_loss=0.004654, audio_tagging_loss=0.009302, over 14383.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.09106, pruned_loss=0.0124, audio_tagging_loss=0.008366, over 3052599.21 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:19:50,988 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545350 2023-11-28 19:20:12,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3635720.0, ans=0.0 2023-11-28 19:20:15,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3635786.6666666665, ans=0.2 2023-11-28 19:20:20,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3635786.6666666665, ans=0.1 2023-11-28 19:20:28,732 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4300, loss[loss=0.06642, simple_loss=0.09457, pruned_loss=0.01072, audio_tagging_loss=0.008412, over 16068.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08969, pruned_loss=0.01208, audio_tagging_loss=0.008405, over 3054207.41 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:20:29,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3635853.3333333335, ans=0.125 2023-11-28 19:20:42,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 9.029e+01 9.603e+01 1.044e+02 1.295e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 19:20:52,815 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545400 2023-11-28 19:20:54,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3635986.6666666665, ans=0.125 2023-11-28 19:21:00,097 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.60 vs. limit=22.5 2023-11-28 19:21:03,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3635986.6666666665, ans=0.2 2023-11-28 19:21:05,468 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:21:29,127 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4350, loss[loss=0.08276, simple_loss=0.1041, pruned_loss=0.02039, audio_tagging_loss=0.01031, over 14225.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.0892, pruned_loss=0.01199, audio_tagging_loss=0.008388, over 3055637.76 frames.
], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:21:38,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3636186.6666666665, ans=0.1 2023-11-28 19:21:39,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3636186.6666666665, ans=0.0 2023-11-28 19:21:50,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3636253.3333333335, ans=0.1 2023-11-28 19:21:54,041 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545450 2023-11-28 19:22:02,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3636320.0, ans=0.125 2023-11-28 19:22:07,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3636386.6666666665, ans=0.1 2023-11-28 19:22:08,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3636386.6666666665, ans=0.125 2023-11-28 19:22:10,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3636386.6666666665, ans=0.2 2023-11-28 19:22:31,069 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4400, loss[loss=0.05851, simple_loss=0.07013, pruned_loss=0.01328, audio_tagging_loss=0.01017, over 15165.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.0896, pruned_loss=0.01204, audio_tagging_loss=0.008373, over 3062942.57 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:22:36,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3636520.0, ans=0.125 2023-11-28 19:22:44,579 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.62 vs. limit=15.0 2023-11-28 19:22:46,742 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.163e+01 9.666e+01 1.055e+02 1.360e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-28 19:22:46,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3636586.6666666665, ans=0.125 2023-11-28 19:22:56,130 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545500 2023-11-28 19:22:59,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3636653.3333333335, ans=0.125 2023-11-28 19:23:21,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3636786.6666666665, ans=0.0 2023-11-28 19:23:26,257 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.95 vs. limit=12.0 2023-11-28 19:23:33,506 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4450, loss[loss=0.07221, simple_loss=0.09907, pruned_loss=0.01565, audio_tagging_loss=0.007021, over 14953.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08936, pruned_loss=0.01191, audio_tagging_loss=0.008381, over 3059298.35 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:23:50,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3636920.0, ans=0.04949747468305833 2023-11-28 19:23:52,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.57 vs. limit=12.0 2023-11-28 19:23:58,944 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545550 2023-11-28 19:24:07,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3636986.6666666665, ans=0.125 2023-11-28 19:24:30,540 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0 2023-11-28 19:24:35,784 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4500, loss[loss=0.05717, simple_loss=0.06577, pruned_loss=0.01135, audio_tagging_loss=0.01293, over 13697.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.09024, pruned_loss=0.0121, audio_tagging_loss=0.00837, over 3059426.32 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:24:36,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3637186.6666666665, ans=0.125 2023-11-28 19:24:36,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3637186.6666666665, ans=0.125 2023-11-28 19:24:37,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3637186.6666666665, ans=0.0 2023-11-28 19:24:40,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3637186.6666666665, ans=0.125 2023-11-28 19:24:50,605 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.752e+01 9.380e+01 1.023e+02 1.206e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 19:25:00,924 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545600 2023-11-28 19:25:03,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3637320.0, ans=0.0 2023-11-28 19:25:06,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3637320.0, ans=0.95 2023-11-28 19:25:14,879 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.58 vs. limit=15.0 2023-11-28 19:25:16,053 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0 2023-11-28 19:25:18,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3637386.6666666665, ans=0.125 2023-11-28 19:25:38,407 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4550, loss[loss=0.08024, simple_loss=0.1027, pruned_loss=0.01943, audio_tagging_loss=0.009479, over 14671.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.0901, pruned_loss=0.01217, audio_tagging_loss=0.008359, over 3057389.19 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:25:54,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3637586.6666666665, ans=0.125 2023-11-28 19:25:56,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3637586.6666666665, ans=0.0 2023-11-28 19:26:03,951 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545650 2023-11-28 19:26:06,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3637653.3333333335, ans=0.125 2023-11-28 19:26:28,097 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:26:40,915 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4600, loss[loss=0.07067, simple_loss=0.09654, pruned_loss=0.01242, audio_tagging_loss=0.009977, over 14743.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09046, pruned_loss=0.01213, audio_tagging_loss=0.00842, over 3053731.76 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:26:41,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3637853.3333333335, ans=0.04949747468305833 2023-11-28 19:26:42,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3637853.3333333335, ans=0.0 2023-11-28 19:26:48,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3637853.3333333335, ans=0.0 2023-11-28 19:26:50,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.07 vs. limit=10.0 2023-11-28 19:26:55,388 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.046e+01 8.776e+01 9.447e+01 1.031e+02 1.407e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 19:27:05,236 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545700 2023-11-28 19:27:11,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3637986.6666666665, ans=0.1 2023-11-28 19:27:30,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3638120.0, ans=0.0 2023-11-28 19:27:42,005 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4650, loss[loss=0.05851, simple_loss=0.07694, pruned_loss=0.007063, audio_tagging_loss=0.01297, over 14760.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08943, pruned_loss=0.01198, audio_tagging_loss=0.008567, over 3054015.73 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:27:48,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3638186.6666666665, ans=0.125 2023-11-28 19:27:55,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3638253.3333333335, ans=0.1 2023-11-28 19:28:06,585 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545750 2023-11-28 19:28:07,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3638320.0, ans=0.125 2023-11-28 19:28:16,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3638320.0, ans=0.125 2023-11-28 19:28:20,509 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:28:23,122 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.04 vs. limit=15.0 2023-11-28 19:28:28,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3638386.6666666665, ans=0.1 2023-11-28 19:28:33,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3638453.3333333335, ans=0.125 2023-11-28 19:28:34,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3638453.3333333335, ans=0.2 2023-11-28 19:28:44,180 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4700, loss[loss=0.05594, simple_loss=0.07666, pruned_loss=0.00894, audio_tagging_loss=0.00867, over 14931.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08979, pruned_loss=0.01228, audio_tagging_loss=0.00861, over 3054008.80 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:29:00,541 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 9.174e+01 9.774e+01 1.029e+02 1.399e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-28 19:29:03,519 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.34 vs. 
limit=12.0 2023-11-28 19:29:09,105 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545800 2023-11-28 19:29:18,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3638653.3333333335, ans=0.125 2023-11-28 19:29:19,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3638653.3333333335, ans=0.125 2023-11-28 19:29:20,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3638720.0, ans=0.0 2023-11-28 19:29:23,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3638720.0, ans=0.125 2023-11-28 19:29:24,433 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:29:27,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3638720.0, ans=0.125 2023-11-28 19:29:36,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3638786.6666666665, ans=0.125 2023-11-28 19:29:47,374 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4750, loss[loss=0.05682, simple_loss=0.07892, pruned_loss=0.008361, audio_tagging_loss=0.008999, over 14920.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08897, pruned_loss=0.01223, audio_tagging_loss=0.008789, over 3056375.57 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:29:47,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3638853.3333333335, ans=0.0 2023-11-28 19:29:48,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3638853.3333333335, ans=0.125 2023-11-28 19:29:48,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3638853.3333333335, ans=0.0 2023-11-28 19:29:51,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3638853.3333333335, ans=0.125 2023-11-28 19:29:58,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3638920.0, ans=0.0 2023-11-28 19:30:01,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3638920.0, ans=0.125 2023-11-28 19:30:11,896 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545850 2023-11-28 19:30:15,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3638986.6666666665, ans=0.125 2023-11-28 19:30:28,132 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.75 vs. limit=15.0 2023-11-28 19:30:37,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3639120.0, ans=0.2
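The grad_scale field in the loss lines (32.0, 16.0, 8.0, then back up) behaves exactly like the loss scale of PyTorch mixed-precision training: it is halved whenever a batch overflows in fp16 and doubled back after a long enough run of clean steps. A generic usage sketch with torch.cuda.amp, not the literal loop in train_asr.py; init_scale and growth_interval are illustrative:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, batch, optimizer):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)        # hypothetical interface
    scaler.scale(loss).backward()
    scaler.step(optimizer)         # silently skipped on inf/nan gradients
    scaler.update()                # halves or doubles the scale -> grad_scale
    return scaler.get_scale()      # the value printed in the loss lines
```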
], tot_loss[loss=0.06581, simple_loss=0.08958, pruned_loss=0.01219, audio_tagging_loss=0.008828, over 3057563.91 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:30:48,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3639186.6666666665, ans=0.125 2023-11-28 19:30:50,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3639186.6666666665, ans=0.125 2023-11-28 19:31:05,143 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 8.815e+01 9.365e+01 1.036e+02 1.386e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-28 19:31:14,138 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545900 2023-11-28 19:31:32,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3639386.6666666665, ans=0.0 2023-11-28 19:31:37,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3639453.3333333335, ans=0.125 2023-11-28 19:31:38,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.64 vs. limit=22.5 2023-11-28 19:31:46,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3639453.3333333335, ans=0.125 2023-11-28 19:31:51,024 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4850, loss[loss=0.07163, simple_loss=0.1019, pruned_loss=0.01303, audio_tagging_loss=0.007666, over 15974.00 frames. ], tot_loss[loss=0.06593, simple_loss=0.0897, pruned_loss=0.01214, audio_tagging_loss=0.008936, over 3053335.51 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:32:01,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-28 19:32:10,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3639586.6666666665, ans=0.125 2023-11-28 19:32:15,712 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 545950 2023-11-28 19:32:15,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3639653.3333333335, ans=0.1 2023-11-28 19:32:35,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3639720.0, ans=0.125 2023-11-28 19:32:52,921 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4900, loss[loss=0.06938, simple_loss=0.09943, pruned_loss=0.0131, audio_tagging_loss=0.006572, over 16236.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08871, pruned_loss=0.01194, audio_tagging_loss=0.008904, over 3053649.70 frames. 
], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:32:56,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3639853.3333333335, ans=0.125 2023-11-28 19:32:59,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3639853.3333333335, ans=0.2 2023-11-28 19:33:10,135 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.219e+01 8.839e+01 9.451e+01 1.014e+02 1.484e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 19:33:16,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3639986.6666666665, ans=0.125 2023-11-28 19:33:17,177 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546000 2023-11-28 19:33:27,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=3639986.6666666665, ans=0.02 2023-11-28 19:33:39,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3640053.3333333335, ans=0.1 2023-11-28 19:33:40,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3640053.3333333335, ans=0.1 2023-11-28 19:33:41,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3640120.0, ans=0.0 2023-11-28 19:33:44,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3640120.0, ans=0.125 2023-11-28 19:33:48,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3640120.0, ans=0.125 2023-11-28 19:33:52,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3640120.0, ans=0.125 2023-11-28 19:33:54,883 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 4950, loss[loss=0.06879, simple_loss=0.1038, pruned_loss=0.008901, audio_tagging_loss=0.007965, over 15438.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08956, pruned_loss=0.01203, audio_tagging_loss=0.008692, over 3060129.03 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:33:59,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3640186.6666666665, ans=0.1 2023-11-28 19:34:12,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3640253.3333333335, ans=0.0 2023-11-28 19:34:15,509 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.68 vs. 
limit=10.0 2023-11-28 19:34:19,500 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546050 2023-11-28 19:34:27,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3640320.0, ans=0.125 2023-11-28 19:34:36,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3640386.6666666665, ans=0.0 2023-11-28 19:34:40,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3640386.6666666665, ans=0.0 2023-11-28 19:34:44,164 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:34:55,712 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5000, loss[loss=0.05896, simple_loss=0.08331, pruned_loss=0.008687, audio_tagging_loss=0.008611, over 15504.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08868, pruned_loss=0.01205, audio_tagging_loss=0.008614, over 3050497.61 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:35:13,455 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.947e+01 9.568e+01 1.019e+02 1.168e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 19:35:14,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3640586.6666666665, ans=0.2 2023-11-28 19:35:21,190 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546100 2023-11-28 19:35:55,425 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.04 vs. limit=22.5 2023-11-28 19:35:58,094 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5050, loss[loss=0.04881, simple_loss=0.06344, pruned_loss=0.007643, audio_tagging_loss=0.009446, over 15026.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08918, pruned_loss=0.01206, audio_tagging_loss=0.008518, over 3046274.90 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:35:59,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3640853.3333333335, ans=0.2 2023-11-28 19:36:01,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3640853.3333333335, ans=0.0 2023-11-28 19:36:08,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3640853.3333333335, ans=0.125 2023-11-28 19:36:22,310 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546150 2023-11-28 19:36:26,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3640986.6666666665, ans=0.0 2023-11-28 19:36:57,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3641120.0, ans=0.0 2023-11-28 19:36:58,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3641120.0, ans=0.0 2023-11-28 19:37:00,160 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5100, loss[loss=0.07724, simple_loss=0.1078, pruned_loss=0.01308, audio_tagging_loss=0.01026, over 14907.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08954, pruned_loss=0.01205, audio_tagging_loss=0.008566, over 3044657.50 frames. 
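Each optim.py line reports a five-number summary (min, lower quartile, median, upper quartile, max) of recent gradient norms, plus a clipping threshold and the percentage of batches actually clipped. In these lines the logged threshold is consistently 2.0 times the logged median (e.g. 2.0 x 9.451e+01 = 1.890e+02), matching Clipping_scale=2.0. A hedged sketch of that kind of bookkeeping follows; the window size, class name, and exact clipping rule are illustrative, and icefall's optimizer may organize this differently.

# Hedged sketch: track recent grad norms, report min/quartiles/max, and clip
# against a threshold of clipping_scale times the running median.
from collections import deque

import torch


class GradNormTracker:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_clipped = 0
        self.num_seen = 0

    def observe_and_clip(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(
            torch.stack([p.grad.detach().norm(2) for p in params]), 2
        ).item()
        self.norms.append(norm)
        self.num_seen += 1
        # Threshold: clipping_scale times the running median of recent norms.
        median = torch.tensor(list(self.norms)).median().item()
        threshold = self.clipping_scale * median
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm

    def summary(self) -> str:
        t = torch.tensor(list(self.norms))
        qs = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        pct = 100.0 * self.num_clipped / max(1, self.num_seen)
        return (f"grad-norm quartiles {[f'{q:.3e}' for q in qs.tolist()]}, "
                f"threshold={self.clipping_scale * qs[2]:.3e}, "
                f"percent-clipped={pct:.1f}")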
], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:37:12,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3641253.3333333335, ans=0.125 2023-11-28 19:37:17,157 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.885e+01 9.689e+01 1.021e+02 1.449e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-28 19:37:22,869 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0 2023-11-28 19:37:24,874 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546200 2023-11-28 19:37:34,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3641320.0, ans=0.2 2023-11-28 19:37:43,052 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=22.5 2023-11-28 19:37:46,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3641386.6666666665, ans=0.2 2023-11-28 19:37:57,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3641453.3333333335, ans=0.1 2023-11-28 19:38:01,240 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5150, loss[loss=0.04834, simple_loss=0.0713, pruned_loss=0.005566, audio_tagging_loss=0.007125, over 16439.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08951, pruned_loss=0.01194, audio_tagging_loss=0.00856, over 3049717.00 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:38:01,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3641520.0, ans=0.125 2023-11-28 19:38:01,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=15.0 2023-11-28 19:38:08,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3641520.0, ans=0.0 2023-11-28 19:38:27,204 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546250 2023-11-28 19:38:32,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3641653.3333333335, ans=0.125 2023-11-28 19:38:36,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3641653.3333333335, ans=0.125 2023-11-28 19:38:39,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3641720.0, ans=0.0 2023-11-28 19:38:47,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3641720.0, ans=0.125 2023-11-28 19:38:47,642 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.26 vs. limit=15.0 2023-11-28 19:38:48,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.71 vs. 
limit=12.0 2023-11-28 19:39:04,447 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5200, loss[loss=0.08999, simple_loss=0.1302, pruned_loss=0.01916, audio_tagging_loss=0.00575, over 15751.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09065, pruned_loss=0.01205, audio_tagging_loss=0.008415, over 3051191.99 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:39:22,724 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 9.099e+01 9.660e+01 1.024e+02 1.324e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 19:39:28,757 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546300 2023-11-28 19:39:41,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3642053.3333333335, ans=0.0 2023-11-28 19:39:43,829 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2023-11-28 19:40:06,044 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5250, loss[loss=0.06787, simple_loss=0.09108, pruned_loss=0.01547, audio_tagging_loss=0.006863, over 14777.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09038, pruned_loss=0.01212, audio_tagging_loss=0.008447, over 3050084.57 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:40:30,095 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546350 2023-11-28 19:40:35,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3642320.0, ans=0.0 2023-11-28 19:40:43,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3642386.6666666665, ans=0.0 2023-11-28 19:40:51,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3642386.6666666665, ans=0.2 2023-11-28 19:41:03,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3642453.3333333335, ans=0.05 2023-11-28 19:41:06,895 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5300, loss[loss=0.06723, simple_loss=0.09679, pruned_loss=0.01103, audio_tagging_loss=0.007814, over 16250.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09151, pruned_loss=0.01219, audio_tagging_loss=0.008374, over 3052386.42 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:41:25,544 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.603e+01 9.027e+01 9.738e+01 1.069e+02 1.273e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-28 19:41:31,984 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546400 2023-11-28 19:41:32,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3642653.3333333335, ans=0.0 2023-11-28 19:41:52,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3642720.0, ans=0.125 2023-11-28 19:41:55,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3642786.6666666665, ans=0.125 2023-11-28 19:42:00,667 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.84 vs. 
limit=22.5 2023-11-28 19:42:08,123 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5350, loss[loss=0.05145, simple_loss=0.06919, pruned_loss=0.007524, audio_tagging_loss=0.009332, over 16263.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.0905, pruned_loss=0.01213, audio_tagging_loss=0.008432, over 3050957.84 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:42:08,339 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:42:33,663 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546450 2023-11-28 19:42:41,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3642986.6666666665, ans=10.0 2023-11-28 19:42:50,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3643053.3333333335, ans=0.125 2023-11-28 19:42:56,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3643120.0, ans=0.125 2023-11-28 19:43:10,544 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5400, loss[loss=0.07779, simple_loss=0.1005, pruned_loss=0.01722, audio_tagging_loss=0.01033, over 16747.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09198, pruned_loss=0.01239, audio_tagging_loss=0.008372, over 3051708.58 frames. ], batch size: 62, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:43:17,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3643186.6666666665, ans=0.125 2023-11-28 19:43:23,997 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2023-11-28 19:43:28,025 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 9.157e+01 9.833e+01 1.043e+02 1.444e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-28 19:43:28,387 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:43:34,953 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546500 2023-11-28 19:43:39,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3643320.0, ans=0.125 2023-11-28 19:43:40,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3643320.0, ans=0.1 2023-11-28 19:44:12,577 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5450, loss[loss=0.06526, simple_loss=0.08353, pruned_loss=0.01271, audio_tagging_loss=0.01079, over 15106.00 frames. ], tot_loss[loss=0.06688, simple_loss=0.09209, pruned_loss=0.01246, audio_tagging_loss=0.008379, over 3053346.97 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:44:23,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3643520.0, ans=15.0 2023-11-28 19:44:37,500 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546550 2023-11-28 19:44:37,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. 
limit=15.0 2023-11-28 19:44:39,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=15.0 2023-11-28 19:44:56,988 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=12.0 2023-11-28 19:45:05,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3643786.6666666665, ans=0.0 2023-11-28 19:45:07,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3643786.6666666665, ans=0.125 2023-11-28 19:45:09,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3643786.6666666665, ans=0.2 2023-11-28 19:45:13,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3643853.3333333335, ans=0.0 2023-11-28 19:45:14,838 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5500, loss[loss=0.06621, simple_loss=0.08568, pruned_loss=0.0124, audio_tagging_loss=0.01097, over 15554.00 frames. ], tot_loss[loss=0.0669, simple_loss=0.09206, pruned_loss=0.01252, audio_tagging_loss=0.008354, over 3046965.14 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:45:20,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3643853.3333333335, ans=0.0 2023-11-28 19:45:29,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3643920.0, ans=0.125 2023-11-28 19:45:33,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3643920.0, ans=0.125 2023-11-28 19:45:34,333 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.526e+01 8.789e+01 9.570e+01 1.015e+02 1.276e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 19:45:37,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3643920.0, ans=0.125 2023-11-28 19:45:40,411 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546600 2023-11-28 19:45:52,949 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5 2023-11-28 19:46:16,818 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:46:17,727 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5550, loss[loss=0.07202, simple_loss=0.09838, pruned_loss=0.01439, audio_tagging_loss=0.008441, over 14694.00 frames. ], tot_loss[loss=0.06693, simple_loss=0.09182, pruned_loss=0.01262, audio_tagging_loss=0.008404, over 3044896.84 frames. 
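The Whitening entries compare a per-module metric against a whitening_limit; the regularizer only intervenes when the covariance of the activations drifts too far from a multiple of the identity. One plausible form of such a metric is sketched below: it equals 1.0 for perfectly white features and grows with the spread of the covariance eigenvalues. This formula is an assumption for illustration; the exact definition lives in scaling.py.

# Hedged sketch of a whitening metric:
#   num_channels * trace(C @ C) / trace(C)**2
# equals 1.0 iff the covariance C is a multiple of the identity, and grows
# as the eigenvalues of C become more unequal.
import torch


def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one whitening group.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]               # (C, C) covariance
    num_channels = cov.shape[0]
    metric = num_channels * torch.trace(cov @ cov) / torch.trace(cov) ** 2
    return metric.item()


torch.manual_seed(0)
white = torch.randn(10000, 512)                  # ~identity covariance
print(whitening_metric(white))                   # close to 1.0
skewed = white * torch.linspace(0.1, 3.0, 512)   # very unequal variances
print(whitening_metric(skewed))                  # noticeably larger than 1.0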
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:46:26,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3644186.6666666665, ans=0.125 2023-11-28 19:46:35,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3644253.3333333335, ans=0.125 2023-11-28 19:46:41,303 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546650 2023-11-28 19:46:48,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3644320.0, ans=0.2 2023-11-28 19:46:50,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3644320.0, ans=0.125 2023-11-28 19:46:51,564 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:47:00,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3644386.6666666665, ans=0.125 2023-11-28 19:47:18,554 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5600, loss[loss=0.05228, simple_loss=0.06549, pruned_loss=0.008231, audio_tagging_loss=0.01131, over 15061.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.0911, pruned_loss=0.01243, audio_tagging_loss=0.008573, over 3045411.23 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:47:33,797 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0 2023-11-28 19:47:38,126 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 8.943e+01 9.603e+01 1.015e+02 1.818e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 19:47:43,575 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546700 2023-11-28 19:48:05,761 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 19:48:13,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3644786.6666666665, ans=0.2 2023-11-28 19:48:20,621 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5650, loss[loss=0.07279, simple_loss=0.1028, pruned_loss=0.01607, audio_tagging_loss=0.005333, over 14512.00 frames. ], tot_loss[loss=0.06661, simple_loss=0.09076, pruned_loss=0.01249, audio_tagging_loss=0.008742, over 3046128.78 frames. 
], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:48:30,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3644853.3333333335, ans=0.125 2023-11-28 19:48:44,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3644986.6666666665, ans=0.2 2023-11-28 19:48:45,947 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546750 2023-11-28 19:49:04,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3645053.3333333335, ans=0.125 2023-11-28 19:49:06,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3645053.3333333335, ans=0.025 2023-11-28 19:49:21,869 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5700, loss[loss=0.0571, simple_loss=0.08733, pruned_loss=0.007296, audio_tagging_loss=0.006134, over 14307.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.0909, pruned_loss=0.0125, audio_tagging_loss=0.008732, over 3048291.10 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:49:27,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3645186.6666666665, ans=0.1 2023-11-28 19:49:41,953 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.069e+01 8.661e+01 9.434e+01 1.006e+02 1.407e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 19:49:42,700 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-28 19:49:44,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3645253.3333333335, ans=0.0 2023-11-28 19:49:46,725 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546800 2023-11-28 19:50:04,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3645386.6666666665, ans=0.125 2023-11-28 19:50:12,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3645453.3333333335, ans=0.05 2023-11-28 19:50:24,567 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5750, loss[loss=0.0532, simple_loss=0.06837, pruned_loss=0.009329, audio_tagging_loss=0.009682, over 15394.00 frames. ], tot_loss[loss=0.06629, simple_loss=0.0905, pruned_loss=0.01239, audio_tagging_loss=0.008653, over 3051542.95 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:50:26,358 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0 2023-11-28 19:50:29,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3645520.0, ans=0.2 2023-11-28 19:50:49,882 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546850 2023-11-28 19:51:26,959 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5800, loss[loss=0.06949, simple_loss=0.1006, pruned_loss=0.0108, audio_tagging_loss=0.008408, over 14800.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08992, pruned_loss=0.01228, audio_tagging_loss=0.008521, over 3042370.22 frames. 
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:51:46,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.935e+01 9.736e+01 1.038e+02 1.216e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 19:51:51,504 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546900 2023-11-28 19:52:07,230 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2023-11-28 19:52:13,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3646053.3333333335, ans=0.125 2023-11-28 19:52:13,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3646053.3333333335, ans=0.0 2023-11-28 19:52:21,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3646120.0, ans=0.2 2023-11-28 19:52:28,736 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5850, loss[loss=0.06429, simple_loss=0.09133, pruned_loss=0.01021, audio_tagging_loss=0.008418, over 14087.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08939, pruned_loss=0.01224, audio_tagging_loss=0.008551, over 3041440.92 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:52:52,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3646320.0, ans=0.125 2023-11-28 19:52:53,854 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 546950 2023-11-28 19:53:04,917 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:53:07,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3646386.6666666665, ans=0.125 2023-11-28 19:53:11,813 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.04 vs. limit=15.0 2023-11-28 19:53:11,907 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=12.0 2023-11-28 19:53:23,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3646453.3333333335, ans=0.125 2023-11-28 19:53:26,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0 2023-11-28 19:53:30,695 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5900, loss[loss=0.06723, simple_loss=0.08543, pruned_loss=0.01452, audio_tagging_loss=0.01, over 14291.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08889, pruned_loss=0.01219, audio_tagging_loss=0.008582, over 3044331.26 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:53:50,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.425e+01 9.034e+01 9.568e+01 1.043e+02 1.665e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 19:53:51,419 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. 
limit=6.0 2023-11-28 19:53:55,723 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547000 2023-11-28 19:54:01,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3646653.3333333335, ans=0.0 2023-11-28 19:54:12,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3646720.0, ans=0.0 2023-11-28 19:54:30,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3646786.6666666665, ans=0.2 2023-11-28 19:54:31,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3646853.3333333335, ans=0.125 2023-11-28 19:54:33,352 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 5950, loss[loss=0.07184, simple_loss=0.09684, pruned_loss=0.01628, audio_tagging_loss=0.007141, over 15980.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08886, pruned_loss=0.01217, audio_tagging_loss=0.008573, over 3053917.42 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:54:47,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3646920.0, ans=0.1 2023-11-28 19:54:58,262 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547050 2023-11-28 19:55:09,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3647053.3333333335, ans=0.125 2023-11-28 19:55:34,999 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6000, loss[loss=0.06798, simple_loss=0.085, pruned_loss=0.01442, audio_tagging_loss=0.01106, over 14738.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08821, pruned_loss=0.01197, audio_tagging_loss=0.008659, over 3047845.65 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 19:55:35,000 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 19:56:14,887 INFO [train_asr.py:1267] (2/4) Epoch 46, validation: loss=0.05742, simple_loss=0.05049, pruned_loss=0.005198, audio_tagging_loss=0.02698, over 4681554.00 frames. 2023-11-28 19:56:14,887 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 19:56:16,525 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0 2023-11-28 19:56:34,501 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 8.785e+01 9.507e+01 1.045e+02 2.026e+02, threshold=1.901e+02, percent-clipped=1.0 2023-11-28 19:56:40,027 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547100 2023-11-28 19:56:50,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3647320.0, ans=0.2 2023-11-28 19:57:01,538 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 19:57:16,620 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6050, loss[loss=0.06173, simple_loss=0.08876, pruned_loss=0.01106, audio_tagging_loss=0.006282, over 15260.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09029, pruned_loss=0.01236, audio_tagging_loss=0.008489, over 3049200.79 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:57:16,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3647520.0, ans=0.1 2023-11-28 19:57:41,456 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547150 2023-11-28 19:57:41,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3647653.3333333335, ans=0.125 2023-11-28 19:57:53,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3647720.0, ans=0.0 2023-11-28 19:57:57,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3647720.0, ans=0.125 2023-11-28 19:58:12,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3647786.6666666665, ans=0.07 2023-11-28 19:58:12,243 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:58:18,273 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6100, loss[loss=0.07504, simple_loss=0.1065, pruned_loss=0.01375, audio_tagging_loss=0.008011, over 15932.00 frames. ], tot_loss[loss=0.06607, simple_loss=0.09046, pruned_loss=0.01231, audio_tagging_loss=0.008531, over 3049128.61 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:58:29,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3647920.0, ans=0.09899494936611666 2023-11-28 19:58:32,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3647920.0, ans=0.2 2023-11-28 19:58:39,056 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.974e+01 9.540e+01 1.034e+02 1.321e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 19:58:40,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3647920.0, ans=0.0 2023-11-28 19:58:42,840 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547200 2023-11-28 19:58:43,398 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2023-11-28 19:58:54,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3647986.6666666665, ans=0.125 2023-11-28 19:59:15,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3648120.0, ans=0.2 2023-11-28 19:59:20,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3648186.6666666665, ans=0.125 2023-11-28 19:59:20,935 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6150, loss[loss=0.06536, simple_loss=0.08977, pruned_loss=0.01277, audio_tagging_loss=0.007704, over 15682.00 frames. 
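Each training line reports both loss[...] for the current batch and tot_loss[...] aggregated over roughly three million frames, with every component weighted by the number of frames it was computed over. The fractional frame counts (e.g. over 3049717.00 or 3054008.80 frames) suggest a decayed, frame-weighted running sum rather than a plain total. A minimal sketch of that bookkeeping, assuming this is how the tracker works; the class name and decay constant are illustrative.

# Hedged sketch of frame-weighted, exponentially-decayed loss tracking,
# matching the shape of the "loss[...] over N frames" / "tot_loss[...]"
# entries above.
class LossTracker:
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.frames = 0.0
        self.sums = {}          # loss-name -> decayed, frame-weighted sum

    def update(self, losses: dict, num_frames: float) -> None:
        # Decay the history, then add this batch, weighted by its frames.
        self.frames = self.decay * self.frames + num_frames
        for name, value in losses.items():
            old = self.decay * self.sums.get(name, 0.0)
            self.sums[name] = old + value * num_frames

    def averages(self) -> dict:
        return {k: v / self.frames for k, v in self.sums.items()}


tracker = LossTracker()
tracker.update({"simple_loss": 0.0898, "pruned_loss": 0.0128}, 15299)
tracker.update({"simple_loss": 0.0890, "pruned_loss": 0.0122}, 14931)
print(tracker.averages(), f"over {tracker.frames:.2f} frames")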
], tot_loss[loss=0.06607, simple_loss=0.09043, pruned_loss=0.01238, audio_tagging_loss=0.008471, over 3046488.78 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 19:59:23,567 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 19:59:31,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3648253.3333333335, ans=0.05 2023-11-28 19:59:44,236 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2023-11-28 19:59:46,024 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547250 2023-11-28 20:00:08,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3648386.6666666665, ans=0.125 2023-11-28 20:00:08,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3648386.6666666665, ans=0.0 2023-11-28 20:00:20,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3648520.0, ans=0.2 2023-11-28 20:00:21,934 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6200, loss[loss=0.072, simple_loss=0.09952, pruned_loss=0.0144, audio_tagging_loss=0.007841, over 15907.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08942, pruned_loss=0.01218, audio_tagging_loss=0.00853, over 3044671.10 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:00:32,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3648520.0, ans=0.05 2023-11-28 20:00:42,978 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 8.739e+01 9.716e+01 1.027e+02 1.338e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-28 20:00:43,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3648586.6666666665, ans=0.1 2023-11-28 20:00:44,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3648586.6666666665, ans=0.125 2023-11-28 20:00:47,080 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547300 2023-11-28 20:01:23,797 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6250, loss[loss=0.08214, simple_loss=0.1205, pruned_loss=0.01545, audio_tagging_loss=0.006449, over 15026.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.0898, pruned_loss=0.01227, audio_tagging_loss=0.008682, over 3043134.35 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:01:26,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3648853.3333333335, ans=0.2 2023-11-28 20:01:42,552 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.12 vs. 
limit=22.5 2023-11-28 20:01:46,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3648986.6666666665, ans=0.125 2023-11-28 20:01:47,669 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547350 2023-11-28 20:01:50,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3648986.6666666665, ans=0.0 2023-11-28 20:01:51,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3648986.6666666665, ans=0.2 2023-11-28 20:02:06,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3649053.3333333335, ans=0.125 2023-11-28 20:02:15,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3649120.0, ans=0.0 2023-11-28 20:02:25,158 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6300, loss[loss=0.06304, simple_loss=0.089, pruned_loss=0.01143, audio_tagging_loss=0.007106, over 14821.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08985, pruned_loss=0.01225, audio_tagging_loss=0.008769, over 3042143.06 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:02:26,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3649186.6666666665, ans=0.0 2023-11-28 20:02:45,250 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.610e+01 8.912e+01 9.440e+01 1.014e+02 1.345e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 20:02:49,503 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547400 2023-11-28 20:03:05,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3649386.6666666665, ans=0.04949747468305833 2023-11-28 20:03:11,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3649386.6666666665, ans=0.125 2023-11-28 20:03:14,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3649453.3333333335, ans=0.125 2023-11-28 20:03:18,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0 2023-11-28 20:03:26,064 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6350, loss[loss=0.05922, simple_loss=0.0783, pruned_loss=0.009994, audio_tagging_loss=0.01008, over 15549.00 frames. ], tot_loss[loss=0.06634, simple_loss=0.09069, pruned_loss=0.01228, audio_tagging_loss=0.008719, over 3044278.63 frames. 
], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:03:51,523 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547450 2023-11-28 20:03:53,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3649653.3333333335, ans=0.1 2023-11-28 20:03:59,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3649653.3333333335, ans=0.125 2023-11-28 20:04:17,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3649786.6666666665, ans=0.125 2023-11-28 20:04:18,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3649786.6666666665, ans=0.1 2023-11-28 20:04:18,735 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.60 vs. limit=22.5 2023-11-28 20:04:19,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3649786.6666666665, ans=0.125 2023-11-28 20:04:25,218 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.25 vs. limit=22.5 2023-11-28 20:04:28,181 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6400, loss[loss=0.06183, simple_loss=0.07074, pruned_loss=0.01423, audio_tagging_loss=0.01223, over 14140.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.08987, pruned_loss=0.0122, audio_tagging_loss=0.00889, over 3046264.53 frames. ], batch size: 53, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:04:49,338 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.001e+01 9.641e+01 1.036e+02 1.339e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-28 20:04:52,952 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547500 2023-11-28 20:05:05,829 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=15.0 2023-11-28 20:05:09,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3650053.3333333335, ans=0.2 2023-11-28 20:05:14,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3650053.3333333335, ans=0.0 2023-11-28 20:05:18,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3650120.0, ans=0.125 2023-11-28 20:05:30,123 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6450, loss[loss=0.06736, simple_loss=0.09555, pruned_loss=0.01118, audio_tagging_loss=0.008397, over 14758.00 frames. ], tot_loss[loss=0.06655, simple_loss=0.09063, pruned_loss=0.01234, audio_tagging_loss=0.008901, over 3045512.68 frames. 
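The grad_scale field flips between 16.0 and 32.0 across these batches, which is the signature of dynamic loss scaling in fp16 training: the scaler doubles the scale after a run of overflow-free steps and halves it when a step produces inf/nan gradients. The stock PyTorch mechanism looks like the generic loop below (a sketch assuming a CUDA device; this is not the project's actual training step, and the model and data are placeholders):

# Generic fp16 training step with dynamic loss scaling; the scale printed in
# the log above rises and falls exactly like GradScaler's internal scale.
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

for step in range(100):
    x = torch.randn(56, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # skips the step if grads overflowed
    scaler.update()                 # halves/doubles the scale as needed
    if step % 50 == 0:
        print(step, "grad_scale:", scaler.get_scale())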
], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:05:31,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3650186.6666666665, ans=0.04949747468305833 2023-11-28 20:05:38,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3650186.6666666665, ans=10.0 2023-11-28 20:05:45,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3650253.3333333335, ans=0.0 2023-11-28 20:05:51,395 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.91 vs. limit=15.0 2023-11-28 20:05:54,432 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547550 2023-11-28 20:05:57,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0 2023-11-28 20:06:30,884 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6500, loss[loss=0.04966, simple_loss=0.06641, pruned_loss=0.008156, audio_tagging_loss=0.008301, over 15032.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08939, pruned_loss=0.01202, audio_tagging_loss=0.008848, over 3044135.01 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:06:39,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3650520.0, ans=0.1 2023-11-28 20:06:41,406 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.41 vs. limit=22.5 2023-11-28 20:06:51,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3650586.6666666665, ans=0.2 2023-11-28 20:06:54,010 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.749e+01 9.341e+01 9.951e+01 1.412e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-28 20:06:56,590 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547600 2023-11-28 20:07:17,034 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.74 vs. limit=22.5 2023-11-28 20:07:33,354 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6550, loss[loss=0.05818, simple_loss=0.07755, pruned_loss=0.0122, audio_tagging_loss=0.007213, over 15564.00 frames. ], tot_loss[loss=0.06635, simple_loss=0.09073, pruned_loss=0.01235, audio_tagging_loss=0.008642, over 3057267.98 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:07:35,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3650853.3333333335, ans=0.125 2023-11-28 20:07:44,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3650920.0, ans=0.1 2023-11-28 20:07:58,198 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547650 2023-11-28 20:08:35,793 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6600, loss[loss=0.0706, simple_loss=0.1009, pruned_loss=0.01245, audio_tagging_loss=0.007695, over 14686.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.0912, pruned_loss=0.01245, audio_tagging_loss=0.008487, over 3051453.98 frames. 
], batch size: 53, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:08:43,687 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.77 vs. limit=12.0 2023-11-28 20:08:46,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3651186.6666666665, ans=0.125 2023-11-28 20:08:52,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3651253.3333333335, ans=0.1 2023-11-28 20:08:58,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.741e+01 8.944e+01 9.598e+01 1.039e+02 1.454e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 20:09:01,275 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547700 2023-11-28 20:09:13,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3651386.6666666665, ans=0.0 2023-11-28 20:09:19,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3651386.6666666665, ans=0.125 2023-11-28 20:09:32,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3651453.3333333335, ans=0.1 2023-11-28 20:09:38,385 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6650, loss[loss=0.06813, simple_loss=0.09763, pruned_loss=0.009784, audio_tagging_loss=0.009536, over 15887.00 frames. ], tot_loss[loss=0.06608, simple_loss=0.09066, pruned_loss=0.01225, audio_tagging_loss=0.008501, over 3054842.32 frames. ], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:09:38,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3651520.0, ans=0.125 2023-11-28 20:10:02,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3651653.3333333335, ans=15.0 2023-11-28 20:10:03,137 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547750 2023-11-28 20:10:09,038 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=15.0 2023-11-28 20:10:24,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3651720.0, ans=0.2 2023-11-28 20:10:39,457 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6700, loss[loss=0.06839, simple_loss=0.09641, pruned_loss=0.01427, audio_tagging_loss=0.005912, over 14963.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09076, pruned_loss=0.01218, audio_tagging_loss=0.008472, over 3051468.92 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:10:43,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3651853.3333333335, ans=0.1 2023-11-28 20:10:52,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.94 vs. 
limit=22.5 2023-11-28 20:10:57,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3651920.0, ans=0.0 2023-11-28 20:11:02,320 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.565e+01 9.256e+01 9.960e+01 1.372e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-28 20:11:04,733 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547800 2023-11-28 20:11:16,399 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0 2023-11-28 20:11:20,102 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-28 20:11:20,213 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.96 vs. limit=6.0 2023-11-28 20:11:21,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3652053.3333333335, ans=0.125 2023-11-28 20:11:32,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3652120.0, ans=0.125 2023-11-28 20:11:34,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3652120.0, ans=0.1 2023-11-28 20:11:42,288 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6750, loss[loss=0.06454, simple_loss=0.08276, pruned_loss=0.01445, audio_tagging_loss=0.008701, over 15898.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08918, pruned_loss=0.01193, audio_tagging_loss=0.008562, over 3038126.62 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:12:06,690 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547850 2023-11-28 20:12:35,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3652453.3333333335, ans=0.04949747468305833 2023-11-28 20:12:36,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3652453.3333333335, ans=0.0 2023-11-28 20:12:38,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.31 vs. limit=22.5 2023-11-28 20:12:42,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3652520.0, ans=0.125 2023-11-28 20:12:43,511 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6800, loss[loss=0.0695, simple_loss=0.09869, pruned_loss=0.01158, audio_tagging_loss=0.008583, over 14501.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08938, pruned_loss=0.01195, audio_tagging_loss=0.008502, over 3039724.17 frames. 
], batch size: 55, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:13:05,386 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 8.847e+01 9.241e+01 1.022e+02 1.257e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-28 20:13:05,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3652586.6666666665, ans=0.125 2023-11-28 20:13:07,827 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547900 2023-11-28 20:13:36,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3652786.6666666665, ans=0.0 2023-11-28 20:13:45,134 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6850, loss[loss=0.07572, simple_loss=0.1032, pruned_loss=0.01571, audio_tagging_loss=0.008425, over 15113.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08981, pruned_loss=0.01197, audio_tagging_loss=0.008422, over 3040993.39 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:13:54,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3652853.3333333335, ans=0.125 2023-11-28 20:14:10,188 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 547950 2023-11-28 20:14:27,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3653053.3333333335, ans=0.125 2023-11-28 20:14:30,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3653053.3333333335, ans=0.125 2023-11-28 20:14:34,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3653120.0, ans=0.0 2023-11-28 20:14:38,407 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2023-11-28 20:14:46,446 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6900, loss[loss=0.07703, simple_loss=0.1009, pruned_loss=0.01738, audio_tagging_loss=0.00921, over 15579.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.09022, pruned_loss=0.01224, audio_tagging_loss=0.008361, over 3041035.83 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 32.0 2023-11-28 20:15:00,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3653253.3333333335, ans=0.125 2023-11-28 20:15:09,731 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 9.041e+01 9.811e+01 1.026e+02 3.153e+02, threshold=1.962e+02, percent-clipped=1.0 2023-11-28 20:15:11,014 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548000 2023-11-28 20:15:22,008 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.00 vs. limit=10.0 2023-11-28 20:15:34,265 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.74 vs. limit=15.0 2023-11-28 20:15:39,534 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 20:15:50,592 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 6950, loss[loss=0.07854, simple_loss=0.1096, pruned_loss=0.0152, audio_tagging_loss=0.008529, over 14934.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09048, pruned_loss=0.01212, audio_tagging_loss=0.008464, over 3039565.03 frames. ], batch size: 59, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:15:51,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3653520.0, ans=0.125 2023-11-28 20:16:10,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3653586.6666666665, ans=0.0 2023-11-28 20:16:10,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.19 vs. limit=6.0 2023-11-28 20:16:13,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3653653.3333333335, ans=0.1 2023-11-28 20:16:14,666 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548050 2023-11-28 20:16:19,344 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.50 vs. limit=15.0 2023-11-28 20:16:27,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3653720.0, ans=0.125 2023-11-28 20:16:51,955 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7000, loss[loss=0.063, simple_loss=0.08106, pruned_loss=0.01278, audio_tagging_loss=0.009682, over 14770.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08992, pruned_loss=0.01204, audio_tagging_loss=0.008444, over 3040906.51 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:16:52,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3653853.3333333335, ans=0.1 2023-11-28 20:17:02,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3653920.0, ans=0.1 2023-11-28 20:17:15,185 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.929e+01 9.464e+01 1.026e+02 1.498e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-28 20:17:15,763 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.12 vs. limit=15.0 2023-11-28 20:17:16,470 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548100 2023-11-28 20:17:19,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3653986.6666666665, ans=0.125 2023-11-28 20:17:27,094 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0 2023-11-28 20:17:53,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.79 vs. 
limit=15.0 2023-11-28 20:17:53,497 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7050, loss[loss=0.05591, simple_loss=0.06835, pruned_loss=0.01165, audio_tagging_loss=0.01008, over 15469.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08949, pruned_loss=0.01215, audio_tagging_loss=0.008551, over 3042507.33 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:18:04,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3654253.3333333335, ans=0.125 2023-11-28 20:18:10,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3654253.3333333335, ans=0.125 2023-11-28 20:18:18,699 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548150 2023-11-28 20:18:27,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3654320.0, ans=0.09899494936611666 2023-11-28 20:18:43,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3654453.3333333335, ans=0.125 2023-11-28 20:18:56,451 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7100, loss[loss=0.0647, simple_loss=0.07978, pruned_loss=0.01587, audio_tagging_loss=0.008942, over 14855.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08931, pruned_loss=0.01206, audio_tagging_loss=0.008611, over 3044959.18 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:19:00,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=3654520.0, ans=0.2 2023-11-28 20:19:09,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3654586.6666666665, ans=0.1 2023-11-28 20:19:13,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3654586.6666666665, ans=0.125 2023-11-28 20:19:17,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3654586.6666666665, ans=0.125 2023-11-28 20:19:19,319 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 9.027e+01 9.544e+01 1.031e+02 1.344e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 20:19:20,604 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548200 2023-11-28 20:19:20,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3654653.3333333335, ans=0.5 2023-11-28 20:19:25,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3654653.3333333335, ans=0.0 2023-11-28 20:19:29,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3654653.3333333335, ans=0.125 2023-11-28 20:19:34,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.35 vs. limit=15.0 2023-11-28 20:19:36,182 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. 
limit=15.0 2023-11-28 20:19:44,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3654786.6666666665, ans=0.05 2023-11-28 20:19:58,793 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7150, loss[loss=0.06278, simple_loss=0.08634, pruned_loss=0.009026, audio_tagging_loss=0.01058, over 16087.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08999, pruned_loss=0.01208, audio_tagging_loss=0.008662, over 3050530.41 frames. ], batch size: 61, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:19:58,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3654853.3333333335, ans=0.1 2023-11-28 20:20:03,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3654853.3333333335, ans=0.125 2023-11-28 20:20:17,260 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=15.0 2023-11-28 20:20:22,551 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548250 2023-11-28 20:20:22,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3654986.6666666665, ans=0.125 2023-11-28 20:20:26,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3654986.6666666665, ans=0.125 2023-11-28 20:20:58,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3655186.6666666665, ans=0.125 2023-11-28 20:20:59,403 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7200, loss[loss=0.07818, simple_loss=0.1066, pruned_loss=0.01577, audio_tagging_loss=0.009135, over 15400.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.09, pruned_loss=0.01215, audio_tagging_loss=0.008716, over 3053986.06 frames. ], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:21:02,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3655186.6666666665, ans=0.1 2023-11-28 20:21:07,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3655186.6666666665, ans=0.1 2023-11-28 20:21:08,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.94 vs. 
limit=10.0 2023-11-28 20:21:13,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3655253.3333333335, ans=0.125 2023-11-28 20:21:13,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3655253.3333333335, ans=0.125 2023-11-28 20:21:15,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3655253.3333333335, ans=0.125 2023-11-28 20:21:23,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3655320.0, ans=0.0 2023-11-28 20:21:24,330 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.059e+01 9.674e+01 1.051e+02 1.523e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 20:21:24,446 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548300 2023-11-28 20:21:34,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3655320.0, ans=0.125 2023-11-28 20:22:01,300 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7250, loss[loss=0.05601, simple_loss=0.07402, pruned_loss=0.01004, audio_tagging_loss=0.008959, over 14046.00 frames. ], tot_loss[loss=0.06625, simple_loss=0.09047, pruned_loss=0.01226, audio_tagging_loss=0.00876, over 3050011.18 frames. ], batch size: 54, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:22:08,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3655520.0, ans=0.125 2023-11-28 20:22:26,006 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548350 2023-11-28 20:22:33,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3655653.3333333335, ans=0.125 2023-11-28 20:22:33,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3655653.3333333335, ans=0.0 2023-11-28 20:22:43,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3655720.0, ans=0.07 2023-11-28 20:22:55,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3655786.6666666665, ans=0.2 2023-11-28 20:23:03,247 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7300, loss[loss=0.05575, simple_loss=0.07809, pruned_loss=0.007876, audio_tagging_loss=0.008827, over 15097.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09032, pruned_loss=0.01216, audio_tagging_loss=0.008684, over 3048784.44 frames. 
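A quick consistency check on the loss lines: the reported total is a weighted sum of its parts. With this run's simple_loss_scale of 0.5 and audio_tagging_loss_scale of 1.0, the total is 0.5 * simple_loss + pruned_loss + audio_tagging_loss, which reproduces the batch 7300 tot_loss just above:

    simple_loss, pruned_loss, at_loss = 0.09032, 0.01216, 0.008684
    loss = 0.5 * simple_loss + pruned_loss + 1.0 * at_loss
    print(loss)  # 0.066004, i.e. the reported loss=0.06601 up to rounding

The same identity holds for the per-batch loss[...] figures, e.g. 0.5 * 0.07809 + 0.007876 + 0.008827 = 0.055748 for the batch 7300 sample above.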
], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:23:24,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3655920.0, ans=0.0 2023-11-28 20:23:24,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3655920.0, ans=0.125 2023-11-28 20:23:27,656 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.253e+01 8.815e+01 9.477e+01 1.035e+02 1.367e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 20:23:27,761 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548400 2023-11-28 20:23:34,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3655986.6666666665, ans=0.125 2023-11-28 20:23:42,400 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=8.0 2023-11-28 20:23:58,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3656120.0, ans=0.125 2023-11-28 20:24:02,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3656120.0, ans=0.1 2023-11-28 20:24:04,864 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7350, loss[loss=0.05808, simple_loss=0.08442, pruned_loss=0.00593, audio_tagging_loss=0.009941, over 16458.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08989, pruned_loss=0.01197, audio_tagging_loss=0.008497, over 3045508.99 frames. ], batch size: 60, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:24:09,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3656186.6666666665, ans=10.0 2023-11-28 20:24:17,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3656253.3333333335, ans=0.125 2023-11-28 20:24:29,517 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548450 2023-11-28 20:25:06,608 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7400, loss[loss=0.06305, simple_loss=0.08379, pruned_loss=0.01303, audio_tagging_loss=0.008121, over 15324.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08936, pruned_loss=0.012, audio_tagging_loss=0.008469, over 3049448.70 frames. 
], batch size: 57, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:25:08,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3656520.0, ans=0.125 2023-11-28 20:25:19,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3656586.6666666665, ans=0.0 2023-11-28 20:25:19,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3656586.6666666665, ans=0.0 2023-11-28 20:25:26,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3656586.6666666665, ans=0.125 2023-11-28 20:25:30,864 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 8.862e+01 9.547e+01 1.020e+02 1.427e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-28 20:25:30,977 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548500 2023-11-28 20:25:35,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3656653.3333333335, ans=0.2 2023-11-28 20:25:44,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3656720.0, ans=0.07 2023-11-28 20:25:52,673 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.08 vs. limit=15.0 2023-11-28 20:25:55,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3656786.6666666665, ans=0.125 2023-11-28 20:26:02,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3656786.6666666665, ans=0.125 2023-11-28 20:26:05,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3656786.6666666665, ans=0.0 2023-11-28 20:26:07,291 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7450, loss[loss=0.05819, simple_loss=0.07735, pruned_loss=0.01045, audio_tagging_loss=0.009058, over 15116.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.09016, pruned_loss=0.01206, audio_tagging_loss=0.008396, over 3052950.85 frames. 
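The optim.py:476 lines track gradient clipping: the five numbers are quartiles (min, 25%, median, 75%, max) of recently observed gradient norms, and the threshold is Clipping_scale times the median (in the entry above, 2.0 * 9.547e+01 = 1.909e+02); percent-clipped reports how often the norm actually exceeded it. A simplified sketch of that bookkeeping; the window size and exact cadence are assumptions, and icefall's ScaledAdam optimizer differs in detail:

    import torch

    class GradNormClipper:
        # Clip the global grad norm at clipping_scale * median of recent norms.
        def __init__(self, clipping_scale=2.0, window=1000):
            self.clipping_scale = clipping_scale
            self.window = window
            self.norms = []

        def clip_(self, params):
            params = [p for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms = (self.norms + [norm])[-self.window:]
            q = torch.quantile(torch.tensor(self.norms),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()
            if norm > threshold:
                for p in params:
                    p.grad.mul_(threshold / norm)
            return q.tolist(), threshold  # the quartiles and threshold in the log

The tight quartile spread here (roughly 7e+01 to 1.4e+02) and percent-clipped=0.0 on almost every entry are what a stable late-stage run looks like.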
], batch size: 58, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:26:32,853 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548550 2023-11-28 20:26:37,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3656986.6666666665, ans=0.125 2023-11-28 20:26:44,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3657053.3333333335, ans=0.125 2023-11-28 20:26:46,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3657053.3333333335, ans=0.125 2023-11-28 20:26:47,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3657053.3333333335, ans=0.0 2023-11-28 20:26:50,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3657053.3333333335, ans=0.0 2023-11-28 20:26:56,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3657120.0, ans=0.0 2023-11-28 20:27:06,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3657120.0, ans=0.2 2023-11-28 20:27:06,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3657120.0, ans=0.125 2023-11-28 20:27:07,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3657120.0, ans=0.125 2023-11-28 20:27:09,833 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7500, loss[loss=0.05921, simple_loss=0.08611, pruned_loss=0.009136, audio_tagging_loss=0.00702, over 15224.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08934, pruned_loss=0.01195, audio_tagging_loss=0.008457, over 3057447.15 frames. ], batch size: 55, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:27:11,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3657186.6666666665, ans=0.2 2023-11-28 20:27:13,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3657186.6666666665, ans=0.125 2023-11-28 20:27:22,545 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.39 vs. 
limit=10.0 2023-11-28 20:27:25,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3657253.3333333335, ans=0.2 2023-11-28 20:27:34,126 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.864e+01 9.552e+01 1.022e+02 1.615e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 20:27:34,228 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548600 2023-11-28 20:27:34,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3657320.0, ans=0.125 2023-11-28 20:27:38,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3657320.0, ans=0.0 2023-11-28 20:27:44,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3657320.0, ans=0.0 2023-11-28 20:27:56,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3657386.6666666665, ans=0.09899494936611666 2023-11-28 20:28:02,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=3657453.3333333335, ans=0.1 2023-11-28 20:28:09,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3657453.3333333335, ans=0.125 2023-11-28 20:28:12,510 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7550, loss[loss=0.05885, simple_loss=0.07953, pruned_loss=0.01234, audio_tagging_loss=0.00674, over 15388.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09043, pruned_loss=0.01229, audio_tagging_loss=0.008432, over 3058643.88 frames. ], batch size: 56, lr: 1.47e-03, grad_scale: 16.0 2023-11-28 20:28:14,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3657520.0, ans=0.125 2023-11-28 20:28:18,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3657520.0, ans=0.0 2023-11-28 20:28:37,439 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548650 2023-11-28 20:28:39,035 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2023-11-28 20:29:13,267 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7600, loss[loss=0.0821, simple_loss=0.1102, pruned_loss=0.01605, audio_tagging_loss=0.01095, over 14691.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08983, pruned_loss=0.01225, audio_tagging_loss=0.008493, over 3054411.00 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:29:36,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3657920.0, ans=0.2 2023-11-28 20:29:38,925 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 9.036e+01 9.725e+01 1.055e+02 1.335e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-28 20:29:39,057 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548700 2023-11-28 20:29:45,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.83 vs. 
limit=22.5 2023-11-28 20:29:56,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3658053.3333333335, ans=0.125 2023-11-28 20:30:10,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3658120.0, ans=0.2 2023-11-28 20:30:15,866 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7650, loss[loss=0.07462, simple_loss=0.1042, pruned_loss=0.01666, audio_tagging_loss=0.005841, over 15261.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09024, pruned_loss=0.01239, audio_tagging_loss=0.008473, over 3058830.90 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:30:27,569 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=12.0 2023-11-28 20:30:37,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3658253.3333333335, ans=0.125 2023-11-28 20:30:40,711 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548750 2023-11-28 20:30:42,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3658320.0, ans=0.2 2023-11-28 20:30:44,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3658320.0, ans=0.2 2023-11-28 20:31:03,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3658386.6666666665, ans=0.1 2023-11-28 20:31:05,981 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=12.0 2023-11-28 20:31:17,767 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7700, loss[loss=0.05362, simple_loss=0.06598, pruned_loss=0.01178, audio_tagging_loss=0.008848, over 14703.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09044, pruned_loss=0.01243, audio_tagging_loss=0.008445, over 3058912.94 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:31:23,882 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=15.0 2023-11-28 20:31:37,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3658586.6666666665, ans=0.0 2023-11-28 20:31:42,360 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548800 2023-11-28 20:31:43,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.090e+01 9.039e+01 9.612e+01 1.044e+02 1.351e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 20:32:06,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3658786.6666666665, ans=0.125 2023-11-28 20:32:09,669 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:32:12,208 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2023-11-28 20:32:19,642 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7750, loss[loss=0.04409, simple_loss=0.05782, pruned_loss=0.007023, audio_tagging_loss=0.008153, over 14937.00 frames. 
], tot_loss[loss=0.06605, simple_loss=0.09041, pruned_loss=0.01236, audio_tagging_loss=0.008496, over 3056908.39 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:32:22,565 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=15.0 2023-11-28 20:32:27,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3658853.3333333335, ans=0.125 2023-11-28 20:32:44,874 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548850 2023-11-28 20:32:57,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3659053.3333333335, ans=0.0 2023-11-28 20:33:08,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3659120.0, ans=0.0 2023-11-28 20:33:16,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3659120.0, ans=0.125 2023-11-28 20:33:22,036 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7800, loss[loss=0.0845, simple_loss=0.109, pruned_loss=0.01832, audio_tagging_loss=0.01171, over 14988.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08951, pruned_loss=0.01221, audio_tagging_loss=0.008577, over 3062436.22 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:33:42,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3659253.3333333335, ans=0.1 2023-11-28 20:33:47,372 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548900 2023-11-28 20:33:48,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.914e+01 9.757e+01 1.064e+02 1.306e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-28 20:33:53,803 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0 2023-11-28 20:34:24,306 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7850, loss[loss=0.05891, simple_loss=0.07596, pruned_loss=0.01269, audio_tagging_loss=0.008232, over 15683.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.09039, pruned_loss=0.0124, audio_tagging_loss=0.008575, over 3060927.32 frames. 
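The Whitening entries (scaling.py:1022) are diagnostics from Whiten modules, which encourage feature covariance to stay close to a multiple of the identity; the printed metric is 1.0 for perfectly decorrelated, equal-variance channels, grows as channels become correlated, and is compared against the module's whitening_limit. A rough sketch of such a metric, simplified from icefall's _whitening_metric and not its exact code:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels); measure, per channel group, how far
        # the covariance is from a multiple of the identity (1.0 = white).
        num_frames, num_channels = x.shape
        g = num_channels // num_groups
        xg = x.reshape(num_frames, num_groups, g).transpose(0, 1)
        covar = torch.matmul(xg.transpose(1, 2), xg) / num_frames
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
        mean_sq = (covar ** 2).sum(dim=(1, 2)).mean() / g
        return (mean_sq / (mean_diag ** 2 + 1e-20)).item()

    print(whitening_metric(torch.randn(1000, 192)))  # near 1.0 for white noise

Read the entries as measured vs. allowed: metric=11.41 vs. limit=15.0 above means that feed_forward3 output is noticeably correlated but still inside its budget.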
], batch size: 62, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:34:25,624 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:34:29,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3659520.0, ans=0.125 2023-11-28 20:34:34,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3659520.0, ans=0.0 2023-11-28 20:34:36,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3659586.6666666665, ans=0.0 2023-11-28 20:34:41,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3659586.6666666665, ans=0.125 2023-11-28 20:34:46,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3659586.6666666665, ans=0.125 2023-11-28 20:34:49,043 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 548950 2023-11-28 20:35:18,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3659786.6666666665, ans=0.125 2023-11-28 20:35:25,337 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7900, loss[loss=0.07472, simple_loss=0.09819, pruned_loss=0.01345, audio_tagging_loss=0.01218, over 15229.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.09031, pruned_loss=0.01243, audio_tagging_loss=0.008663, over 3055804.81 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:35:30,438 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.89 vs. limit=15.0 2023-11-28 20:35:30,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.39 vs. limit=6.0 2023-11-28 20:35:31,895 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=22.5 2023-11-28 20:35:49,661 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549000 2023-11-28 20:35:50,709 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.838e+01 9.083e+01 9.481e+01 1.023e+02 1.467e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 20:36:05,746 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.00 vs. limit=10.0 2023-11-28 20:36:26,836 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 7950, loss[loss=0.0673, simple_loss=0.09153, pruned_loss=0.01068, audio_tagging_loss=0.01085, over 15108.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09033, pruned_loss=0.01253, audio_tagging_loss=0.008725, over 3053654.54 frames. 
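Most ScheduledFloat names in this section belong to Balancer modules (balancer1.prob, min_positive, max_abs and so on). A Balancer nudges per-channel activation statistics into a target range, keeping the fraction of positive values between min_positive and max_positive and the typical magnitude between min_abs and max_abs, by modifying gradients with probability prob on any given step. A diagnostic-only sketch of the statistics being constrained; the real module acts in the backward pass, and this is only an interpretation aid:

    import torch

    def balancer_stats(x, min_positive=0.05, max_positive=0.95,
                       min_abs=0.2, max_abs=100.0):
        # x: (num_frames, num_channels); count channels whose statistics fall
        # outside the ranges a Balancer would push them back into.
        pos_frac = (x > 0).float().mean(dim=0)
        mean_abs = x.abs().mean(dim=0)
        return {
            "too_rarely_positive": (pos_frac < min_positive).sum().item(),
            "too_often_positive": (pos_frac > max_positive).sum().item(),
            "too_small": (mean_abs < min_abs).sum().item(),
            "too_large": (mean_abs > max_abs).sum().item(),
        }

    print(balancer_stats(torch.randn(1000, 192)))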
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:36:30,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3660186.6666666665, ans=0.2 2023-11-28 20:36:33,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3660186.6666666665, ans=0.125 2023-11-28 20:36:34,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3660186.6666666665, ans=0.125 2023-11-28 20:36:44,695 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 20:36:52,476 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549050 2023-11-28 20:36:59,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3660320.0, ans=0.1 2023-11-28 20:37:04,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3660386.6666666665, ans=0.125 2023-11-28 20:37:08,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3660386.6666666665, ans=0.2 2023-11-28 20:37:21,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3660453.3333333335, ans=0.125 2023-11-28 20:37:28,990 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8000, loss[loss=0.05152, simple_loss=0.07034, pruned_loss=0.007973, audio_tagging_loss=0.008375, over 14862.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08976, pruned_loss=0.0125, audio_tagging_loss=0.008863, over 3051104.68 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:37:38,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3660520.0, ans=0.0 2023-11-28 20:37:53,762 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549100 2023-11-28 20:37:54,807 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.078e+01 8.902e+01 9.396e+01 1.018e+02 1.315e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-28 20:37:57,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3660653.3333333335, ans=0.125 2023-11-28 20:38:21,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3660786.6666666665, ans=0.125 2023-11-28 20:38:31,172 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8050, loss[loss=0.08472, simple_loss=0.1214, pruned_loss=0.01865, audio_tagging_loss=0.005371, over 17147.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09046, pruned_loss=0.01243, audio_tagging_loss=0.008864, over 3049592.47 frames. 
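The WARNING above (and its repeats in this section) comes from the cut filter in train_asr.py: a cut is excluded when its frame count after the roughly 4x convolutional subsampling cannot cover its token sequence, since the transducer loss has no valid alignment in that case. These one-second AudioSet clips carry a placeholder transcript, so 23 subsampled frames against 24 tokens trips the filter. A sketch of such a predicate; the exact subsampling arithmetic is assumed from similar icefall recipes:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # 100 input frames -> 23 after subsampling, matching the WARNING above.
        frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
        return frames_after_subsampling >= num_tokens

    print(keep_cut(100, 24))  # False: the cut is dropped from training

The dropped cuts are the rare one-second AudioSet items whose dummy transcript is longer than the clip can align.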
], batch size: 63, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:38:40,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3660853.3333333335, ans=0.125 2023-11-28 20:38:52,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3660920.0, ans=0.125 2023-11-28 20:38:55,130 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549150 2023-11-28 20:38:55,640 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2023-11-28 20:39:32,421 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8100, loss[loss=0.08442, simple_loss=0.1128, pruned_loss=0.01918, audio_tagging_loss=0.008834, over 15075.00 frames. ], tot_loss[loss=0.06652, simple_loss=0.09038, pruned_loss=0.01262, audio_tagging_loss=0.008702, over 3051673.88 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:39:40,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3661186.6666666665, ans=0.0 2023-11-28 20:39:41,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3661186.6666666665, ans=0.0 2023-11-28 20:39:56,909 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549200 2023-11-28 20:40:00,131 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 9.129e+01 9.751e+01 1.053e+02 1.304e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-28 20:40:06,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3661320.0, ans=0.125 2023-11-28 20:40:19,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3661386.6666666665, ans=0.125 2023-11-28 20:40:20,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3661453.3333333335, ans=0.125 2023-11-28 20:40:25,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3661453.3333333335, ans=0.05 2023-11-28 20:40:34,198 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8150, loss[loss=0.04925, simple_loss=0.06248, pruned_loss=0.008368, audio_tagging_loss=0.009643, over 15356.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09002, pruned_loss=0.01255, audio_tagging_loss=0.008633, over 3049009.05 frames. 
], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:40:49,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3661586.6666666665, ans=0.125 2023-11-28 20:40:58,832 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549250 2023-11-28 20:41:04,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3661653.3333333335, ans=0.125 2023-11-28 20:41:06,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3661653.3333333335, ans=0.125 2023-11-28 20:41:11,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3661720.0, ans=0.125 2023-11-28 20:41:17,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3661720.0, ans=0.1 2023-11-28 20:41:35,427 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8200, loss[loss=0.04993, simple_loss=0.06422, pruned_loss=0.007298, audio_tagging_loss=0.01053, over 14052.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09032, pruned_loss=0.01235, audio_tagging_loss=0.008527, over 3056077.45 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:41:36,724 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 20:41:59,778 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549300 2023-11-28 20:42:01,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3661986.6666666665, ans=0.05 2023-11-28 20:42:02,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.159e+01 8.719e+01 9.487e+01 1.033e+02 1.453e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 20:42:23,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3662120.0, ans=0.125 2023-11-28 20:42:36,662 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8250, loss[loss=0.06833, simple_loss=0.0922, pruned_loss=0.01486, audio_tagging_loss=0.007372, over 14558.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09, pruned_loss=0.01225, audio_tagging_loss=0.00852, over 3059216.62 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:43:00,543 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549350 2023-11-28 20:43:04,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3662320.0, ans=0.0 2023-11-28 20:43:34,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3662453.3333333335, ans=0.0 2023-11-28 20:43:37,405 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8300, loss[loss=0.08382, simple_loss=0.1193, pruned_loss=0.01594, audio_tagging_loss=0.008217, over 15593.00 frames. 
], tot_loss[loss=0.06591, simple_loss=0.09039, pruned_loss=0.01226, audio_tagging_loss=0.008454, over 3053157.23 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:43:38,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=3662520.0, ans=0.02 2023-11-28 20:43:38,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3662520.0, ans=0.0 2023-11-28 20:43:40,652 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:43:58,256 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0 2023-11-28 20:44:02,466 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549400 2023-11-28 20:44:05,012 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 9.200e+01 9.674e+01 1.037e+02 1.231e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 20:44:06,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3662653.3333333335, ans=0.125 2023-11-28 20:44:07,904 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.41 vs. limit=10.0 2023-11-28 20:44:16,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3662720.0, ans=0.2 2023-11-28 20:44:24,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3662720.0, ans=0.1 2023-11-28 20:44:39,616 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8350, loss[loss=0.06121, simple_loss=0.08751, pruned_loss=0.00858, audio_tagging_loss=0.008876, over 15103.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.09004, pruned_loss=0.01217, audio_tagging_loss=0.008476, over 3055820.65 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:44:42,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3662853.3333333335, ans=0.125 2023-11-28 20:45:04,334 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549450 2023-11-28 20:45:11,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3662986.6666666665, ans=0.0 2023-11-28 20:45:40,906 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8400, loss[loss=0.05662, simple_loss=0.08069, pruned_loss=0.007921, audio_tagging_loss=0.008356, over 14512.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09018, pruned_loss=0.01216, audio_tagging_loss=0.008502, over 3054173.53 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:45:46,720 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. 
limit=15.0 2023-11-28 20:46:05,433 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549500 2023-11-28 20:46:07,779 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.367e+01 8.725e+01 9.379e+01 9.933e+01 1.253e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-28 20:46:11,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3663320.0, ans=0.0 2023-11-28 20:46:29,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3663453.3333333335, ans=0.2 2023-11-28 20:46:42,560 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8450, loss[loss=0.05379, simple_loss=0.07473, pruned_loss=0.005687, audio_tagging_loss=0.01074, over 15105.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.08939, pruned_loss=0.01207, audio_tagging_loss=0.008514, over 3051227.08 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:46:55,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3663586.6666666665, ans=0.1 2023-11-28 20:47:00,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3663586.6666666665, ans=0.125 2023-11-28 20:47:07,146 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549550 2023-11-28 20:47:07,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3663653.3333333335, ans=0.0 2023-11-28 20:47:34,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3663786.6666666665, ans=0.125 2023-11-28 20:47:44,290 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8500, loss[loss=0.06217, simple_loss=0.092, pruned_loss=0.009533, audio_tagging_loss=0.006637, over 14706.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08987, pruned_loss=0.01207, audio_tagging_loss=0.008409, over 3051751.19 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:47:47,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3663853.3333333335, ans=0.125 2023-11-28 20:47:53,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3663853.3333333335, ans=0.125 2023-11-28 20:48:09,432 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549600 2023-11-28 20:48:11,169 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2023-11-28 20:48:11,907 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.752e+01 9.470e+01 1.030e+02 1.283e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 20:48:16,017 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. 
limit=6.0 2023-11-28 20:48:29,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3664053.3333333335, ans=0.125 2023-11-28 20:48:36,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3664120.0, ans=0.2 2023-11-28 20:48:46,224 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8550, loss[loss=0.05351, simple_loss=0.07093, pruned_loss=0.01021, audio_tagging_loss=0.007836, over 14711.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08967, pruned_loss=0.01219, audio_tagging_loss=0.008517, over 3046151.61 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:48:50,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3664186.6666666665, ans=0.125 2023-11-28 20:49:02,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3664253.3333333335, ans=0.1 2023-11-28 20:49:03,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3664253.3333333335, ans=0.125 2023-11-28 20:49:10,928 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549650 2023-11-28 20:49:26,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3664386.6666666665, ans=0.125 2023-11-28 20:49:35,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3664453.3333333335, ans=0.125 2023-11-28 20:49:43,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3664453.3333333335, ans=0.0 2023-11-28 20:49:47,962 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8600, loss[loss=0.0675, simple_loss=0.09469, pruned_loss=0.01068, audio_tagging_loss=0.009468, over 14758.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.0896, pruned_loss=0.01203, audio_tagging_loss=0.008513, over 3046872.03 frames. 
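The grad_scale at the end of every loss line is the dynamic loss scale of fp16 mixed-precision training, which is why it steps among 32.0, 16.0 and, a little later in this section, 8.0: the scaler halves the scale when a gradient overflow is detected and doubles it again after a stretch of stable steps. The standard PyTorch mechanics, sketched with illustrative init_scale and growth_interval values (the trainer's own wiring and checkpointing of the scaler differ):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)
    model = torch.nn.Linear(10, 1).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    for _ in range(3):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(torch.randn(8, 10, device="cuda")).pow(2).mean()
        scaler.scale(loss).backward()  # scale up so fp16 grads do not underflow
        scaler.step(optimizer)         # unscales first; skips the step on inf/nan
        scaler.update()                # halve on overflow, grow when stable
        print(scaler.get_scale())      # the grad_scale printed in the loss lines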
], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:49:57,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3664520.0, ans=0.5 2023-11-28 20:50:02,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=3664586.6666666665, ans=0.05 2023-11-28 20:50:04,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3664586.6666666665, ans=0.125 2023-11-28 20:50:04,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3664586.6666666665, ans=0.0 2023-11-28 20:50:12,693 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549700 2023-11-28 20:50:14,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3664653.3333333335, ans=0.2 2023-11-28 20:50:16,006 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.762e+01 8.961e+01 9.460e+01 1.024e+02 1.421e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-28 20:50:17,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3664653.3333333335, ans=0.1 2023-11-28 20:50:19,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3664653.3333333335, ans=0.0 2023-11-28 20:50:49,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3664853.3333333335, ans=0.0 2023-11-28 20:50:49,977 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8650, loss[loss=0.0643, simple_loss=0.08458, pruned_loss=0.0132, audio_tagging_loss=0.008807, over 15027.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09119, pruned_loss=0.01232, audio_tagging_loss=0.008586, over 3048461.68 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:50:51,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3664853.3333333335, ans=0.125 2023-11-28 20:50:53,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3664853.3333333335, ans=0.95 2023-11-28 20:50:59,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3664853.3333333335, ans=0.5 2023-11-28 20:51:07,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3664920.0, ans=0.0 2023-11-28 20:51:08,853 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.02 vs. 
limit=15.0 2023-11-28 20:51:15,601 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549750 2023-11-28 20:51:31,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3665053.3333333335, ans=0.0 2023-11-28 20:51:43,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3665120.0, ans=0.2 2023-11-28 20:51:51,516 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8700, loss[loss=0.0431, simple_loss=0.05987, pruned_loss=0.004932, audio_tagging_loss=0.008234, over 13856.00 frames. ], tot_loss[loss=0.066, simple_loss=0.09048, pruned_loss=0.01211, audio_tagging_loss=0.008654, over 3050716.79 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:52:04,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3665253.3333333335, ans=0.125 2023-11-28 20:52:04,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3665253.3333333335, ans=0.1 2023-11-28 20:52:16,833 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549800 2023-11-28 20:52:20,545 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 8.859e+01 9.712e+01 1.046e+02 1.344e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 20:52:24,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3665320.0, ans=0.125 2023-11-28 20:52:36,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3665386.6666666665, ans=0.0 2023-11-28 20:52:38,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3665386.6666666665, ans=0.1 2023-11-28 20:52:41,621 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2023-11-28 20:52:42,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3665453.3333333335, ans=0.125 2023-11-28 20:52:53,850 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8750, loss[loss=0.06978, simple_loss=0.09366, pruned_loss=0.01289, audio_tagging_loss=0.01006, over 14437.00 frames. ], tot_loss[loss=0.06654, simple_loss=0.09111, pruned_loss=0.01221, audio_tagging_loss=0.00878, over 3053848.89 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:52:58,150 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2023-11-28 20:53:05,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3665586.6666666665, ans=0.125 2023-11-28 20:53:18,474 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549850 2023-11-28 20:53:45,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3665786.6666666665, ans=0.125 2023-11-28 20:53:55,648 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8800, loss[loss=0.04064, simple_loss=0.04549, pruned_loss=0.005777, audio_tagging_loss=0.01212, over 15564.00 frames. 
], tot_loss[loss=0.06649, simple_loss=0.09073, pruned_loss=0.01228, audio_tagging_loss=0.008846, over 3052815.51 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 20:54:19,672 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549900 2023-11-28 20:54:23,626 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.083e+01 9.097e+01 9.649e+01 1.028e+02 1.198e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 20:54:53,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3666120.0, ans=0.07 2023-11-28 20:54:56,760 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8850, loss[loss=0.0644, simple_loss=0.07761, pruned_loss=0.01648, audio_tagging_loss=0.009114, over 16089.00 frames. ], tot_loss[loss=0.06621, simple_loss=0.09032, pruned_loss=0.01225, audio_tagging_loss=0.008796, over 3053266.77 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 20:55:03,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3666186.6666666665, ans=0.04949747468305833 2023-11-28 20:55:04,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3666186.6666666665, ans=0.0 2023-11-28 20:55:09,114 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 20:55:09,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3666253.3333333335, ans=0.2 2023-11-28 20:55:21,804 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 549950 2023-11-28 20:55:30,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3666320.0, ans=0.1 2023-11-28 20:55:37,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3666386.6666666665, ans=0.125 2023-11-28 20:55:58,452 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8900, loss[loss=0.08037, simple_loss=0.1176, pruned_loss=0.01527, audio_tagging_loss=0.006283, over 16495.00 frames. ], tot_loss[loss=0.06675, simple_loss=0.09136, pruned_loss=0.0124, audio_tagging_loss=0.008669, over 3060820.66 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:56:11,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3666586.6666666665, ans=0.0 2023-11-28 20:56:22,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.16 vs. 
limit=6.0 2023-11-28 20:56:23,513 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550000 2023-11-28 20:56:26,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3666653.3333333335, ans=0.125 2023-11-28 20:56:29,507 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.959e+01 9.608e+01 1.039e+02 1.784e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 20:56:32,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2023-11-28 20:56:40,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3666720.0, ans=0.0 2023-11-28 20:56:43,344 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:56:44,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3666720.0, ans=0.1 2023-11-28 20:57:00,005 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 8950, loss[loss=0.06652, simple_loss=0.09719, pruned_loss=0.01143, audio_tagging_loss=0.006495, over 15654.00 frames. ], tot_loss[loss=0.06619, simple_loss=0.09065, pruned_loss=0.01229, audio_tagging_loss=0.008582, over 3050780.70 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:57:12,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3666920.0, ans=12.0 2023-11-28 20:57:20,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3666920.0, ans=0.0 2023-11-28 20:57:21,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3666920.0, ans=0.125 2023-11-28 20:57:21,937 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.20 vs. limit=5.0 2023-11-28 20:57:22,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3666920.0, ans=0.125 2023-11-28 20:57:24,690 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550050 2023-11-28 20:57:37,769 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 20:58:00,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3667120.0, ans=0.0 2023-11-28 20:58:02,944 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9000, loss[loss=0.05614, simple_loss=0.07196, pruned_loss=0.01096, audio_tagging_loss=0.009199, over 15233.00 frames. ], tot_loss[loss=0.06637, simple_loss=0.09072, pruned_loss=0.01246, audio_tagging_loss=0.008558, over 3051371.51 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 20:58:02,945 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 20:58:42,568 INFO [train_asr.py:1267] (2/4) Epoch 46, validation: loss=0.05897, simple_loss=0.05047, pruned_loss=0.005253, audio_tagging_loss=0.02848, over 4681554.00 frames. 
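
Note on the loss records: each record reports a total plus three components, and across this section the totals are consistent with a fixed weighting of roughly loss ~= 0.5 * simple_loss + pruned_loss + audio_tagging_loss (for the batch 9000 running totals above: 0.5 * 0.09072 + 0.01246 + 0.008558 ~= 0.06637). A minimal sketch under that inferred weighting; the function name and scales are read off the logged numbers, not taken from the training script:

    # Weights inferred from the logged records; treat them as an assumption,
    # not as the training script's actual combination logic.
    def combine_losses(simple_loss: float, pruned_loss: float,
                       audio_tagging_loss: float,
                       simple_scale: float = 0.5,
                       audio_tagging_scale: float = 1.0) -> float:
        return (simple_scale * simple_loss
                + pruned_loss
                + audio_tagging_scale * audio_tagging_loss)

    # Reproduces the Epoch 46, batch 9000 running totals to within rounding:
    assert abs(combine_losses(0.09072, 0.01246, 0.008558) - 0.06637) < 1e-4

The same weighting also reproduces the per-batch values, e.g. 0.5 * 0.07196 + 0.01096 + 0.009199 ~= 0.05614 for the batch 9000 record itself.
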
2023-11-28 20:58:42,569 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 20:59:07,573 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550100 2023-11-28 20:59:13,416 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.853e+01 9.549e+01 1.044e+02 1.258e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-28 20:59:29,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3667386.6666666665, ans=0.0 2023-11-28 20:59:44,277 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.03 vs. limit=22.5 2023-11-28 20:59:44,793 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9050, loss[loss=0.0851, simple_loss=0.1338, pruned_loss=0.01363, audio_tagging_loss=0.004576, over 15551.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09107, pruned_loss=0.01247, audio_tagging_loss=0.00849, over 3052376.86 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:00:07,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3667653.3333333335, ans=0.1 2023-11-28 21:00:08,943 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550150 2023-11-28 21:00:11,326 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=15.0 2023-11-28 21:00:27,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3667720.0, ans=0.125 2023-11-28 21:00:46,671 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9100, loss[loss=0.05846, simple_loss=0.08448, pruned_loss=0.008697, audio_tagging_loss=0.007518, over 14427.00 frames. ], tot_loss[loss=0.0661, simple_loss=0.09052, pruned_loss=0.01235, audio_tagging_loss=0.00849, over 3051063.83 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:01:12,324 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550200 2023-11-28 21:01:18,386 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.959e+01 9.673e+01 1.042e+02 1.442e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 21:01:21,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3667986.6666666665, ans=0.125 2023-11-28 21:01:24,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3668053.3333333335, ans=0.2 2023-11-28 21:01:47,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3668186.6666666665, ans=0.0 2023-11-28 21:01:48,501 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9150, loss[loss=0.05437, simple_loss=0.06981, pruned_loss=0.007527, audio_tagging_loss=0.01194, over 15094.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08943, pruned_loss=0.01205, audio_tagging_loss=0.008592, over 3057834.56 frames. 
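
Note on the optim.py:476 lines: the five values after "grad-norm quartiles" read as min, 25%, median, 75% and max of recent gradient norms, and the logged threshold equals Clipping_scale times the median (in the 20:59:13 record above, 2.0 * 9.549e+01 = 1.910e+02). A hedged reconstruction of that relation; the names are illustrative and the real optimizer may smooth its statistics differently:

    import torch

    # Illustrative only: reproduces the logged relation, not optim.py itself.
    def clipping_stats(recent_grad_norms: torch.Tensor,
                       clipping_scale: float = 2.0):
        q = torch.quantile(recent_grad_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]   # 2.0 * median
        percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped

percent-clipped stays at 0.0 through almost all of this section because the max quartile sits below the threshold; the one exception is the 21:34:19 record further down, where a max of 2.472e+02 exceeds its 1.926e+02 threshold and percent-clipped=1.0 is reported.
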
], batch size: 60, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:01:54,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3668186.6666666665, ans=0.125 2023-11-28 21:02:13,326 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550250 2023-11-28 21:02:16,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3668320.0, ans=0.1 2023-11-28 21:02:50,569 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9200, loss[loss=0.07548, simple_loss=0.1002, pruned_loss=0.01519, audio_tagging_loss=0.01018, over 16293.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08882, pruned_loss=0.01188, audio_tagging_loss=0.008618, over 3056056.98 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:03:00,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3668520.0, ans=0.2 2023-11-28 21:03:00,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3668520.0, ans=0.0 2023-11-28 21:03:03,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.29 vs. limit=15.0 2023-11-28 21:03:05,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3668586.6666666665, ans=0.125 2023-11-28 21:03:14,994 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550300 2023-11-28 21:03:21,344 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 8.689e+01 9.468e+01 1.009e+02 1.302e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 21:03:38,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3668720.0, ans=0.125 2023-11-28 21:03:49,365 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:03:52,639 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9250, loss[loss=0.06594, simple_loss=0.09439, pruned_loss=0.008906, audio_tagging_loss=0.009837, over 15214.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.09006, pruned_loss=0.01213, audio_tagging_loss=0.008418, over 3057409.91 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:04:07,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3668920.0, ans=0.125 2023-11-28 21:04:17,710 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550350 2023-11-28 21:04:17,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3668986.6666666665, ans=0.0 2023-11-28 21:04:54,340 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9300, loss[loss=0.06327, simple_loss=0.08908, pruned_loss=0.01196, audio_tagging_loss=0.006774, over 14937.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08956, pruned_loss=0.01218, audio_tagging_loss=0.008483, over 3056244.41 frames. 
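
Note on the scaling.py:213 lines: each names one scheduled hyperparameter (a dropout p, skip rate, balancer bound, and so on) together with the global batch_count and its current value, ans. The behaviour is consistent with a per-parameter piecewise-linear schedule over batch_count; by this point in training (batch_count ~3.66M) every logged value has flattened at its final constant, which is why the same names keep repeating with the same ans. A minimal sketch, with breakpoints invented purely for illustration:

    def scheduled_float(batch_count, schedule):
        """Piecewise-linear in batch_count; constant outside the breakpoints."""
        b0, v0 = schedule[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in schedule[1:]:
            if batch_count <= b1:
                return v0 + (batch_count - b0) / (b1 - b0) * (v1 - v0)
            b0, v0 = b1, v1
        return v0

    # e.g. a skip rate that decays from 0.5 to 0.0 over the first 20k batches:
    print(scheduled_float(10000.0,   [(0.0, 0.5), (20000.0, 0.0)]))   # 0.25
    print(scheduled_float(3664920.0, [(0.0, 0.5), (20000.0, 0.0)]))   # 0.0
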
], batch size: 53, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:04:59,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3669186.6666666665, ans=0.2 2023-11-28 21:05:02,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3669186.6666666665, ans=0.125 2023-11-28 21:05:15,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3669253.3333333335, ans=0.125 2023-11-28 21:05:18,874 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550400 2023-11-28 21:05:26,621 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 8.959e+01 9.529e+01 1.028e+02 1.391e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-28 21:05:49,285 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2023-11-28 21:05:51,343 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:05:53,549 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.06 vs. limit=10.0 2023-11-28 21:05:56,372 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9350, loss[loss=0.05909, simple_loss=0.08313, pruned_loss=0.009787, audio_tagging_loss=0.007737, over 15255.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08923, pruned_loss=0.01222, audio_tagging_loss=0.008572, over 3052347.90 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:06:15,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3669586.6666666665, ans=0.0 2023-11-28 21:06:20,853 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550450 2023-11-28 21:06:22,463 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=15.0 2023-11-28 21:06:45,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3669786.6666666665, ans=0.1 2023-11-28 21:06:53,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3669786.6666666665, ans=0.125 2023-11-28 21:06:58,205 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9400, loss[loss=0.08509, simple_loss=0.1108, pruned_loss=0.02113, audio_tagging_loss=0.008587, over 15619.00 frames. ], tot_loss[loss=0.06573, simple_loss=0.08977, pruned_loss=0.01223, audio_tagging_loss=0.008611, over 3056760.46 frames. 
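
Note on the scaling.py:1022 "Whitening" lines: each compares a per-module whitening metric against a limit. The metric can be read as a measure of how far the channel covariance is from a multiple of the identity: it is 1.0 for perfectly white features and grows as the covariance eigenvalues spread, and the module is designed to push it back down only once it exceeds the limit (in the records here, e.g. metric=10.67 vs. limit=15.0 above, it stays under). A hedged reconstruction of such a metric; the formula is inferred from this reading, and grouping and smoothing details are omitted:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels); equals 1.0 exactly when the channel
        # covariance is a multiple of the identity, and is >= 1.0 otherwise.
        num_channels = x.shape[-1]
        cov = x.t() @ x / x.shape[0]
        return num_channels * (cov * cov).sum() / cov.diagonal().sum() ** 2

    x = torch.randn(4000, 64)           # nearly white features
    print(float(whitening_metric(x)))   # ~1.0, far below a limit like 15.0
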
], batch size: 57, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:07:11,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3669920.0, ans=0.0 2023-11-28 21:07:19,135 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:07:22,477 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550500 2023-11-28 21:07:30,101 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 9.168e+01 9.669e+01 1.024e+02 1.175e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 21:07:40,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3670053.3333333335, ans=0.0 2023-11-28 21:07:57,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3670120.0, ans=0.1 2023-11-28 21:07:58,072 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:07:59,936 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9450, loss[loss=0.04582, simple_loss=0.05402, pruned_loss=0.0078, audio_tagging_loss=0.01101, over 16304.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08923, pruned_loss=0.01215, audio_tagging_loss=0.008732, over 3058292.92 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:08:02,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3670186.6666666665, ans=0.07 2023-11-28 21:08:15,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3670253.3333333335, ans=0.125 2023-11-28 21:08:23,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3670320.0, ans=0.0 2023-11-28 21:08:24,097 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550550 2023-11-28 21:08:27,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3670320.0, ans=0.0 2023-11-28 21:08:39,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3670386.6666666665, ans=0.5 2023-11-28 21:08:44,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3670386.6666666665, ans=0.95 2023-11-28 21:08:45,459 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=22.5 2023-11-28 21:08:45,659 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2023-11-28 21:08:54,435 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.08 vs. 
limit=6.0 2023-11-28 21:09:01,330 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9500, loss[loss=0.04869, simple_loss=0.06131, pruned_loss=0.007807, audio_tagging_loss=0.01023, over 14629.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08904, pruned_loss=0.01216, audio_tagging_loss=0.008871, over 3055628.63 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:09:16,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3670586.6666666665, ans=0.1 2023-11-28 21:09:16,679 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.75 vs. limit=15.0 2023-11-28 21:09:19,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3670586.6666666665, ans=0.0 2023-11-28 21:09:25,502 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550600 2023-11-28 21:09:26,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3670653.3333333335, ans=0.125 2023-11-28 21:09:29,784 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.99 vs. limit=22.5 2023-11-28 21:09:33,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 9.097e+01 9.748e+01 1.049e+02 1.377e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-28 21:09:35,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3670653.3333333335, ans=0.125 2023-11-28 21:09:37,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3670720.0, ans=0.1 2023-11-28 21:09:41,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3670720.0, ans=0.125 2023-11-28 21:10:03,510 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9550, loss[loss=0.06784, simple_loss=0.08935, pruned_loss=0.01298, audio_tagging_loss=0.01018, over 15736.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.0894, pruned_loss=0.01227, audio_tagging_loss=0.008861, over 3046615.16 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:10:03,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3670853.3333333335, ans=0.125 2023-11-28 21:10:05,241 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.21 vs. 
limit=12.0 2023-11-28 21:10:06,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3670853.3333333335, ans=0.0 2023-11-28 21:10:27,531 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550650 2023-11-28 21:10:28,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3670986.6666666665, ans=0.0 2023-11-28 21:10:35,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3670986.6666666665, ans=0.0 2023-11-28 21:11:01,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3671120.0, ans=0.1 2023-11-28 21:11:02,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3671120.0, ans=0.5 2023-11-28 21:11:04,568 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9600, loss[loss=0.05136, simple_loss=0.06908, pruned_loss=0.007271, audio_tagging_loss=0.009549, over 13710.00 frames. ], tot_loss[loss=0.06614, simple_loss=0.08991, pruned_loss=0.01227, audio_tagging_loss=0.008918, over 3054939.84 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:11:29,117 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550700 2023-11-28 21:11:36,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.216e+01 8.932e+01 9.584e+01 1.005e+02 1.481e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 21:11:58,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3671453.3333333335, ans=0.125 2023-11-28 21:11:59,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3671453.3333333335, ans=0.95 2023-11-28 21:12:06,306 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9650, loss[loss=0.05332, simple_loss=0.07606, pruned_loss=0.006431, audio_tagging_loss=0.008861, over 15474.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.08986, pruned_loss=0.01224, audio_tagging_loss=0.008875, over 3050759.85 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:12:18,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3671586.6666666665, ans=0.125 2023-11-28 21:12:23,293 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.16 vs. limit=22.5 2023-11-28 21:12:31,698 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550750 2023-11-28 21:12:48,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3671720.0, ans=0.125 2023-11-28 21:12:49,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3671720.0, ans=0.0 2023-11-28 21:13:07,306 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.00 vs. limit=22.5 2023-11-28 21:13:07,746 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9700, loss[loss=0.0651, simple_loss=0.09351, pruned_loss=0.0103, audio_tagging_loss=0.008046, over 15355.00 frames. 
], tot_loss[loss=0.06622, simple_loss=0.09054, pruned_loss=0.01225, audio_tagging_loss=0.008697, over 3045241.64 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:13:17,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3671853.3333333335, ans=0.125 2023-11-28 21:13:23,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3671920.0, ans=0.125 2023-11-28 21:13:32,973 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550800 2023-11-28 21:13:40,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 9.004e+01 9.586e+01 1.046e+02 1.916e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 21:13:58,597 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.11 vs. limit=22.5 2023-11-28 21:14:07,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3672120.0, ans=0.125 2023-11-28 21:14:10,534 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9750, loss[loss=0.06457, simple_loss=0.08087, pruned_loss=0.01278, audio_tagging_loss=0.01135, over 14515.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08936, pruned_loss=0.01206, audio_tagging_loss=0.008633, over 3038390.14 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:14:18,257 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:14:24,525 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.09 vs. limit=15.0 2023-11-28 21:14:30,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3672253.3333333335, ans=0.125 2023-11-28 21:14:35,337 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550850 2023-11-28 21:14:56,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3672386.6666666665, ans=0.125 2023-11-28 21:15:00,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3672453.3333333335, ans=0.04949747468305833 2023-11-28 21:15:02,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3672453.3333333335, ans=0.125 2023-11-28 21:15:11,891 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9800, loss[loss=0.06864, simple_loss=0.09771, pruned_loss=0.01151, audio_tagging_loss=0.008278, over 15537.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09054, pruned_loss=0.01221, audio_tagging_loss=0.008491, over 3041195.57 frames. 
], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:15:36,359 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550900 2023-11-28 21:15:42,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3672653.3333333335, ans=0.0 2023-11-28 21:15:43,929 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 8.839e+01 9.561e+01 1.020e+02 1.364e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 21:15:56,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3672720.0, ans=0.125 2023-11-28 21:16:07,119 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:16:12,977 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9850, loss[loss=0.07127, simple_loss=0.09656, pruned_loss=0.01366, audio_tagging_loss=0.009324, over 14550.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08994, pruned_loss=0.01204, audio_tagging_loss=0.008528, over 3044047.93 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:16:17,764 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2023-11-28 21:16:19,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3672853.3333333335, ans=0.1 2023-11-28 21:16:27,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3672920.0, ans=0.125 2023-11-28 21:16:38,330 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 550950 2023-11-28 21:16:38,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3672986.6666666665, ans=0.0 2023-11-28 21:16:49,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2023-11-28 21:16:53,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3673053.3333333335, ans=0.125 2023-11-28 21:16:57,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3673053.3333333335, ans=0.0 2023-11-28 21:17:14,263 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9900, loss[loss=0.05706, simple_loss=0.06659, pruned_loss=0.01238, audio_tagging_loss=0.01138, over 15982.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08981, pruned_loss=0.01206, audio_tagging_loss=0.008495, over 3044951.37 frames. 
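
Note on the WARNING records in this stretch: they all drop AudioSet cuts whose placeholder transcript is longer than the acoustic sequence. Each excluded cut has 100 input frames, which becomes 23 frames after the encoder's roughly 4x convolutional subsampling, fewer than the 24 BPE tokens, so no valid transducer alignment exists. The formula below reproduces the logged 100 -> 23 mapping; treat its exact form (two stride-2 stages) as inferred from the numbers rather than quoted from the script:

    def frames_after_subsampling(num_frames: int) -> int:
        # Two stride-2 stages; reproduces the logged 100 -> 23 mapping.
        return ((num_frames - 7) // 2 + 1) // 2

    num_frames, num_tokens = 100, 24
    t = frames_after_subsampling(num_frames)
    print(t)                  # 23
    print(t >= num_tokens)    # False, so the cut is excluded from training
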
], batch size: 63, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:17:27,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3673253.3333333335, ans=0.015 2023-11-28 21:17:33,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3673253.3333333335, ans=0.125 2023-11-28 21:17:39,226 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.69 vs. limit=10.0 2023-11-28 21:17:39,960 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551000 2023-11-28 21:17:48,296 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.969e+01 9.517e+01 1.006e+02 1.259e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-28 21:17:51,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3673386.6666666665, ans=0.125 2023-11-28 21:17:57,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3673386.6666666665, ans=0.95 2023-11-28 21:18:05,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3673453.3333333335, ans=0.125 2023-11-28 21:18:05,583 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=22.5 2023-11-28 21:18:16,917 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 9950, loss[loss=0.07235, simple_loss=0.09367, pruned_loss=0.01669, audio_tagging_loss=0.008826, over 14837.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08975, pruned_loss=0.01211, audio_tagging_loss=0.008501, over 3045783.13 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 8.0 2023-11-28 21:18:28,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3673586.6666666665, ans=0.2 2023-11-28 21:18:30,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3673586.6666666665, ans=0.1 2023-11-28 21:18:33,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3673586.6666666665, ans=0.2 2023-11-28 21:18:41,971 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551050 2023-11-28 21:18:55,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3673720.0, ans=0.1 2023-11-28 21:19:18,494 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10000, loss[loss=0.05373, simple_loss=0.0772, pruned_loss=0.006219, audio_tagging_loss=0.008909, over 15984.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08987, pruned_loss=0.01209, audio_tagging_loss=0.008405, over 3046876.10 frames. 
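
Note on the tot_loss entries: they are running statistics, not single-batch values. The frame totals hover near 3.05e6 while individual batches contribute about 1.5e4 frames, and the totals are fractional; both facts are consistent with an exponentially decayed running sum with an effective window of about 200 batches (3.05e6 / 1.5e4 ~= 200). A sketch under that assumption; the class and window here are illustrative:

    class RunningLoss:
        """Decayed running sum; the reported loss is loss_sum / frames."""
        def __init__(self, window: float = 200.0):
            self.decay = 1.0 - 1.0 / window
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss_sum: float, batch_frames: float) -> float:
            self.loss_sum = self.decay * self.loss_sum + batch_loss_sum
            self.frames = self.decay * self.frames + batch_frames
            return self.loss_sum / self.frames

    rl = RunningLoss()
    for _ in range(2000):
        rl.update(0.065 * 15000.0, 15000.0)
    print(rl.frames)   # ~3.0e6 and fractional, like the totals in the records
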
], batch size: 61, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:19:21,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3673853.3333333335, ans=0.1 2023-11-28 21:19:27,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3673853.3333333335, ans=0.2 2023-11-28 21:19:42,972 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551100 2023-11-28 21:19:51,732 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.896e+01 9.118e+01 9.727e+01 1.019e+02 1.264e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-28 21:19:53,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.78 vs. limit=6.0 2023-11-28 21:20:13,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3674120.0, ans=0.1 2023-11-28 21:20:20,072 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10050, loss[loss=0.05734, simple_loss=0.07682, pruned_loss=0.01035, audio_tagging_loss=0.008581, over 14888.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08989, pruned_loss=0.01206, audio_tagging_loss=0.008438, over 3046616.56 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:20:46,145 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551150 2023-11-28 21:20:59,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3674386.6666666665, ans=0.125 2023-11-28 21:21:09,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3674453.3333333335, ans=0.0 2023-11-28 21:21:12,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3674453.3333333335, ans=0.125 2023-11-28 21:21:22,587 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10100, loss[loss=0.06046, simple_loss=0.06683, pruned_loss=0.01425, audio_tagging_loss=0.01279, over 14595.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09031, pruned_loss=0.01211, audio_tagging_loss=0.008468, over 3047913.00 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:21:46,939 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551200 2023-11-28 21:21:47,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3674653.3333333335, ans=0.04949747468305833 2023-11-28 21:21:55,392 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 8.986e+01 9.697e+01 1.060e+02 1.407e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 21:22:07,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3674720.0, ans=0.1 2023-11-28 21:22:08,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3674720.0, ans=0.0 2023-11-28 21:22:08,742 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.84 vs. limit=15.0 2023-11-28 21:22:13,485 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:22:18,007 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.36 vs. limit=10.0 2023-11-28 21:22:21,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3674786.6666666665, ans=0.2 2023-11-28 21:22:24,618 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10150, loss[loss=0.07123, simple_loss=0.1002, pruned_loss=0.01269, audio_tagging_loss=0.008449, over 15120.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.0907, pruned_loss=0.01212, audio_tagging_loss=0.008537, over 3049908.15 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:22:31,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3674853.3333333335, ans=0.2 2023-11-28 21:22:32,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3674853.3333333335, ans=0.2 2023-11-28 21:22:49,240 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551250 2023-11-28 21:22:52,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3674986.6666666665, ans=0.0 2023-11-28 21:22:53,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3674986.6666666665, ans=0.0 2023-11-28 21:22:54,607 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:23:13,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3675120.0, ans=0.2 2023-11-28 21:23:13,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3675120.0, ans=0.0 2023-11-28 21:23:22,828 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.17 vs. limit=15.0 2023-11-28 21:23:26,853 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10200, loss[loss=0.06319, simple_loss=0.08755, pruned_loss=0.009545, audio_tagging_loss=0.009863, over 15574.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.0898, pruned_loss=0.01189, audio_tagging_loss=0.008595, over 3048936.08 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:23:49,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3675253.3333333335, ans=0.0 2023-11-28 21:23:50,134 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:23:51,968 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551300 2023-11-28 21:24:00,124 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.427e+01 8.789e+01 9.521e+01 1.014e+02 1.270e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 21:24:28,404 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10250, loss[loss=0.05108, simple_loss=0.07397, pruned_loss=0.006407, audio_tagging_loss=0.007692, over 14651.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.0896, pruned_loss=0.01186, audio_tagging_loss=0.008625, over 3052446.12 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:24:42,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3675586.6666666665, ans=0.1 2023-11-28 21:24:53,124 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551350 2023-11-28 21:24:57,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3675653.3333333335, ans=0.1 2023-11-28 21:25:30,783 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10300, loss[loss=0.08186, simple_loss=0.1229, pruned_loss=0.01358, audio_tagging_loss=0.006854, over 16286.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08947, pruned_loss=0.01187, audio_tagging_loss=0.008641, over 3048970.08 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:25:37,069 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2023-11-28 21:25:40,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3675853.3333333335, ans=0.025 2023-11-28 21:25:48,263 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.98 vs. limit=22.5 2023-11-28 21:25:54,961 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551400 2023-11-28 21:26:04,014 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.214e+01 9.006e+01 9.446e+01 1.010e+02 1.376e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-28 21:26:12,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3676053.3333333335, ans=0.125 2023-11-28 21:26:32,628 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10350, loss[loss=0.0641, simple_loss=0.08109, pruned_loss=0.01183, audio_tagging_loss=0.01172, over 13772.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08985, pruned_loss=0.0119, audio_tagging_loss=0.008722, over 3045109.58 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:26:36,903 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.99 vs. 
limit=22.5 2023-11-28 21:26:53,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3676253.3333333335, ans=0.125 2023-11-28 21:26:54,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.59 vs. limit=22.5 2023-11-28 21:26:56,369 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551450 2023-11-28 21:27:05,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3676320.0, ans=0.125 2023-11-28 21:27:19,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3676386.6666666665, ans=0.0 2023-11-28 21:27:33,567 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10400, loss[loss=0.07851, simple_loss=0.1071, pruned_loss=0.0166, audio_tagging_loss=0.008365, over 14384.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08915, pruned_loss=0.01172, audio_tagging_loss=0.008881, over 3042351.27 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:27:45,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3676586.6666666665, ans=0.0 2023-11-28 21:27:58,427 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551500 2023-11-28 21:28:07,101 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 8.953e+01 9.462e+01 1.003e+02 1.279e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-28 21:28:35,112 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10450, loss[loss=0.07297, simple_loss=0.1038, pruned_loss=0.01484, audio_tagging_loss=0.006234, over 15541.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08901, pruned_loss=0.01178, audio_tagging_loss=0.008815, over 3042702.70 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:28:46,748 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:28:52,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3676920.0, ans=0.125 2023-11-28 21:29:00,194 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551550 2023-11-28 21:29:15,730 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.32 vs. limit=22.5 2023-11-28 21:29:27,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3677120.0, ans=0.125 2023-11-28 21:29:35,200 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:29:37,891 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10500, loss[loss=0.07181, simple_loss=0.1033, pruned_loss=0.01288, audio_tagging_loss=0.007273, over 14439.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08908, pruned_loss=0.01193, audio_tagging_loss=0.008689, over 3042657.84 frames. 
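
Note on the grad_scale field: it moves between 8.0, 16.0 and 32.0 over this section, which is ordinary dynamic loss scaling for fp16 training; the scale is halved when a step overflows and grown again after a stretch of clean steps. A generic torch.cuda.amp sketch of that mechanism (the training script may drive its scaler differently, and the constructor arguments below are illustrative):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0,
                                       growth_factor=2.0,    # e.g. 16 -> 32
                                       backoff_factor=0.5,   # e.g. 16 -> 8
                                       growth_interval=2000)
    # Typical step:
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)    # skipped internally on inf/nan gradients
    #   scaler.update()           # halves on overflow, doubles after
    #                             # growth_interval clean steps
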
], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:29:38,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3677186.6666666665, ans=0.1 2023-11-28 21:29:41,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3677186.6666666665, ans=0.0 2023-11-28 21:29:44,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3677186.6666666665, ans=0.125 2023-11-28 21:30:02,186 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551600 2023-11-28 21:30:11,091 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.780e+01 9.536e+01 1.016e+02 1.256e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 21:30:32,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.39 vs. limit=15.0 2023-11-28 21:30:34,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3677453.3333333335, ans=0.07 2023-11-28 21:30:39,056 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10550, loss[loss=0.04773, simple_loss=0.06459, pruned_loss=0.006777, audio_tagging_loss=0.008662, over 14863.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08803, pruned_loss=0.0118, audio_tagging_loss=0.008684, over 3038167.42 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:30:46,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3677520.0, ans=0.0 2023-11-28 21:31:04,420 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551650 2023-11-28 21:31:06,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3677653.3333333335, ans=0.0 2023-11-28 21:31:09,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3677653.3333333335, ans=0.125 2023-11-28 21:31:09,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3677653.3333333335, ans=0.125 2023-11-28 21:31:40,753 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10600, loss[loss=0.06155, simple_loss=0.07922, pruned_loss=0.01134, audio_tagging_loss=0.0106, over 16376.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08871, pruned_loss=0.01189, audio_tagging_loss=0.008588, over 3038475.73 frames. ], batch size: 64, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:31:57,429 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.47 vs. limit=22.5 2023-11-28 21:32:00,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3677920.0, ans=0.125 2023-11-28 21:32:01,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3677920.0, ans=0.0 2023-11-28 21:32:03,181 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.31 vs. 
limit=12.0 2023-11-28 21:32:05,984 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551700 2023-11-28 21:32:14,112 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.474e+01 9.049e+01 9.693e+01 1.043e+02 1.339e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-28 21:32:23,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3678053.3333333335, ans=0.0 2023-11-28 21:32:27,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3678053.3333333335, ans=0.0 2023-11-28 21:32:29,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3678120.0, ans=0.125 2023-11-28 21:32:32,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3678120.0, ans=0.125 2023-11-28 21:32:36,397 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.28 vs. limit=10.0 2023-11-28 21:32:43,483 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10650, loss[loss=0.04168, simple_loss=0.04797, pruned_loss=0.006571, audio_tagging_loss=0.01113, over 15149.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.0878, pruned_loss=0.01181, audio_tagging_loss=0.008578, over 3036863.06 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:32:44,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3678186.6666666665, ans=0.0 2023-11-28 21:32:51,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3678186.6666666665, ans=0.125 2023-11-28 21:32:54,133 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.99 vs. limit=15.0 2023-11-28 21:33:08,129 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551750 2023-11-28 21:33:10,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3678320.0, ans=0.2 2023-11-28 21:33:34,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.07 vs. limit=10.0 2023-11-28 21:33:43,709 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-11-28 21:33:45,308 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10700, loss[loss=0.05801, simple_loss=0.07475, pruned_loss=0.01034, audio_tagging_loss=0.0103, over 14750.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08844, pruned_loss=0.0117, audio_tagging_loss=0.008535, over 3036455.61 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:33:57,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3678586.6666666665, ans=0.1 2023-11-28 21:33:57,383 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.22 vs. 
limit=15.0 2023-11-28 21:34:10,568 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551800 2023-11-28 21:34:14,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3678653.3333333335, ans=0.125 2023-11-28 21:34:17,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3678653.3333333335, ans=0.125 2023-11-28 21:34:19,738 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.713e+01 9.110e+01 9.632e+01 1.031e+02 2.472e+02, threshold=1.926e+02, percent-clipped=1.0 2023-11-28 21:34:31,427 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0 2023-11-28 21:34:36,573 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.64 vs. limit=10.0 2023-11-28 21:34:46,728 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.18 vs. limit=15.0 2023-11-28 21:34:48,471 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10750, loss[loss=0.09122, simple_loss=0.1227, pruned_loss=0.02264, audio_tagging_loss=0.007217, over 15112.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08891, pruned_loss=0.01195, audio_tagging_loss=0.008483, over 3047527.33 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:35:12,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2023-11-28 21:35:13,722 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551850 2023-11-28 21:35:14,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3678986.6666666665, ans=0.1 2023-11-28 21:35:47,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3679120.0, ans=0.0 2023-11-28 21:35:49,819 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10800, loss[loss=0.07577, simple_loss=0.1068, pruned_loss=0.01433, audio_tagging_loss=0.008062, over 16322.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.08793, pruned_loss=0.01171, audio_tagging_loss=0.008491, over 3056657.76 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:36:09,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.46 vs. 
limit=22.5 2023-11-28 21:36:11,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3679253.3333333335, ans=0.0 2023-11-28 21:36:15,176 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551900 2023-11-28 21:36:24,355 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 8.859e+01 9.432e+01 1.041e+02 1.593e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 21:36:37,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3679386.6666666665, ans=0.0 2023-11-28 21:36:46,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3679453.3333333335, ans=0.125 2023-11-28 21:36:51,875 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10850, loss[loss=0.05536, simple_loss=0.07618, pruned_loss=0.008252, audio_tagging_loss=0.00902, over 15787.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08767, pruned_loss=0.01178, audio_tagging_loss=0.008571, over 3053697.89 frames. ], batch size: 64, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:37:04,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3679586.6666666665, ans=0.04949747468305833 2023-11-28 21:37:16,566 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 551950 2023-11-28 21:37:23,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3679653.3333333335, ans=0.0 2023-11-28 21:37:33,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3679720.0, ans=0.0 2023-11-28 21:37:50,016 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:37:53,414 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10900, loss[loss=0.07027, simple_loss=0.09366, pruned_loss=0.01537, audio_tagging_loss=0.008063, over 14839.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08868, pruned_loss=0.01202, audio_tagging_loss=0.008558, over 3052774.20 frames. 
], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:38:17,997 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552000 2023-11-28 21:38:27,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3679986.6666666665, ans=0.0 2023-11-28 21:38:30,860 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.172e+01 9.027e+01 9.612e+01 1.023e+02 1.534e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-28 21:38:38,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3680053.3333333335, ans=0.0 2023-11-28 21:38:43,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3680053.3333333335, ans=0.125 2023-11-28 21:38:52,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3680120.0, ans=0.125 2023-11-28 21:38:54,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3680120.0, ans=0.1 2023-11-28 21:38:57,992 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 10950, loss[loss=0.07191, simple_loss=0.09239, pruned_loss=0.01651, audio_tagging_loss=0.009202, over 14706.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08874, pruned_loss=0.01194, audio_tagging_loss=0.00864, over 3048041.61 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:39:18,361 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2023-11-28 21:39:19,483 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.38 vs. limit=12.0 2023-11-28 21:39:23,184 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552050 2023-11-28 21:39:25,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3680320.0, ans=0.125 2023-11-28 21:39:28,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3680320.0, ans=0.07 2023-11-28 21:39:30,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3680320.0, ans=0.125 2023-11-28 21:39:31,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3680320.0, ans=0.125 2023-11-28 21:39:39,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3680386.6666666665, ans=0.0 2023-11-28 21:39:44,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3680386.6666666665, ans=0.1 2023-11-28 21:39:59,046 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11000, loss[loss=0.08237, simple_loss=0.1213, pruned_loss=0.0167, audio_tagging_loss=0.004999, over 15959.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08896, pruned_loss=0.01202, audio_tagging_loss=0.008659, over 3050593.87 frames. 
], batch size: 55, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:40:00,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3680520.0, ans=0.015 2023-11-28 21:40:03,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3680520.0, ans=0.125 2023-11-28 21:40:09,699 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:40:16,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3680586.6666666665, ans=0.125 2023-11-28 21:40:19,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3680586.6666666665, ans=0.0 2023-11-28 21:40:23,612 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552100 2023-11-28 21:40:33,347 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.817e+01 9.387e+01 9.947e+01 1.401e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-28 21:40:36,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3680720.0, ans=0.125 2023-11-28 21:40:47,273 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:40:49,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3680786.6666666665, ans=0.125 2023-11-28 21:41:01,329 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11050, loss[loss=0.0853, simple_loss=0.1211, pruned_loss=0.01978, audio_tagging_loss=0.004995, over 15005.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08835, pruned_loss=0.01205, audio_tagging_loss=0.008742, over 3052714.40 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:41:01,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3680853.3333333335, ans=0.125 2023-11-28 21:41:05,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.56 vs. limit=22.5 2023-11-28 21:41:08,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3680853.3333333335, ans=0.0 2023-11-28 21:41:20,898 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. 
limit=15.0 2023-11-28 21:41:21,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3680920.0, ans=0.125 2023-11-28 21:41:25,704 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552150 2023-11-28 21:41:40,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3681053.3333333335, ans=0.125 2023-11-28 21:41:46,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3681053.3333333335, ans=0.125 2023-11-28 21:41:51,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3681120.0, ans=0.125 2023-11-28 21:42:00,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3681120.0, ans=0.125 2023-11-28 21:42:02,716 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11100, loss[loss=0.06792, simple_loss=0.0927, pruned_loss=0.00971, audio_tagging_loss=0.01186, over 15588.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08927, pruned_loss=0.01213, audio_tagging_loss=0.008757, over 3052860.02 frames. ], batch size: 57, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:42:25,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3681320.0, ans=0.125 2023-11-28 21:42:27,623 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552200 2023-11-28 21:42:37,751 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.621e+01 8.965e+01 9.771e+01 1.048e+02 1.332e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-28 21:42:42,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3681386.6666666665, ans=0.125 2023-11-28 21:42:56,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3681453.3333333335, ans=0.2 2023-11-28 21:42:57,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3681453.3333333335, ans=0.125 2023-11-28 21:43:00,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3681453.3333333335, ans=0.125 2023-11-28 21:43:04,731 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11150, loss[loss=0.07601, simple_loss=0.1156, pruned_loss=0.01323, audio_tagging_loss=0.004965, over 15479.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08885, pruned_loss=0.01206, audio_tagging_loss=0.008909, over 3049935.25 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:43:08,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3681520.0, ans=0.2 2023-11-28 21:43:23,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3681586.6666666665, ans=0.2 2023-11-28 21:43:25,366 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.66 vs. 
limit=15.0 2023-11-28 21:43:29,328 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552250 2023-11-28 21:43:34,662 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:43:40,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3681720.0, ans=0.125 2023-11-28 21:43:50,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3681720.0, ans=0.125 2023-11-28 21:43:59,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3681786.6666666665, ans=0.2 2023-11-28 21:44:06,237 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11200, loss[loss=0.06634, simple_loss=0.09485, pruned_loss=0.01103, audio_tagging_loss=0.007886, over 14643.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08897, pruned_loss=0.01204, audio_tagging_loss=0.008976, over 3049740.48 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:44:14,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3681853.3333333335, ans=0.125 2023-11-28 21:44:30,301 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552300 2023-11-28 21:44:41,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.336e+01 8.920e+01 9.511e+01 1.032e+02 1.205e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 21:45:03,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3682120.0, ans=0.125 2023-11-28 21:45:08,003 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11250, loss[loss=0.04464, simple_loss=0.06678, pruned_loss=0.005022, audio_tagging_loss=0.006225, over 15741.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08813, pruned_loss=0.01188, audio_tagging_loss=0.008966, over 3047970.75 frames. ], batch size: 61, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:45:10,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3682186.6666666665, ans=0.0 2023-11-28 21:45:17,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3682186.6666666665, ans=0.2 2023-11-28 21:45:32,179 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552350 2023-11-28 21:45:49,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3682386.6666666665, ans=0.125 2023-11-28 21:45:53,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3682386.6666666665, ans=0.125 2023-11-28 21:45:53,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3682386.6666666665, ans=0.0 2023-11-28 21:46:09,206 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11300, loss[loss=0.05913, simple_loss=0.08826, pruned_loss=0.009151, audio_tagging_loss=0.005849, over 14034.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08822, pruned_loss=0.012, audio_tagging_loss=0.00875, over 3048785.48 frames. 
], batch size: 52, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:46:26,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3682586.6666666665, ans=0.2 2023-11-28 21:46:33,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3682653.3333333335, ans=0.0 2023-11-28 21:46:34,653 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552400 2023-11-28 21:46:39,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3682653.3333333335, ans=0.1 2023-11-28 21:46:43,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3682653.3333333335, ans=0.125 2023-11-28 21:46:46,390 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 8.990e+01 9.658e+01 1.057e+02 1.418e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-28 21:46:51,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3682720.0, ans=0.125 2023-11-28 21:47:04,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3682786.6666666665, ans=0.125 2023-11-28 21:47:12,706 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11350, loss[loss=0.06272, simple_loss=0.08958, pruned_loss=0.01206, audio_tagging_loss=0.005868, over 15623.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08853, pruned_loss=0.01194, audio_tagging_loss=0.00861, over 3045537.45 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:47:18,447 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2023-11-28 21:47:33,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3682920.0, ans=0.125 2023-11-28 21:47:37,308 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552450 2023-11-28 21:47:37,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3682986.6666666665, ans=0.125 2023-11-28 21:48:00,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3683120.0, ans=0.1 2023-11-28 21:48:03,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3683120.0, ans=0.125 2023-11-28 21:48:14,203 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11400, loss[loss=0.05597, simple_loss=0.068, pruned_loss=0.01086, audio_tagging_loss=0.01111, over 14267.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08898, pruned_loss=0.01196, audio_tagging_loss=0.008527, over 3046839.10 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:48:22,249 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0 2023-11-28 21:48:38,565 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552500 2023-11-28 21:48:39,049 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.21 vs. 
limit=12.0 2023-11-28 21:48:43,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3683320.0, ans=0.125 2023-11-28 21:48:47,590 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:48:49,681 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 8.845e+01 9.711e+01 1.056e+02 1.187e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-28 21:48:56,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2023-11-28 21:49:00,299 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.82 vs. limit=22.5 2023-11-28 21:49:07,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3683453.3333333335, ans=0.0 2023-11-28 21:49:09,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3683453.3333333335, ans=0.0 2023-11-28 21:49:11,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3683453.3333333335, ans=0.0 2023-11-28 21:49:16,148 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11450, loss[loss=0.05636, simple_loss=0.07887, pruned_loss=0.009328, audio_tagging_loss=0.007601, over 14697.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08937, pruned_loss=0.0119, audio_tagging_loss=0.008458, over 3049390.41 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:49:40,243 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552550 2023-11-28 21:49:59,623 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.84 vs. limit=22.5 2023-11-28 21:50:11,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3683786.6666666665, ans=0.125 2023-11-28 21:50:16,977 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11500, loss[loss=0.04827, simple_loss=0.06354, pruned_loss=0.007038, audio_tagging_loss=0.009461, over 15497.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08936, pruned_loss=0.01199, audio_tagging_loss=0.008499, over 3048153.15 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:50:22,860 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=15.0 2023-11-28 21:50:40,125 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 21:50:42,242 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552600 2023-11-28 21:50:53,406 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.968e+01 8.799e+01 9.432e+01 1.014e+02 1.264e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-28 21:51:14,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3684120.0, ans=0.0 2023-11-28 21:51:18,545 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11550, loss[loss=0.06154, simple_loss=0.08225, pruned_loss=0.009701, audio_tagging_loss=0.01072, over 15848.00 frames. 
], tot_loss[loss=0.06534, simple_loss=0.08957, pruned_loss=0.01199, audio_tagging_loss=0.008561, over 3046371.11 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 16.0 2023-11-28 21:51:43,983 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552650 2023-11-28 21:51:57,580 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 21:52:20,630 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11600, loss[loss=0.07378, simple_loss=0.09449, pruned_loss=0.01923, audio_tagging_loss=0.007304, over 14891.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08944, pruned_loss=0.01208, audio_tagging_loss=0.008568, over 3040361.34 frames. ], batch size: 55, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:52:32,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3684586.6666666665, ans=0.125 2023-11-28 21:52:45,066 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552700 2023-11-28 21:52:48,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3684653.3333333335, ans=0.125 2023-11-28 21:52:50,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3684653.3333333335, ans=0.0 2023-11-28 21:52:55,510 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 9.024e+01 9.602e+01 1.030e+02 1.712e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 21:53:07,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.11 vs. limit=22.5 2023-11-28 21:53:15,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3684786.6666666665, ans=0.1 2023-11-28 21:53:20,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3684853.3333333335, ans=0.025 2023-11-28 21:53:21,384 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11650, loss[loss=0.09062, simple_loss=0.1245, pruned_loss=0.02167, audio_tagging_loss=0.006702, over 14965.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09017, pruned_loss=0.01226, audio_tagging_loss=0.008519, over 3037747.77 frames. ], batch size: 54, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:53:37,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3684920.0, ans=0.2 2023-11-28 21:53:46,696 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552750 2023-11-28 21:53:50,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.88 vs. 
limit=15.0 2023-11-28 21:54:01,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3685053.3333333335, ans=0.025 2023-11-28 21:54:02,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3685053.3333333335, ans=0.1 2023-11-28 21:54:03,135 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.47 vs. limit=22.5 2023-11-28 21:54:22,842 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11700, loss[loss=0.07805, simple_loss=0.1102, pruned_loss=0.01424, audio_tagging_loss=0.008684, over 14346.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08928, pruned_loss=0.01222, audio_tagging_loss=0.008602, over 3034355.42 frames. ], batch size: 53, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:54:27,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3685186.6666666665, ans=0.125 2023-11-28 21:54:27,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3685186.6666666665, ans=0.0 2023-11-28 21:54:48,023 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552800 2023-11-28 21:54:55,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3685320.0, ans=0.2 2023-11-28 21:54:59,267 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.206e+01 9.735e+01 1.055e+02 1.331e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-28 21:55:01,294 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=22.5 2023-11-28 21:55:06,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3685386.6666666665, ans=0.125 2023-11-28 21:55:07,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3685386.6666666665, ans=0.1 2023-11-28 21:55:14,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3685453.3333333335, ans=0.05 2023-11-28 21:55:19,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3685453.3333333335, ans=0.0 2023-11-28 21:55:24,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3685520.0, ans=0.0 2023-11-28 21:55:24,991 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11750, loss[loss=0.0496, simple_loss=0.06673, pruned_loss=0.006936, audio_tagging_loss=0.009295, over 15276.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.0888, pruned_loss=0.01204, audio_tagging_loss=0.008634, over 3035547.64 frames. ], batch size: 56, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:55:31,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3685520.0, ans=0.0 2023-11-28 21:55:33,145 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.08 vs. 
limit=15.0 2023-11-28 21:55:36,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3685586.6666666665, ans=0.125 2023-11-28 21:55:40,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3685586.6666666665, ans=0.2 2023-11-28 21:55:49,492 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552850 2023-11-28 21:55:49,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3685653.3333333335, ans=0.125 2023-11-28 21:56:00,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3685720.0, ans=0.125 2023-11-28 21:56:05,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3685720.0, ans=0.0 2023-11-28 21:56:09,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3685720.0, ans=0.125 2023-11-28 21:56:22,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3685786.6666666665, ans=0.2 2023-11-28 21:56:26,106 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11800, loss[loss=0.06278, simple_loss=0.08394, pruned_loss=0.009953, audio_tagging_loss=0.01086, over 15049.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08868, pruned_loss=0.01202, audio_tagging_loss=0.008682, over 3036166.22 frames. ], batch size: 59, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:56:39,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3685920.0, ans=0.0 2023-11-28 21:56:50,962 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552900 2023-11-28 21:56:51,391 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.86 vs. limit=10.0 2023-11-28 21:56:54,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3685986.6666666665, ans=0.1 2023-11-28 21:56:56,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3685986.6666666665, ans=0.0 2023-11-28 21:57:00,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3685986.6666666665, ans=10.0 2023-11-28 21:57:02,194 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 8.813e+01 9.510e+01 1.037e+02 1.447e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-28 21:57:04,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3686053.3333333335, ans=0.1 2023-11-28 21:57:10,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3686053.3333333335, ans=0.0 2023-11-28 21:57:17,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3686120.0, ans=0.125 2023-11-28 21:57:28,207 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11850, loss[loss=0.05633, simple_loss=0.07863, pruned_loss=0.006961, audio_tagging_loss=0.01006, over 13616.00 frames. 
], tot_loss[loss=0.06513, simple_loss=0.08886, pruned_loss=0.01203, audio_tagging_loss=0.008674, over 3036605.29 frames. ], batch size: 52, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:57:53,536 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 552950 2023-11-28 21:57:56,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3686320.0, ans=0.0 2023-11-28 21:58:29,181 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11900, loss[loss=0.06238, simple_loss=0.08678, pruned_loss=0.008922, audio_tagging_loss=0.01007, over 16157.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08898, pruned_loss=0.01201, audio_tagging_loss=0.008747, over 3037880.79 frames. ], batch size: 58, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:58:42,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3686586.6666666665, ans=0.125 2023-11-28 21:58:54,557 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553000 2023-11-28 21:59:00,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3686653.3333333335, ans=0.125 2023-11-28 21:59:05,468 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.697e+01 9.440e+01 1.029e+02 1.196e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-28 21:59:13,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3686720.0, ans=0.125 2023-11-28 21:59:16,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3686720.0, ans=0.1 2023-11-28 21:59:30,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3686786.6666666665, ans=0.125 2023-11-28 21:59:32,209 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 11950, loss[loss=0.04915, simple_loss=0.06507, pruned_loss=0.005729, audio_tagging_loss=0.01089, over 15947.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08889, pruned_loss=0.01192, audio_tagging_loss=0.008872, over 3048129.73 frames. ], batch size: 60, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 21:59:56,178 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553050 2023-11-28 22:00:05,277 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=15.0 2023-11-28 22:00:28,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3687120.0, ans=0.125 2023-11-28 22:00:31,711 INFO [train_asr.py:1235] (2/4) Epoch 46, batch 12000, loss[loss=0.06719, simple_loss=0.0916, pruned_loss=0.0116, audio_tagging_loss=0.009791, over 16554.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.089, pruned_loss=0.01197, audio_tagging_loss=0.008979, over 3049502.68 frames. ], batch size: 63, lr: 1.46e-03, grad_scale: 32.0 2023-11-28 22:00:31,712 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 22:01:11,999 INFO [train_asr.py:1267] (2/4) Epoch 46, validation: loss=0.05835, simple_loss=0.05054, pruned_loss=0.005304, audio_tagging_loss=0.02778, over 4681554.00 frames. 
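---- [editor's note: annotation added to this log excerpt; not program output] ----
The loss fields in the train/validation records are internally consistent with a fixed weighting of the components: loss ≈ 0.5 · simple_loss + pruned_loss + audio_tagging_loss. This matches, for example, the Epoch 46 validation record directly above (0.5 · 0.05054 + 0.005304 + 0.02778 ≈ 0.05835) as well as the batch 12000 tot_loss. A minimal Python check, with the 0.5/1.0/1.0 weights inferred from the logged numbers themselves rather than taken from the training code:

    # Editor's sketch: verify the assumed loss weighting against two logged records.
    records = [
        # (loss, simple_loss, pruned_loss, audio_tagging_loss)
        (0.05835, 0.05054, 0.005304, 0.02778),   # Epoch 46 validation, above
        (0.06545, 0.08900, 0.011970, 0.008979),  # Epoch 46, batch 12000 tot_loss
    ]
    for loss, simple, pruned, audio_tagging in records:
        recombined = 0.5 * simple + pruned + audio_tagging
        assert abs(recombined - loss) < 1e-4, (loss, recombined)

---- [end editor's note] ----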
2023-11-28 22:01:12,000 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 22:01:23,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3687253.3333333335, ans=0.125 2023-11-28 22:01:34,487 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553100 2023-11-28 22:01:35,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3687320.0, ans=0.125 2023-11-28 22:01:56,250 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 0, loss[loss=0.06685, simple_loss=0.07879, pruned_loss=0.006758, audio_tagging_loss=0.0207, over 14429.00 frames. ], tot_loss[loss=0.06685, simple_loss=0.07879, pruned_loss=0.006758, audio_tagging_loss=0.0207, over 14429.00 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:01:56,251 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 22:02:32,342 INFO [train_asr.py:1267] (2/4) Epoch 47, validation: loss=0.05784, simple_loss=0.05051, pruned_loss=0.005299, audio_tagging_loss=0.02728, over 4681554.00 frames. 2023-11-28 22:02:32,342 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 22:02:39,328 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 9.135e+01 9.831e+01 1.074e+02 1.367e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-28 22:02:40,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3687340.0, ans=0.1 2023-11-28 22:03:21,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3687606.6666666665, ans=0.0 2023-11-28 22:03:29,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3687606.6666666665, ans=0.125 2023-11-28 22:03:30,823 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553150 2023-11-28 22:03:34,260 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 50, loss[loss=0.08769, simple_loss=0.1094, pruned_loss=0.01405, audio_tagging_loss=0.01897, over 15110.00 frames. ], tot_loss[loss=0.07334, simple_loss=0.08931, pruned_loss=0.01164, audio_tagging_loss=0.01704, over 684853.72 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:03:40,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3687673.3333333335, ans=0.0 2023-11-28 22:03:41,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3687673.3333333335, ans=0.125 2023-11-28 22:03:47,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3687740.0, ans=0.125 2023-11-28 22:03:54,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3687740.0, ans=0.125 2023-11-28 22:03:54,342 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.54 vs. limit=15.0 2023-11-28 22:04:07,274 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.52 vs. 
limit=22.5 2023-11-28 22:04:33,313 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553200 2023-11-28 22:04:37,278 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 100, loss[loss=0.07659, simple_loss=0.09971, pruned_loss=0.01224, audio_tagging_loss=0.0145, over 16792.00 frames. ], tot_loss[loss=0.07272, simple_loss=0.08908, pruned_loss=0.0119, audio_tagging_loss=0.01628, over 1209713.71 frames. ], batch size: 62, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:04:44,873 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.649e+01 9.823e+01 1.051e+02 1.142e+02 1.295e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-28 22:04:55,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3688073.3333333335, ans=0.125 2023-11-28 22:04:55,968 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.16 vs. limit=15.0 2023-11-28 22:05:04,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3688140.0, ans=0.125 2023-11-28 22:05:08,480 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.22 vs. limit=15.0 2023-11-28 22:05:20,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3688206.6666666665, ans=0.125 2023-11-28 22:05:21,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3688206.6666666665, ans=0.125 2023-11-28 22:05:24,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3688206.6666666665, ans=0.2 2023-11-28 22:05:32,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3688273.3333333335, ans=0.04949747468305833 2023-11-28 22:05:36,203 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553250 2023-11-28 22:05:40,249 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 150, loss[loss=0.07522, simple_loss=0.09773, pruned_loss=0.01497, audio_tagging_loss=0.01138, over 15641.00 frames. ], tot_loss[loss=0.06989, simple_loss=0.08723, pruned_loss=0.01164, audio_tagging_loss=0.01464, over 1618020.20 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:05:49,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3688340.0, ans=0.025 2023-11-28 22:05:49,837 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.19 vs. limit=6.0 2023-11-28 22:05:50,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3688340.0, ans=15.0 2023-11-28 22:05:54,662 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. 
limit=15.0 2023-11-28 22:05:59,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3688406.6666666665, ans=0.2 2023-11-28 22:06:00,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3688406.6666666665, ans=0.125 2023-11-28 22:06:04,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3688473.3333333335, ans=0.125 2023-11-28 22:06:39,489 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553300 2023-11-28 22:06:39,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3688606.6666666665, ans=0.0 2023-11-28 22:06:42,855 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 200, loss[loss=0.07754, simple_loss=0.1025, pruned_loss=0.01584, audio_tagging_loss=0.01045, over 14690.00 frames. ], tot_loss[loss=0.06862, simple_loss=0.08808, pruned_loss=0.01172, audio_tagging_loss=0.01286, over 1925197.61 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:06:47,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3688673.3333333335, ans=0.125 2023-11-28 22:06:49,334 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2023-11-28 22:06:51,840 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.129e+01 9.056e+01 9.738e+01 1.064e+02 1.248e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-28 22:06:53,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3688673.3333333335, ans=0.0 2023-11-28 22:07:10,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688806.6666666665, ans=0.1 2023-11-28 22:07:15,816 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=15.0 2023-11-28 22:07:29,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3688873.3333333335, ans=0.125 2023-11-28 22:07:41,053 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553350 2023-11-28 22:07:44,524 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 250, loss[loss=0.08254, simple_loss=0.1191, pruned_loss=0.01562, audio_tagging_loss=0.007368, over 15605.00 frames. ], tot_loss[loss=0.06716, simple_loss=0.08796, pruned_loss=0.01163, audio_tagging_loss=0.01156, over 2170122.72 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:07:50,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3689006.6666666665, ans=0.1 2023-11-28 22:07:53,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3689006.6666666665, ans=0.0 2023-11-28 22:08:09,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.38 vs. 
limit=22.5 2023-11-28 22:08:10,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3689140.0, ans=0.125 2023-11-28 22:08:20,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3689206.6666666665, ans=0.125 2023-11-28 22:08:37,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3689273.3333333335, ans=0.125 2023-11-28 22:08:42,565 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553400 2023-11-28 22:08:46,414 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 300, loss[loss=0.05683, simple_loss=0.07513, pruned_loss=0.009207, audio_tagging_loss=0.01006, over 13528.00 frames. ], tot_loss[loss=0.06677, simple_loss=0.08853, pruned_loss=0.01179, audio_tagging_loss=0.01071, over 2367228.78 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:08:55,130 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.602e+01 9.275e+01 9.937e+01 1.062e+02 1.967e+02, threshold=1.987e+02, percent-clipped=1.0 2023-11-28 22:09:14,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3689473.3333333335, ans=0.0 2023-11-28 22:09:41,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3689606.6666666665, ans=0.0 2023-11-28 22:09:44,086 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553450 2023-11-28 22:09:48,029 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 350, loss[loss=0.05492, simple_loss=0.0778, pruned_loss=0.007576, audio_tagging_loss=0.008441, over 14200.00 frames. ], tot_loss[loss=0.06624, simple_loss=0.08872, pruned_loss=0.01174, audio_tagging_loss=0.01014, over 2517375.26 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:09:48,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3689673.3333333335, ans=0.1 2023-11-28 22:10:04,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3689740.0, ans=0.125 2023-11-28 22:10:16,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=12.0 2023-11-28 22:10:23,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3689873.3333333335, ans=0.125 2023-11-28 22:10:24,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3689873.3333333335, ans=0.125 2023-11-28 22:10:29,138 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:10:43,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3689940.0, ans=0.125 2023-11-28 22:10:44,906 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553500 2023-11-28 22:10:46,724 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.28 vs. 
limit=12.0 2023-11-28 22:10:48,610 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 400, loss[loss=0.05521, simple_loss=0.0694, pruned_loss=0.01126, audio_tagging_loss=0.009243, over 14066.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08837, pruned_loss=0.01184, audio_tagging_loss=0.009827, over 2636929.77 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:10:56,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3690006.6666666665, ans=0.125 2023-11-28 22:10:56,882 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 9.027e+01 9.535e+01 1.022e+02 1.341e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 22:11:12,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3690140.0, ans=0.1 2023-11-28 22:11:12,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3690140.0, ans=0.0 2023-11-28 22:11:26,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3690206.6666666665, ans=0.125 2023-11-28 22:11:31,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3690206.6666666665, ans=0.05 2023-11-28 22:11:45,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3690273.3333333335, ans=0.125 2023-11-28 22:11:47,878 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553550 2023-11-28 22:11:51,255 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 450, loss[loss=0.04473, simple_loss=0.06598, pruned_loss=0.003791, audio_tagging_loss=0.00795, over 16806.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08785, pruned_loss=0.01178, audio_tagging_loss=0.009511, over 2736175.24 frames. ], batch size: 64, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:11:53,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3690340.0, ans=0.07 2023-11-28 22:11:54,125 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.74 vs. limit=10.0 2023-11-28 22:12:02,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3690406.6666666665, ans=0.1 2023-11-28 22:12:17,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3690473.3333333335, ans=0.1 2023-11-28 22:12:23,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3690473.3333333335, ans=0.0 2023-11-28 22:12:45,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3690606.6666666665, ans=0.125 2023-11-28 22:12:45,955 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. 
limit=15.0 2023-11-28 22:12:47,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3690606.6666666665, ans=0.1 2023-11-28 22:12:48,985 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553600 2023-11-28 22:12:52,904 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 500, loss[loss=0.05606, simple_loss=0.07531, pruned_loss=0.0115, audio_tagging_loss=0.006898, over 14263.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08797, pruned_loss=0.01193, audio_tagging_loss=0.009327, over 2801683.19 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:12:53,490 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.62 vs. limit=22.5 2023-11-28 22:13:01,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3690673.3333333335, ans=0.2 2023-11-28 22:13:01,822 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 8.926e+01 9.624e+01 1.054e+02 1.218e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 22:13:11,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3690740.0, ans=0.125 2023-11-28 22:13:11,993 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:13:33,619 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.22 vs. limit=10.0 2023-11-28 22:13:38,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3690873.3333333335, ans=0.125 2023-11-28 22:13:48,249 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.11 vs. limit=15.0 2023-11-28 22:13:50,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3690940.0, ans=0.125 2023-11-28 22:13:51,627 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553650 2023-11-28 22:13:55,647 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 550, loss[loss=0.06275, simple_loss=0.08731, pruned_loss=0.01098, audio_tagging_loss=0.008113, over 15311.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.088, pruned_loss=0.01182, audio_tagging_loss=0.009176, over 2856680.33 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:14:07,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3691073.3333333335, ans=0.125 2023-11-28 22:14:34,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3691206.6666666665, ans=0.2 2023-11-28 22:14:45,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3691273.3333333335, ans=0.0 2023-11-28 22:14:47,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3691273.3333333335, ans=0.0 2023-11-28 22:14:53,452 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553700 2023-11-28 22:14:57,476 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 600, loss[loss=0.04058, simple_loss=0.05503, pruned_loss=0.003742, audio_tagging_loss=0.009325, over 14562.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08886, pruned_loss=0.0119, audio_tagging_loss=0.009088, over 2896631.95 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:15:06,285 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.960e+01 9.634e+01 1.013e+02 1.210e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 22:15:18,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3691406.6666666665, ans=0.125 2023-11-28 22:15:42,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3691540.0, ans=0.125 2023-11-28 22:15:55,420 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553750 2023-11-28 22:15:58,950 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 650, loss[loss=0.07183, simple_loss=0.1009, pruned_loss=0.01317, audio_tagging_loss=0.008205, over 14291.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08859, pruned_loss=0.01193, audio_tagging_loss=0.008988, over 2932594.91 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:16:01,867 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.04 vs. limit=15.0 2023-11-28 22:16:03,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3691673.3333333335, ans=15.0 2023-11-28 22:16:51,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3691940.0, ans=0.2 2023-11-28 22:16:52,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3691940.0, ans=0.125 2023-11-28 22:16:52,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3691940.0, ans=0.125 2023-11-28 22:16:56,067 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553800 2023-11-28 22:17:00,540 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 700, loss[loss=0.04017, simple_loss=0.05574, pruned_loss=0.003393, audio_tagging_loss=0.008913, over 16233.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08817, pruned_loss=0.01183, audio_tagging_loss=0.008912, over 2961756.14 frames. 
], batch size: 62, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:17:03,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3692006.6666666665, ans=0.0 2023-11-28 22:17:09,308 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.938e+01 9.507e+01 1.029e+02 1.273e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-28 22:17:10,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3692006.6666666665, ans=0.1 2023-11-28 22:17:22,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3692073.3333333335, ans=0.2 2023-11-28 22:17:58,253 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553850 2023-11-28 22:18:02,280 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 750, loss[loss=0.09134, simple_loss=0.124, pruned_loss=0.01765, audio_tagging_loss=0.01168, over 15725.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08914, pruned_loss=0.01199, audio_tagging_loss=0.008825, over 2979754.24 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:18:04,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3692340.0, ans=0.2 2023-11-28 22:18:22,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3692406.6666666665, ans=0.1 2023-11-28 22:18:27,120 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.75 vs. limit=22.5 2023-11-28 22:18:59,103 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.16 vs. limit=15.0 2023-11-28 22:19:00,673 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553900 2023-11-28 22:19:04,213 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 800, loss[loss=0.057, simple_loss=0.07916, pruned_loss=0.009175, audio_tagging_loss=0.008245, over 14248.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08884, pruned_loss=0.0119, audio_tagging_loss=0.008865, over 2984794.43 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:19:12,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.995e+01 9.559e+01 1.026e+02 1.353e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 22:19:41,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3692873.3333333335, ans=0.125 2023-11-28 22:19:47,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.56 vs. limit=15.0 2023-11-28 22:20:02,043 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 553950 2023-11-28 22:20:05,580 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 850, loss[loss=0.06418, simple_loss=0.08821, pruned_loss=0.01091, audio_tagging_loss=0.009165, over 15795.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08913, pruned_loss=0.01205, audio_tagging_loss=0.008844, over 2999096.76 frames. 
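The scaling.py:213 ScheduledFloat lines record the current value (ans=...) of hyperparameters that are scheduled as functions of batch_count rather than held fixed: dropout probabilities (ans=0.1), balancer probabilities (ans=0.125), bypass scale floors (ans=0.2), and the various skip rates (ans=0.0) all follow such schedules, and by batch_count ~ 3.69e6 they have settled at their end-of-schedule values. A minimal sketch of a piecewise-linear schedule of this kind, with made-up breakpoints for illustration:

    class ScheduledFloatSketch:
        """Piecewise-linear float schedule over batch count.
        Illustrative stand-in for the ScheduledFloat values logged by
        scaling.py; the breakpoints below are invented for the example."""

        def __init__(self, *points: tuple):
            # points: (batch_count, value) pairs
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    frac = (batch_count - x0) / (x1 - x0)
                    return y0 + frac * (y1 - y0)
            raise AssertionError("unreachable")

    # e.g. a skip-rate decaying from 0.1 to 0.0 over the first 20k batches,
    # then flat -- consistent with the many late-training `ans=0.0`
    # attention_skip_rate / conv_skip_rate records in this log:
    skip_rate = ScheduledFloatSketch((0.0, 0.1), (20000.0, 0.0))
    assert skip_rate.value(3690940.0) == 0.0
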
], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:20:08,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3693006.6666666665, ans=0.2 2023-11-28 22:20:24,407 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=22.5 2023-11-28 22:20:32,658 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0 2023-11-28 22:20:37,002 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=22.5 2023-11-28 22:20:43,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3693206.6666666665, ans=0.125 2023-11-28 22:21:02,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3693273.3333333335, ans=0.125 2023-11-28 22:21:03,744 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554000 2023-11-28 22:21:03,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3693273.3333333335, ans=0.125 2023-11-28 22:21:07,964 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 900, loss[loss=0.0466, simple_loss=0.05081, pruned_loss=0.007836, audio_tagging_loss=0.01336, over 14851.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08941, pruned_loss=0.01207, audio_tagging_loss=0.008905, over 3010066.57 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:21:16,631 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.970e+01 9.672e+01 1.016e+02 1.262e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 22:21:18,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3693340.0, ans=0.125 2023-11-28 22:21:18,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3693340.0, ans=0.125 2023-11-28 22:21:21,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3693406.6666666665, ans=0.125 2023-11-28 22:21:23,509 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2023-11-28 22:21:49,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3693540.0, ans=0.1 2023-11-28 22:22:06,211 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554050 2023-11-28 22:22:10,226 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 950, loss[loss=0.05948, simple_loss=0.07847, pruned_loss=0.0099, audio_tagging_loss=0.01035, over 15368.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.0892, pruned_loss=0.01204, audio_tagging_loss=0.008823, over 3018478.16 frames. 
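The scaling.py:1022 Whitening lines compare a measured statistic against a fixed limit (metric=16.62 vs. limit=22.5, and so on); in the records here the metric stays under its limit, the usual situation this late in training, so the associated penalty is effectively inactive. The metric behaves like a measure of how anisotropic the channel covariance of the activations is: near 1 for well-whitened features, large when variance concentrates in a few directions. The sketch below is one plausible proxy for such a statistic, not necessarily the exact formula in scaling.py:

    import torch

    def whitening_metric_sketch(x: torch.Tensor, num_groups: int = 1) -> float:
        """Anisotropy of the channel covariance of x, shape (frames, channels).
        Equals 1.0 when the covariance is a multiple of the identity and
        grows as the eigenvalue spectrum becomes uneven. A stand-in for the
        `metric=` field in the Whitening lines; the real definition may
        differ."""
        frames, channels = x.shape
        assert channels % num_groups == 0
        metrics = []
        for g in x.chunk(num_groups, dim=1):
            g = g - g.mean(dim=0, keepdim=True)
            cov = (g.t() @ g) / frames          # (c, c) covariance
            eigs = torch.linalg.eigvalsh(cov)   # real eigenvalues, ascending
            # mean of squared eigenvalues over squared mean eigenvalue:
            metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
        return max(metrics)

    # Roughly whitened features score near 1; degenerate covariances score high.
    x = torch.randn(1000, 512)
    assert whitening_metric_sketch(x) < 2.0
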
], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:22:32,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3693740.0, ans=0.125 2023-11-28 22:22:40,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.94 vs. limit=22.5 2023-11-28 22:22:42,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3693806.6666666665, ans=0.2 2023-11-28 22:23:07,957 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554100 2023-11-28 22:23:10,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3694006.6666666665, ans=0.0 2023-11-28 22:23:11,514 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1000, loss[loss=0.06745, simple_loss=0.09108, pruned_loss=0.0134, audio_tagging_loss=0.008502, over 15747.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08977, pruned_loss=0.01218, audio_tagging_loss=0.008692, over 3029778.27 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:23:20,402 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 9.063e+01 9.775e+01 1.049e+02 2.458e+02, threshold=1.955e+02, percent-clipped=1.0 2023-11-28 22:23:26,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3694073.3333333335, ans=0.2 2023-11-28 22:23:39,313 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 22:23:51,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=3694206.6666666665, ans=0.2 2023-11-28 22:23:55,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3694206.6666666665, ans=0.05 2023-11-28 22:24:09,815 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554150 2023-11-28 22:24:13,273 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1050, loss[loss=0.07384, simple_loss=0.09418, pruned_loss=0.01399, audio_tagging_loss=0.01276, over 15502.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08932, pruned_loss=0.0122, audio_tagging_loss=0.008636, over 3035186.74 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:24:13,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.37 vs. limit=6.0 2023-11-28 22:24:30,271 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0 2023-11-28 22:24:31,930 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.60 vs. 
limit=12.0 2023-11-28 22:24:44,241 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:24:44,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3694473.3333333335, ans=0.125 2023-11-28 22:25:02,935 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.37 vs. limit=15.0 2023-11-28 22:25:11,914 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554200 2023-11-28 22:25:13,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3694606.6666666665, ans=0.2 2023-11-28 22:25:15,625 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1100, loss[loss=0.06181, simple_loss=0.08727, pruned_loss=0.009557, audio_tagging_loss=0.008617, over 15206.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08824, pruned_loss=0.01208, audio_tagging_loss=0.008637, over 3039974.51 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:25:19,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.11 vs. limit=15.0 2023-11-28 22:25:19,623 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 22:25:24,307 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.004e+01 9.578e+01 1.033e+02 1.285e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 22:25:38,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3694806.6666666665, ans=0.125 2023-11-28 22:25:43,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3694806.6666666665, ans=0.1 2023-11-28 22:25:44,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3694806.6666666665, ans=0.0 2023-11-28 22:25:59,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3694873.3333333335, ans=0.2 2023-11-28 22:25:59,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3694873.3333333335, ans=0.2 2023-11-28 22:26:07,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3694940.0, ans=0.0 2023-11-28 22:26:12,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3694940.0, ans=0.0 2023-11-28 22:26:13,888 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554250 2023-11-28 22:26:17,340 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1150, loss[loss=0.05968, simple_loss=0.07926, pruned_loss=0.008668, audio_tagging_loss=0.01138, over 14147.00 frames. 
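The recurring WARNING lines about unbalanced/*.wav cuts are the AudioSet half of the muxed stream being filtered out of the transducer loss: those clips carry only the dummy placeholder transcript, and after subsampling a 100-frame (one-second) input leaves 23 encoder frames, fewer than the 24 tokens of the placeholder text, so no monotonic alignment exists and the cut is excluded. A sketch of that validity check, using the two counts the warning already prints (function name is illustrative; the real code computes the post-subsampling frame count itself, with small edge effects from the convolutional frontend):

    def keep_for_transducer(frames_after_subsampling: int,
                            num_tokens: int) -> bool:
        """Transducer-style losses need at least as many encoder frames as
        output tokens (T >= U); cuts violating this are dropped, as in the
        train_asr.py:1481 warnings above."""
        return frames_after_subsampling >= num_tokens

    # The excluded cut above: 23 frames after subsampling vs. 24 tokens.
    assert not keep_for_transducer(23, 24)
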
], tot_loss[loss=0.06488, simple_loss=0.08833, pruned_loss=0.0121, audio_tagging_loss=0.008623, over 3035329.33 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:26:38,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3695073.3333333335, ans=0.125 2023-11-28 22:26:55,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3695206.6666666665, ans=0.2 2023-11-28 22:26:56,797 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=12.0 2023-11-28 22:27:08,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3695273.3333333335, ans=0.125 2023-11-28 22:27:13,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3695273.3333333335, ans=0.125 2023-11-28 22:27:15,907 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554300 2023-11-28 22:27:17,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3695273.3333333335, ans=0.125 2023-11-28 22:27:19,257 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1200, loss[loss=0.08134, simple_loss=0.1187, pruned_loss=0.01616, audio_tagging_loss=0.005813, over 15925.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.0886, pruned_loss=0.01217, audio_tagging_loss=0.008618, over 3035126.82 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:27:27,985 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.745e+01 9.451e+01 1.036e+02 1.471e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-28 22:27:43,479 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.82 vs. limit=10.0 2023-11-28 22:27:52,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3695473.3333333335, ans=0.125 2023-11-28 22:28:16,946 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554350 2023-11-28 22:28:20,964 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1250, loss[loss=0.06999, simple_loss=0.1062, pruned_loss=0.01048, audio_tagging_loss=0.006426, over 15332.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08842, pruned_loss=0.0121, audio_tagging_loss=0.008575, over 3031483.05 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:28:25,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3695673.3333333335, ans=0.125 2023-11-28 22:28:35,026 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.73 vs. limit=15.0 2023-11-28 22:28:36,418 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.74 vs. 
limit=15.0 2023-11-28 22:28:43,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3695740.0, ans=0.125 2023-11-28 22:28:50,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3695806.6666666665, ans=0.125 2023-11-28 22:29:00,989 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.88 vs. limit=10.0 2023-11-28 22:29:08,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3695873.3333333335, ans=0.1 2023-11-28 22:29:18,962 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554400 2023-11-28 22:29:22,771 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1300, loss[loss=0.064, simple_loss=0.08423, pruned_loss=0.01265, audio_tagging_loss=0.009235, over 15124.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08873, pruned_loss=0.01194, audio_tagging_loss=0.008555, over 3033087.21 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:29:28,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3696006.6666666665, ans=0.125 2023-11-28 22:29:30,737 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 9.038e+01 9.627e+01 1.019e+02 1.676e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-28 22:30:10,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3696206.6666666665, ans=0.125 2023-11-28 22:30:21,007 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554450 2023-11-28 22:30:24,561 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1350, loss[loss=0.0628, simple_loss=0.07786, pruned_loss=0.01561, audio_tagging_loss=0.008262, over 15250.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08902, pruned_loss=0.01193, audio_tagging_loss=0.008549, over 3037967.82 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:30:36,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3696406.6666666665, ans=0.125 2023-11-28 22:30:48,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3696473.3333333335, ans=0.125 2023-11-28 22:31:10,311 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 22:31:11,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3696540.0, ans=0.125 2023-11-28 22:31:19,260 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.66 vs. 
limit=15.0 2023-11-28 22:31:21,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3696606.6666666665, ans=0.125 2023-11-28 22:31:22,800 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554500 2023-11-28 22:31:26,138 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1400, loss[loss=0.05618, simple_loss=0.0793, pruned_loss=0.007192, audio_tagging_loss=0.009344, over 15706.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08897, pruned_loss=0.01186, audio_tagging_loss=0.008544, over 3040184.23 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:31:30,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3696673.3333333335, ans=0.0 2023-11-28 22:31:34,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3696673.3333333335, ans=0.0 2023-11-28 22:31:35,455 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.987e+01 9.786e+01 1.046e+02 1.300e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-28 22:31:40,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3696740.0, ans=0.0 2023-11-28 22:31:48,908 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0 2023-11-28 22:32:24,798 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554550 2023-11-28 22:32:28,201 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1450, loss[loss=0.05543, simple_loss=0.07528, pruned_loss=0.005874, audio_tagging_loss=0.01192, over 14429.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08945, pruned_loss=0.01205, audio_tagging_loss=0.008677, over 3029817.02 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 22:32:30,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3697006.6666666665, ans=0.0 2023-11-28 22:33:01,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3697140.0, ans=0.0 2023-11-28 22:33:22,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0 2023-11-28 22:33:25,685 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554600 2023-11-28 22:33:29,760 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1500, loss[loss=0.05962, simple_loss=0.08196, pruned_loss=0.00994, audio_tagging_loss=0.008694, over 15372.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08989, pruned_loss=0.01207, audio_tagging_loss=0.008694, over 3042218.31 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 22:33:39,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3697340.0, ans=0.125 2023-11-28 22:33:40,200 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.171e+01 8.916e+01 9.599e+01 1.025e+02 1.569e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 22:33:42,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3697406.6666666665, ans=0.125 2023-11-28 22:33:42,298 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:33:43,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3697406.6666666665, ans=0.0 2023-11-28 22:33:51,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3697406.6666666665, ans=0.125 2023-11-28 22:33:54,215 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.34 vs. limit=15.0 2023-11-28 22:34:22,944 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2023-11-28 22:34:27,828 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554650 2023-11-28 22:34:31,285 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1550, loss[loss=0.07478, simple_loss=0.1027, pruned_loss=0.01672, audio_tagging_loss=0.006728, over 14625.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.08974, pruned_loss=0.0121, audio_tagging_loss=0.008778, over 3044054.03 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 22:34:38,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3697673.3333333335, ans=0.0 2023-11-28 22:34:40,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=12.0 2023-11-28 22:34:44,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3697740.0, ans=10.0 2023-11-28 22:34:47,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3697740.0, ans=0.125 2023-11-28 22:34:56,795 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.31 vs. limit=22.5 2023-11-28 22:35:15,986 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.26 vs. limit=22.5 2023-11-28 22:35:29,311 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554700 2023-11-28 22:35:32,745 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1600, loss[loss=0.06617, simple_loss=0.09584, pruned_loss=0.01154, audio_tagging_loss=0.006709, over 14865.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08921, pruned_loss=0.01192, audio_tagging_loss=0.008925, over 3039744.75 frames. 
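Note the grad_scale column across this stretch: it sat at 32.0 through batch 1400, dropped to 8.0 by batch 1450, recovered to 16.0 at batch 1600, and is back at 32.0 by batch 2000 below (then 16.0 again at batch 2050). This is ordinary fp16 dynamic loss scaling: the scale is halved whenever non-finite gradients are detected (32 -> 8 implies two overflows in that 50-batch window) and doubled back after a stable stretch. The sketch below assumes PyTorch's torch.cuda.amp.GradScaler with its default constants; the actual growth interval in this run is evidently a few hundred batches, so the real constants differ:

    import torch

    # Minimal sketch of the dynamic loss scaling visible in the grad_scale
    # column (32.0 -> 8.0 -> 16.0 -> 32.0 ...). Constants below are the
    # GradScaler defaults, not values read from this log.
    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,       # matches the steady-state value logged here
        backoff_factor=0.5,    # halve on overflow: 32 -> 16 -> 8
        growth_factor=2.0,     # double after a stable stretch: 8 -> 16 -> 32
        growth_interval=2000,  # default; this run grows faster than this
    )

    # Typical training step with such a scaler:
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()   # adjusts the scale; overflowed steps are skipped
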
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:35:44,025 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 8.984e+01 9.580e+01 1.035e+02 1.494e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-28 22:35:44,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3698073.3333333335, ans=6.0 2023-11-28 22:35:54,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3698073.3333333335, ans=0.2 2023-11-28 22:35:54,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3698073.3333333335, ans=0.125 2023-11-28 22:35:59,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3698140.0, ans=0.1 2023-11-28 22:35:59,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=3698140.0, ans=0.1 2023-11-28 22:36:31,677 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554750 2023-11-28 22:36:35,098 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1650, loss[loss=0.07261, simple_loss=0.09897, pruned_loss=0.01148, audio_tagging_loss=0.01165, over 14621.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08905, pruned_loss=0.01187, audio_tagging_loss=0.008931, over 3050872.21 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:36:37,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3698340.0, ans=0.125 2023-11-28 22:36:46,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3698406.6666666665, ans=0.1 2023-11-28 22:36:46,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3698406.6666666665, ans=0.1 2023-11-28 22:36:50,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3698406.6666666665, ans=0.1 2023-11-28 22:36:57,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3698406.6666666665, ans=0.125 2023-11-28 22:37:02,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3698473.3333333335, ans=0.2 2023-11-28 22:37:02,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3698473.3333333335, ans=0.0 2023-11-28 22:37:07,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3698473.3333333335, ans=0.125 2023-11-28 22:37:26,571 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2023-11-28 22:37:33,137 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554800 2023-11-28 22:37:37,082 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1700, loss[loss=0.05295, simple_loss=0.07209, pruned_loss=0.007176, audio_tagging_loss=0.009733, over 15102.00 frames. 
], tot_loss[loss=0.06572, simple_loss=0.08959, pruned_loss=0.01194, audio_tagging_loss=0.008988, over 3047246.49 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:37:40,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3698673.3333333335, ans=0.125 2023-11-28 22:37:47,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.852e+01 9.479e+01 1.004e+02 1.252e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-28 22:37:49,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3698740.0, ans=0.125 2023-11-28 22:37:55,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3698740.0, ans=0.09899494936611666 2023-11-28 22:38:34,907 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554850 2023-11-28 22:38:37,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3699006.6666666665, ans=0.0 2023-11-28 22:38:38,836 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1750, loss[loss=0.07293, simple_loss=0.1104, pruned_loss=0.01083, audio_tagging_loss=0.006905, over 15728.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08976, pruned_loss=0.01191, audio_tagging_loss=0.008848, over 3046481.98 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:38:54,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3699073.3333333335, ans=0.0 2023-11-28 22:39:09,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3699140.0, ans=0.125 2023-11-28 22:39:22,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3699206.6666666665, ans=0.0 2023-11-28 22:39:32,925 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:39:36,318 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554900 2023-11-28 22:39:38,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=12.0 2023-11-28 22:39:40,436 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1800, loss[loss=0.046, simple_loss=0.06169, pruned_loss=0.006444, audio_tagging_loss=0.008713, over 14625.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08941, pruned_loss=0.01176, audio_tagging_loss=0.008717, over 3049631.09 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:39:48,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3699340.0, ans=0.1 2023-11-28 22:39:51,493 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 9.095e+01 9.843e+01 1.068e+02 2.957e+02, threshold=1.969e+02, percent-clipped=2.0 2023-11-28 22:39:52,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3699406.6666666665, ans=0.1 2023-11-28 22:40:01,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3699406.6666666665, ans=0.125 2023-11-28 22:40:34,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3699606.6666666665, ans=0.2 2023-11-28 22:40:38,701 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 554950 2023-11-28 22:40:42,249 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1850, loss[loss=0.06397, simple_loss=0.08545, pruned_loss=0.0124, audio_tagging_loss=0.008839, over 14822.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.09021, pruned_loss=0.01192, audio_tagging_loss=0.008607, over 3047507.77 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:40:45,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3699673.3333333335, ans=0.0 2023-11-28 22:41:03,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3699740.0, ans=0.2 2023-11-28 22:41:13,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3699806.6666666665, ans=0.0 2023-11-28 22:41:21,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3699873.3333333335, ans=0.125 2023-11-28 22:41:21,374 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:41:28,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3699873.3333333335, ans=0.5 2023-11-28 22:41:40,471 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555000 2023-11-28 22:41:44,228 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1900, loss[loss=0.06309, simple_loss=0.07945, pruned_loss=0.01279, audio_tagging_loss=0.01058, over 15789.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08995, pruned_loss=0.01191, audio_tagging_loss=0.008587, over 3051803.14 frames. 
], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:41:46,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3700006.6666666665, ans=0.125 2023-11-28 22:41:54,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3700006.6666666665, ans=0.0 2023-11-28 22:41:55,303 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.917e+01 9.676e+01 1.038e+02 1.630e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 22:42:42,007 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555050 2023-11-28 22:42:42,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3700273.3333333335, ans=0.1 2023-11-28 22:42:45,394 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 1950, loss[loss=0.03747, simple_loss=0.04377, pruned_loss=0.002377, audio_tagging_loss=0.01321, over 16408.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08884, pruned_loss=0.01179, audio_tagging_loss=0.008703, over 3053593.86 frames. ], batch size: 65, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:42:50,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3700340.0, ans=0.125 2023-11-28 22:42:54,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3700340.0, ans=0.035 2023-11-28 22:43:01,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3700406.6666666665, ans=0.0 2023-11-28 22:43:03,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3700406.6666666665, ans=0.125 2023-11-28 22:43:10,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3700473.3333333335, ans=0.1 2023-11-28 22:43:20,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3700473.3333333335, ans=0.125 2023-11-28 22:43:43,586 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555100 2023-11-28 22:43:46,964 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2000, loss[loss=0.06287, simple_loss=0.08665, pruned_loss=0.01055, audio_tagging_loss=0.008999, over 14813.00 frames. ], tot_loss[loss=0.06425, simple_loss=0.08786, pruned_loss=0.01161, audio_tagging_loss=0.008716, over 3047842.17 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:43:58,087 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.916e+01 9.601e+01 1.024e+02 1.438e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 22:43:59,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.07 vs. limit=12.0 2023-11-28 22:44:45,214 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555150 2023-11-28 22:44:48,091 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=12.0 2023-11-28 22:44:48,547 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2050, loss[loss=0.05445, simple_loss=0.06798, pruned_loss=0.009798, audio_tagging_loss=0.01066, over 14711.00 frames. 
], tot_loss[loss=0.06471, simple_loss=0.08842, pruned_loss=0.01181, audio_tagging_loss=0.008688, over 3047866.91 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:44:49,126 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0 2023-11-28 22:44:56,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3701006.6666666665, ans=0.125 2023-11-28 22:45:14,803 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=15.0 2023-11-28 22:45:46,320 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555200 2023-11-28 22:45:50,149 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2100, loss[loss=0.06489, simple_loss=0.08699, pruned_loss=0.009428, audio_tagging_loss=0.01197, over 15491.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08863, pruned_loss=0.01189, audio_tagging_loss=0.008658, over 3039621.90 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:45:56,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3701340.0, ans=0.125 2023-11-28 22:45:57,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3701340.0, ans=0.0 2023-11-28 22:46:02,552 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.958e+01 9.568e+01 1.025e+02 1.229e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-28 22:46:05,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3701406.6666666665, ans=0.125 2023-11-28 22:46:23,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3701473.3333333335, ans=0.0 2023-11-28 22:46:47,983 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555250 2023-11-28 22:46:52,234 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2150, loss[loss=0.06479, simple_loss=0.09119, pruned_loss=0.01131, audio_tagging_loss=0.00788, over 15855.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.0882, pruned_loss=0.01182, audio_tagging_loss=0.008629, over 3047295.00 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:46:53,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3701673.3333333335, ans=0.125 2023-11-28 22:47:04,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3701740.0, ans=0.2 2023-11-28 22:47:30,093 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 22:47:50,679 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555300 2023-11-28 22:47:54,167 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2200, loss[loss=0.05254, simple_loss=0.07496, pruned_loss=0.007068, audio_tagging_loss=0.007993, over 15511.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08843, pruned_loss=0.01184, audio_tagging_loss=0.008556, over 3047197.77 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:48:06,446 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.601e+01 9.046e+01 9.585e+01 1.059e+02 1.446e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-28 22:48:13,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3702073.3333333335, ans=0.1 2023-11-28 22:48:18,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3702140.0, ans=0.125 2023-11-28 22:48:19,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3702140.0, ans=0.125 2023-11-28 22:48:23,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3702140.0, ans=0.0 2023-11-28 22:48:26,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3702140.0, ans=0.125 2023-11-28 22:48:40,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3702206.6666666665, ans=0.0 2023-11-28 22:48:40,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3702206.6666666665, ans=0.2 2023-11-28 22:48:47,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3702273.3333333335, ans=0.0 2023-11-28 22:48:47,431 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.58 vs. limit=22.5 2023-11-28 22:48:52,158 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555350 2023-11-28 22:48:53,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3702273.3333333335, ans=0.125 2023-11-28 22:48:55,611 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2250, loss[loss=0.06671, simple_loss=0.08985, pruned_loss=0.0134, audio_tagging_loss=0.008386, over 15696.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08915, pruned_loss=0.0119, audio_tagging_loss=0.008581, over 3048662.26 frames. 
], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:48:58,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3702340.0, ans=0.125 2023-11-28 22:49:12,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3702406.6666666665, ans=0.1 2023-11-28 22:49:42,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3702540.0, ans=0.0 2023-11-28 22:49:42,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3702540.0, ans=0.2 2023-11-28 22:49:48,274 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 22:49:49,830 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.01 vs. limit=15.0 2023-11-28 22:49:52,816 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555400 2023-11-28 22:49:53,380 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.80 vs. limit=15.0 2023-11-28 22:49:56,848 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2300, loss[loss=0.07869, simple_loss=0.1204, pruned_loss=0.01404, audio_tagging_loss=0.004432, over 15708.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.0897, pruned_loss=0.01211, audio_tagging_loss=0.008523, over 3049142.66 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:49:58,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3702673.3333333335, ans=0.0 2023-11-28 22:50:09,262 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.895e+01 9.474e+01 1.045e+02 1.271e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-28 22:50:12,362 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.50 vs. limit=10.0 2023-11-28 22:50:16,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3702740.0, ans=0.1 2023-11-28 22:50:24,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3702806.6666666665, ans=0.0 2023-11-28 22:50:33,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3702873.3333333335, ans=0.0 2023-11-28 22:50:33,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3702873.3333333335, ans=0.125 2023-11-28 22:50:38,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3702873.3333333335, ans=0.07 2023-11-28 22:50:51,508 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 22:50:55,219 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555450 2023-11-28 22:50:58,632 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2350, loss[loss=0.07497, simple_loss=0.1063, pruned_loss=0.01085, audio_tagging_loss=0.01099, over 14444.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08904, pruned_loss=0.01202, audio_tagging_loss=0.008663, over 3038956.75 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:51:01,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3703006.6666666665, ans=0.125 2023-11-28 22:51:11,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3703073.3333333335, ans=0.0 2023-11-28 22:51:21,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3703140.0, ans=0.0 2023-11-28 22:51:28,319 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.19 vs. limit=6.0 2023-11-28 22:51:36,798 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.94 vs. limit=10.0 2023-11-28 22:51:47,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3703273.3333333335, ans=0.1 2023-11-28 22:51:56,421 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555500 2023-11-28 22:51:59,840 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2400, loss[loss=0.05686, simple_loss=0.07687, pruned_loss=0.008841, audio_tagging_loss=0.009583, over 15899.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08853, pruned_loss=0.01192, audio_tagging_loss=0.008769, over 3042227.85 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:52:11,681 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.887e+01 9.633e+01 1.018e+02 1.587e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-28 22:52:37,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3703540.0, ans=0.125 2023-11-28 22:52:38,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3703540.0, ans=0.2 2023-11-28 22:52:57,409 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555550 2023-11-28 22:52:57,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3703606.6666666665, ans=0.0 2023-11-28 22:53:01,555 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2450, loss[loss=0.06261, simple_loss=0.0841, pruned_loss=0.008125, audio_tagging_loss=0.01243, over 14983.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08768, pruned_loss=0.01169, audio_tagging_loss=0.008847, over 3042958.62 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:53:17,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3703740.0, ans=0.125 2023-11-28 22:53:20,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3703740.0, ans=0.125 2023-11-28 22:53:20,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3703740.0, ans=0.2 2023-11-28 22:53:39,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3703873.3333333335, ans=10.0 2023-11-28 22:53:47,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3703873.3333333335, ans=0.125 2023-11-28 22:53:50,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3703940.0, ans=0.0 2023-11-28 22:53:56,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3703940.0, ans=0.0 2023-11-28 22:53:59,816 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555600 2023-11-28 22:54:01,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3703940.0, ans=0.0 2023-11-28 22:54:04,186 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2500, loss[loss=0.06, simple_loss=0.07612, pruned_loss=0.009995, audio_tagging_loss=0.01194, over 16488.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08813, pruned_loss=0.0119, audio_tagging_loss=0.00886, over 3037836.35 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 22:54:17,251 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 9.036e+01 9.436e+01 1.021e+02 1.491e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-28 22:54:23,616 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.61 vs. 
limit=22.5 2023-11-28 22:54:24,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3704073.3333333335, ans=0.125 2023-11-28 22:54:36,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3704140.0, ans=0.125 2023-11-28 22:54:39,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3704140.0, ans=0.125 2023-11-28 22:54:42,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3704206.6666666665, ans=0.125 2023-11-28 22:55:03,018 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555650 2023-11-28 22:55:03,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3704273.3333333335, ans=0.125 2023-11-28 22:55:04,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3704273.3333333335, ans=0.0 2023-11-28 22:55:05,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3704340.0, ans=0.125 2023-11-28 22:55:05,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3704340.0, ans=0.125 2023-11-28 22:55:06,521 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2550, loss[loss=0.05841, simple_loss=0.08108, pruned_loss=0.009803, audio_tagging_loss=0.008063, over 14671.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08864, pruned_loss=0.01205, audio_tagging_loss=0.00879, over 3041726.76 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:55:06,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3704340.0, ans=0.0 2023-11-28 22:55:06,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3704340.0, ans=0.1 2023-11-28 22:55:24,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.58 vs. limit=12.0 2023-11-28 22:56:00,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3704606.6666666665, ans=0.025 2023-11-28 22:56:04,089 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555700 2023-11-28 22:56:07,436 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2600, loss[loss=0.065, simple_loss=0.09124, pruned_loss=0.01075, audio_tagging_loss=0.008633, over 15475.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08807, pruned_loss=0.01189, audio_tagging_loss=0.008757, over 3043716.41 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:56:15,696 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.76 vs. 
limit=12.0 2023-11-28 22:56:20,788 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.191e+01 8.832e+01 9.497e+01 1.024e+02 1.176e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 22:56:26,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3704740.0, ans=0.2 2023-11-28 22:56:37,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3704806.6666666665, ans=0.125 2023-11-28 22:56:48,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3704873.3333333335, ans=0.0 2023-11-28 22:56:54,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3704873.3333333335, ans=0.0 2023-11-28 22:57:05,586 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555750 2023-11-28 22:57:09,005 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2650, loss[loss=0.06268, simple_loss=0.08339, pruned_loss=0.008549, audio_tagging_loss=0.01244, over 15037.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08793, pruned_loss=0.01187, audio_tagging_loss=0.008675, over 3043443.15 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:57:34,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3705140.0, ans=0.0 2023-11-28 22:57:35,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3705140.0, ans=0.07 2023-11-28 22:57:42,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3705140.0, ans=0.0 2023-11-28 22:57:46,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3705206.6666666665, ans=0.1 2023-11-28 22:58:07,090 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555800 2023-11-28 22:58:11,036 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2700, loss[loss=0.08968, simple_loss=0.13, pruned_loss=0.01834, audio_tagging_loss=0.00637, over 16851.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08848, pruned_loss=0.01188, audio_tagging_loss=0.008619, over 3043710.14 frames. 
], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:58:21,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3705340.0, ans=0.1 2023-11-28 22:58:24,440 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.962e+01 9.013e+01 9.562e+01 1.012e+02 1.188e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-28 22:58:43,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3705473.3333333335, ans=0.125 2023-11-28 22:58:47,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3705540.0, ans=0.125 2023-11-28 22:59:09,238 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555850 2023-11-28 22:59:09,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3705606.6666666665, ans=0.125 2023-11-28 22:59:12,647 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2750, loss[loss=0.06244, simple_loss=0.08647, pruned_loss=0.01039, audio_tagging_loss=0.008819, over 15749.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08904, pruned_loss=0.01203, audio_tagging_loss=0.008536, over 3047779.35 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 22:59:14,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3705673.3333333335, ans=0.1 2023-11-28 22:59:23,125 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.98 vs. limit=15.0 2023-11-28 22:59:35,480 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0 2023-11-28 22:59:52,938 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:00:03,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3705940.0, ans=0.0 2023-11-28 23:00:07,603 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:00:10,035 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555900 2023-11-28 23:00:13,454 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2800, loss[loss=0.06839, simple_loss=0.09289, pruned_loss=0.01241, audio_tagging_loss=0.009535, over 14930.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08921, pruned_loss=0.01211, audio_tagging_loss=0.008536, over 3043355.96 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:00:19,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.40 vs. 
limit=15.0 2023-11-28 23:00:22,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3706006.6666666665, ans=0.125 2023-11-28 23:00:27,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 8.973e+01 9.470e+01 1.013e+02 1.282e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-28 23:00:35,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3706073.3333333335, ans=0.125 2023-11-28 23:00:47,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3706140.0, ans=0.05 2023-11-28 23:00:50,663 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2023-11-28 23:00:57,169 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0 2023-11-28 23:01:07,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3706273.3333333335, ans=0.125 2023-11-28 23:01:08,319 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0 2023-11-28 23:01:12,302 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 555950 2023-11-28 23:01:13,061 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.50 vs. limit=12.0 2023-11-28 23:01:16,213 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2850, loss[loss=0.06286, simple_loss=0.08469, pruned_loss=0.01192, audio_tagging_loss=0.008598, over 15502.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.09, pruned_loss=0.01215, audio_tagging_loss=0.008389, over 3037287.22 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:01:29,549 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0 2023-11-28 23:02:09,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3706606.6666666665, ans=0.0 2023-11-28 23:02:14,287 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556000 2023-11-28 23:02:21,335 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2900, loss[loss=0.05442, simple_loss=0.07967, pruned_loss=0.006532, audio_tagging_loss=0.008053, over 15103.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08913, pruned_loss=0.0121, audio_tagging_loss=0.008455, over 3036242.14 frames. 
], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:02:34,746 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.253e+01 8.955e+01 9.573e+01 1.059e+02 1.416e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-28 23:02:36,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3706740.0, ans=0.09899494936611666 2023-11-28 23:02:40,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3706740.0, ans=0.0 2023-11-28 23:02:47,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3706806.6666666665, ans=0.1 2023-11-28 23:03:11,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3706940.0, ans=0.125 2023-11-28 23:03:14,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3706940.0, ans=0.125 2023-11-28 23:03:19,307 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556050 2023-11-28 23:03:19,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3706940.0, ans=0.125 2023-11-28 23:03:20,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3706940.0, ans=0.125 2023-11-28 23:03:22,899 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 2950, loss[loss=0.05025, simple_loss=0.06319, pruned_loss=0.007686, audio_tagging_loss=0.01097, over 15500.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08927, pruned_loss=0.01207, audio_tagging_loss=0.008503, over 3036798.62 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:03:49,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3707140.0, ans=0.125 2023-11-28 23:04:13,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3707273.3333333335, ans=0.1 2023-11-28 23:04:14,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3707273.3333333335, ans=0.07 2023-11-28 23:04:21,447 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556100 2023-11-28 23:04:24,910 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3000, loss[loss=0.06625, simple_loss=0.0934, pruned_loss=0.009439, audio_tagging_loss=0.01011, over 15817.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08925, pruned_loss=0.01196, audio_tagging_loss=0.008527, over 3038713.36 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:04:24,911 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-28 23:04:42,446 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.7825, 2.0534, 3.3813, 3.4519, 3.2429, 3.3498, 3.1931, 3.4191], device='cuda:2') 2023-11-28 23:05:04,354 INFO [train_asr.py:1267] (2/4) Epoch 47, validation: loss=0.05749, simple_loss=0.05049, pruned_loss=0.005328, audio_tagging_loss=0.02692, over 4681554.00 frames. 
2023-11-28 23:05:04,355 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-28 23:05:12,468 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.60 vs. limit=10.0 2023-11-28 23:05:15,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3707406.6666666665, ans=0.0 2023-11-28 23:05:18,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3707406.6666666665, ans=0.2 2023-11-28 23:05:20,042 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.840e+01 9.232e+01 9.628e+01 1.042e+02 1.260e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-28 23:05:45,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=15.0 2023-11-28 23:06:02,511 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556150 2023-11-28 23:06:05,931 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3050, loss[loss=0.0458, simple_loss=0.05939, pruned_loss=0.005019, audio_tagging_loss=0.01109, over 14866.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08976, pruned_loss=0.01193, audio_tagging_loss=0.00853, over 3033224.36 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:06:09,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3707673.3333333335, ans=0.2 2023-11-28 23:06:16,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3707673.3333333335, ans=0.125 2023-11-28 23:06:32,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3707806.6666666665, ans=0.125 2023-11-28 23:06:36,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3707806.6666666665, ans=0.125 2023-11-28 23:06:44,906 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:07:01,413 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.32 vs. limit=15.0 2023-11-28 23:07:04,303 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556200 2023-11-28 23:07:08,268 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3100, loss[loss=0.06085, simple_loss=0.08206, pruned_loss=0.01157, audio_tagging_loss=0.008252, over 13853.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.09003, pruned_loss=0.01208, audio_tagging_loss=0.008499, over 3034637.80 frames. 
], batch size: 54, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:07:08,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3708006.6666666665, ans=0.0 2023-11-28 23:07:08,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3708006.6666666665, ans=0.125 2023-11-28 23:07:13,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3708006.6666666665, ans=0.1 2023-11-28 23:07:23,337 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 9.064e+01 9.672e+01 1.048e+02 1.274e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-28 23:07:28,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3708073.3333333335, ans=0.125 2023-11-28 23:07:32,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0 2023-11-28 23:08:05,109 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556250 2023-11-28 23:08:08,490 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3150, loss[loss=0.07188, simple_loss=0.09551, pruned_loss=0.01549, audio_tagging_loss=0.008632, over 15101.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08992, pruned_loss=0.01216, audio_tagging_loss=0.008539, over 3035181.26 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:08:22,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3708406.6666666665, ans=0.125 2023-11-28 23:08:28,031 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.56 vs. limit=15.0 2023-11-28 23:08:44,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3708473.3333333335, ans=0.04949747468305833 2023-11-28 23:09:00,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3708606.6666666665, ans=0.05 2023-11-28 23:09:07,556 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556300 2023-11-28 23:09:10,927 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3200, loss[loss=0.08635, simple_loss=0.1141, pruned_loss=0.01761, audio_tagging_loss=0.01167, over 14642.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.08961, pruned_loss=0.01219, audio_tagging_loss=0.008704, over 3037220.43 frames. 
], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:09:12,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3708673.3333333335, ans=0.1 2023-11-28 23:09:22,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3708740.0, ans=0.125 2023-11-28 23:09:26,538 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.731e+01 9.590e+01 1.027e+02 1.409e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-28 23:09:31,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3708740.0, ans=0.0 2023-11-28 23:09:41,705 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.66 vs. limit=10.0 2023-11-28 23:09:54,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3708873.3333333335, ans=0.1 2023-11-28 23:10:04,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=22.5 2023-11-28 23:10:09,028 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556350 2023-11-28 23:10:09,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3708940.0, ans=0.0 2023-11-28 23:10:12,489 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3250, loss[loss=0.05133, simple_loss=0.06228, pruned_loss=0.009297, audio_tagging_loss=0.01089, over 14211.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08917, pruned_loss=0.01215, audio_tagging_loss=0.008758, over 3030457.96 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:10:20,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3709006.6666666665, ans=0.2 2023-11-28 23:10:25,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.15 vs. limit=10.0 2023-11-28 23:10:27,163 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:10:29,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3709073.3333333335, ans=0.125 2023-11-28 23:10:36,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3709140.0, ans=0.0 2023-11-28 23:10:38,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3709140.0, ans=0.0 2023-11-28 23:10:42,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3709140.0, ans=0.125 2023-11-28 23:10:58,538 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.79 vs. 
limit=22.5 2023-11-28 23:10:59,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3709206.6666666665, ans=0.2 2023-11-28 23:11:09,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3709273.3333333335, ans=0.125 2023-11-28 23:11:09,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3709273.3333333335, ans=0.125 2023-11-28 23:11:10,668 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556400 2023-11-28 23:11:14,529 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3300, loss[loss=0.07357, simple_loss=0.1122, pruned_loss=0.01055, audio_tagging_loss=0.006938, over 15288.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08866, pruned_loss=0.01194, audio_tagging_loss=0.008707, over 3036630.84 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:11:27,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3709406.6666666665, ans=0.125 2023-11-28 23:11:31,273 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.994e+01 9.101e+01 9.601e+01 1.014e+02 1.380e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-28 23:11:31,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3709406.6666666665, ans=0.125 2023-11-28 23:11:41,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3709473.3333333335, ans=0.125 2023-11-28 23:12:12,413 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556450 2023-11-28 23:12:16,460 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3350, loss[loss=0.05605, simple_loss=0.08036, pruned_loss=0.004539, audio_tagging_loss=0.01133, over 15331.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08888, pruned_loss=0.01195, audio_tagging_loss=0.008643, over 3040701.71 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:12:17,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3709673.3333333335, ans=0.125 2023-11-28 23:12:24,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3709673.3333333335, ans=0.0 2023-11-28 23:12:49,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=22.5 2023-11-28 23:12:52,155 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=15.0 2023-11-28 23:12:56,746 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.09 vs. limit=22.5 2023-11-28 23:13:11,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3709940.0, ans=0.125 2023-11-28 23:13:14,716 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556500 2023-11-28 23:13:18,068 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3400, loss[loss=0.06644, simple_loss=0.09131, pruned_loss=0.01052, audio_tagging_loss=0.01026, over 16119.00 frames. 
], tot_loss[loss=0.06572, simple_loss=0.09012, pruned_loss=0.01216, audio_tagging_loss=0.008508, over 3033821.33 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:13:18,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3710006.6666666665, ans=0.125 2023-11-28 23:13:18,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3710006.6666666665, ans=0.1 2023-11-28 23:13:28,373 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:13:33,976 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.795e+01 9.500e+01 1.053e+02 1.456e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-28 23:14:05,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3710206.6666666665, ans=0.0 2023-11-28 23:14:11,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3710273.3333333335, ans=0.125 2023-11-28 23:14:14,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3710273.3333333335, ans=0.0 2023-11-28 23:14:16,352 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556550 2023-11-28 23:14:16,583 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:14:19,888 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3450, loss[loss=0.05756, simple_loss=0.08681, pruned_loss=0.009039, audio_tagging_loss=0.005115, over 14485.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08991, pruned_loss=0.01209, audio_tagging_loss=0.008431, over 3039912.19 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:14:20,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3710340.0, ans=0.0 2023-11-28 23:14:28,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3710340.0, ans=0.125 2023-11-28 23:14:46,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3710473.3333333335, ans=0.125 2023-11-28 23:14:57,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3710540.0, ans=0.2 2023-11-28 23:15:01,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3710540.0, ans=0.0 2023-11-28 23:15:17,587 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556600 2023-11-28 23:15:21,985 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3500, loss[loss=0.06295, simple_loss=0.0891, pruned_loss=0.01008, audio_tagging_loss=0.008323, over 14262.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08996, pruned_loss=0.0121, audio_tagging_loss=0.008383, over 3033213.79 frames. 
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:15:38,394 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.007e+01 9.535e+01 1.020e+02 1.277e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-28 23:15:53,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3710806.6666666665, ans=0.0 2023-11-28 23:15:54,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3710806.6666666665, ans=0.0 2023-11-28 23:15:56,582 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:15:59,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.25 vs. limit=12.0 2023-11-28 23:16:20,348 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556650 2023-11-28 23:16:24,397 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3550, loss[loss=0.07672, simple_loss=0.1087, pruned_loss=0.01454, audio_tagging_loss=0.007851, over 16636.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08936, pruned_loss=0.01196, audio_tagging_loss=0.008386, over 3038642.82 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:16:44,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3711073.3333333335, ans=0.0 2023-11-28 23:17:00,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3711206.6666666665, ans=0.125 2023-11-28 23:17:01,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3711206.6666666665, ans=0.125 2023-11-28 23:17:13,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3711273.3333333335, ans=0.125 2023-11-28 23:17:23,226 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556700 2023-11-28 23:17:26,711 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3600, loss[loss=0.05783, simple_loss=0.08697, pruned_loss=0.007151, audio_tagging_loss=0.007189, over 15411.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08972, pruned_loss=0.01192, audio_tagging_loss=0.008441, over 3041710.85 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:17:30,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3711340.0, ans=0.5 2023-11-28 23:17:42,439 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.234e+01 8.750e+01 9.399e+01 1.010e+02 1.318e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-28 23:17:48,997 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.50 vs. 
limit=10.0 2023-11-28 23:17:49,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3711473.3333333335, ans=0.125 2023-11-28 23:17:49,980 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.70 vs. limit=22.5 2023-11-28 23:17:53,025 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:17:53,347 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=22.5 2023-11-28 23:18:17,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3711606.6666666665, ans=0.2 2023-11-28 23:18:18,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3711606.6666666665, ans=0.125 2023-11-28 23:18:23,417 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556750 2023-11-28 23:18:27,652 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3650, loss[loss=0.07153, simple_loss=0.09249, pruned_loss=0.01516, audio_tagging_loss=0.01013, over 14818.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08878, pruned_loss=0.01181, audio_tagging_loss=0.008528, over 3038346.52 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:18:31,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3711673.3333333335, ans=0.0 2023-11-28 23:18:44,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=3711740.0, ans=6.0 2023-11-28 23:18:59,494 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.93 vs. limit=15.0 2023-11-28 23:19:25,322 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556800 2023-11-28 23:19:29,237 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-28 23:19:29,792 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3700, loss[loss=0.06261, simple_loss=0.08465, pruned_loss=0.009585, audio_tagging_loss=0.0107, over 15568.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08968, pruned_loss=0.01194, audio_tagging_loss=0.008478, over 3046024.69 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:19:31,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3712006.6666666665, ans=0.0 2023-11-28 23:19:47,494 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.675e+01 8.931e+01 9.622e+01 1.040e+02 1.365e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-28 23:19:47,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3712073.3333333335, ans=0.2 2023-11-28 23:19:52,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3712073.3333333335, ans=0.125 2023-11-28 23:19:58,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3712140.0, ans=0.0 2023-11-28 23:20:13,966 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:20:19,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3712273.3333333335, ans=0.125 2023-11-28 23:20:28,727 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556850 2023-11-28 23:20:32,145 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3750, loss[loss=0.06426, simple_loss=0.0878, pruned_loss=0.01107, audio_tagging_loss=0.009293, over 15518.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08958, pruned_loss=0.01177, audio_tagging_loss=0.008547, over 3045713.41 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:20:39,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3712340.0, ans=0.0 2023-11-28 23:20:45,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3712406.6666666665, ans=0.0 2023-11-28 23:20:47,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3712406.6666666665, ans=0.0 2023-11-28 23:20:49,884 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0 2023-11-28 23:21:00,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3712473.3333333335, ans=0.0 2023-11-28 23:21:06,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3712473.3333333335, ans=0.1 2023-11-28 23:21:16,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0 2023-11-28 23:21:16,897 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-28 23:21:18,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3712540.0, ans=0.2 2023-11-28 23:21:26,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=3712606.6666666665, ans=0.02 2023-11-28 23:21:30,099 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556900 2023-11-28 23:21:32,997 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.01 vs. limit=22.5 2023-11-28 23:21:33,584 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3800, loss[loss=0.05635, simple_loss=0.07524, pruned_loss=0.01044, audio_tagging_loss=0.008293, over 14410.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08997, pruned_loss=0.01199, audio_tagging_loss=0.008639, over 3050754.71 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:21:35,451 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=12.0 2023-11-28 23:21:52,343 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.439e+01 1.001e+02 1.076e+02 2.686e+02, threshold=2.002e+02, percent-clipped=1.0 2023-11-28 23:22:07,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.03 vs. limit=22.5 2023-11-28 23:22:14,022 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0 2023-11-28 23:22:22,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2023-11-28 23:22:31,901 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 556950 2023-11-28 23:22:35,415 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3850, loss[loss=0.06532, simple_loss=0.08766, pruned_loss=0.01128, audio_tagging_loss=0.01021, over 15089.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.0903, pruned_loss=0.01202, audio_tagging_loss=0.008567, over 3059325.30 frames. 
], batch size: 54, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:22:44,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3713006.6666666665, ans=0.125 2023-11-28 23:22:47,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3713073.3333333335, ans=0.125 2023-11-28 23:22:49,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3713073.3333333335, ans=0.2 2023-11-28 23:22:52,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3713073.3333333335, ans=0.125 2023-11-28 23:23:06,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3713140.0, ans=0.0 2023-11-28 23:23:27,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3713273.3333333335, ans=0.125 2023-11-28 23:23:32,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3713273.3333333335, ans=0.0 2023-11-28 23:23:33,676 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557000 2023-11-28 23:23:38,031 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3900, loss[loss=0.0636, simple_loss=0.0777, pruned_loss=0.0129, audio_tagging_loss=0.01184, over 15839.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.0894, pruned_loss=0.01194, audio_tagging_loss=0.008718, over 3051020.68 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:23:45,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3713340.0, ans=0.125 2023-11-28 23:23:55,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.114e+01 8.938e+01 9.522e+01 1.035e+02 1.409e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 23:24:01,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3713473.3333333335, ans=0.125 2023-11-28 23:24:07,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3713473.3333333335, ans=0.2 2023-11-28 23:24:33,083 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-11-28 23:24:34,885 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557050 2023-11-28 23:24:36,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3713606.6666666665, ans=0.025 2023-11-28 23:24:38,297 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 3950, loss[loss=0.07259, simple_loss=0.097, pruned_loss=0.01484, audio_tagging_loss=0.009244, over 15133.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09015, pruned_loss=0.01206, audio_tagging_loss=0.008743, over 3059055.38 frames. 
], batch size: 55, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:24:52,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3713740.0, ans=0.0 2023-11-28 23:24:56,168 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:25:09,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3713806.6666666665, ans=0.0 2023-11-28 23:25:20,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3713873.3333333335, ans=0.07 2023-11-28 23:25:33,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3713940.0, ans=0.125 2023-11-28 23:25:37,811 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557100 2023-11-28 23:25:41,344 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4000, loss[loss=0.06202, simple_loss=0.08333, pruned_loss=0.0105, audio_tagging_loss=0.00986, over 14808.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.09044, pruned_loss=0.01199, audio_tagging_loss=0.008755, over 3053462.39 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:25:48,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3714006.6666666665, ans=0.0 2023-11-28 23:25:50,821 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.88 vs. limit=15.0 2023-11-28 23:25:59,961 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.580e+01 8.920e+01 9.493e+01 1.035e+02 1.641e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-28 23:26:07,024 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 2023-11-28 23:26:14,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3714140.0, ans=0.2 2023-11-28 23:26:38,952 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557150 2023-11-28 23:26:42,993 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4050, loss[loss=0.05597, simple_loss=0.08456, pruned_loss=0.007771, audio_tagging_loss=0.005922, over 15755.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.09046, pruned_loss=0.01202, audio_tagging_loss=0.008728, over 3049590.13 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:26:47,689 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:26:51,103 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.21 vs. 
limit=15.0 2023-11-28 23:26:53,129 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:26:58,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3714406.6666666665, ans=0.0 2023-11-28 23:27:27,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3714540.0, ans=0.1 2023-11-28 23:27:31,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3714606.6666666665, ans=0.125 2023-11-28 23:27:41,467 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557200 2023-11-28 23:27:44,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3714673.3333333335, ans=0.125 2023-11-28 23:27:45,252 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4100, loss[loss=0.09964, simple_loss=0.1382, pruned_loss=0.02387, audio_tagging_loss=0.006668, over 14800.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09049, pruned_loss=0.012, audio_tagging_loss=0.008768, over 3053422.59 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:27:50,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3714673.3333333335, ans=0.2 2023-11-28 23:28:03,219 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.823e+01 9.138e+01 9.541e+01 1.028e+02 1.498e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-28 23:28:21,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3714873.3333333335, ans=0.125 2023-11-28 23:28:27,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3714873.3333333335, ans=0.0 2023-11-28 23:28:28,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3714873.3333333335, ans=0.2 2023-11-28 23:28:32,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3714873.3333333335, ans=0.0 2023-11-28 23:28:43,449 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557250 2023-11-28 23:28:46,822 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4150, loss[loss=0.06262, simple_loss=0.07703, pruned_loss=0.01479, audio_tagging_loss=0.009318, over 14083.00 frames. ], tot_loss[loss=0.06611, simple_loss=0.09059, pruned_loss=0.01211, audio_tagging_loss=0.008704, over 3045768.50 frames. 
], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:28:56,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3715006.6666666665, ans=0.0 2023-11-28 23:29:10,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3715140.0, ans=0.1 2023-11-28 23:29:11,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3715140.0, ans=0.125 2023-11-28 23:29:16,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3715140.0, ans=0.125 2023-11-28 23:29:21,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3715140.0, ans=0.1 2023-11-28 23:29:32,949 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-28 23:29:33,540 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:29:33,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3715206.6666666665, ans=0.1 2023-11-28 23:29:44,724 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557300 2023-11-28 23:29:48,183 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4200, loss[loss=0.06051, simple_loss=0.09046, pruned_loss=0.008952, audio_tagging_loss=0.006324, over 15382.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08925, pruned_loss=0.01185, audio_tagging_loss=0.008633, over 3053577.43 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:29:51,705 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2023-11-28 23:29:57,689 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.22 vs. 
limit=10.0 2023-11-28 23:29:58,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3715340.0, ans=0.125 2023-11-28 23:30:06,719 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.859e+01 9.416e+01 1.036e+02 1.524e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-28 23:30:20,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3715473.3333333335, ans=0.0 2023-11-28 23:30:20,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3715473.3333333335, ans=0.2 2023-11-28 23:30:34,356 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:30:43,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3715606.6666666665, ans=0.0 2023-11-28 23:30:44,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3715606.6666666665, ans=0.125 2023-11-28 23:30:46,472 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557350 2023-11-28 23:30:49,967 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4250, loss[loss=0.06185, simple_loss=0.08658, pruned_loss=0.009586, audio_tagging_loss=0.008978, over 14892.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08946, pruned_loss=0.01186, audio_tagging_loss=0.008559, over 3054623.69 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:30:59,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3715673.3333333335, ans=0.125 2023-11-28 23:31:01,738 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.24 vs. limit=15.0 2023-11-28 23:31:06,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3715740.0, ans=0.125 2023-11-28 23:31:15,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3715806.6666666665, ans=15.0 2023-11-28 23:31:19,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3715806.6666666665, ans=0.2 2023-11-28 23:31:19,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3715806.6666666665, ans=0.2 2023-11-28 23:31:31,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3715873.3333333335, ans=0.125 2023-11-28 23:31:43,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3715940.0, ans=0.125 2023-11-28 23:31:47,687 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557400 2023-11-28 23:31:51,608 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4300, loss[loss=0.05377, simple_loss=0.07064, pruned_loss=0.009177, audio_tagging_loss=0.009272, over 14666.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08942, pruned_loss=0.01195, audio_tagging_loss=0.008532, over 3051308.34 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:31:54,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3716006.6666666665, ans=0.0 2023-11-28 23:31:59,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3716006.6666666665, ans=0.125 2023-11-28 23:32:07,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3716073.3333333335, ans=0.2 2023-11-28 23:32:07,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3716073.3333333335, ans=0.2 2023-11-28 23:32:09,691 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.291e+01 9.110e+01 9.607e+01 1.023e+02 1.243e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-28 23:32:14,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3716073.3333333335, ans=0.1 2023-11-28 23:32:48,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3716273.3333333335, ans=0.2 2023-11-28 23:32:49,427 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557450 2023-11-28 23:32:53,491 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4350, loss[loss=0.07313, simple_loss=0.09457, pruned_loss=0.01523, audio_tagging_loss=0.01061, over 15237.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08987, pruned_loss=0.01196, audio_tagging_loss=0.008437, over 3050084.41 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:33:13,383 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.00 vs. limit=15.0 2023-11-28 23:33:34,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3716540.0, ans=0.2 2023-11-28 23:33:52,134 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557500 2023-11-28 23:33:52,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3716606.6666666665, ans=0.125 2023-11-28 23:33:55,543 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4400, loss[loss=0.06623, simple_loss=0.08555, pruned_loss=0.01355, audio_tagging_loss=0.009907, over 15435.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09067, pruned_loss=0.01216, audio_tagging_loss=0.00835, over 3050345.99 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:34:15,634 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.931e+01 9.001e+01 9.645e+01 1.064e+02 1.630e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-28 23:34:18,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3716740.0, ans=0.0 2023-11-28 23:34:32,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3716873.3333333335, ans=0.125 2023-11-28 23:34:46,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. 
limit=22.5 2023-11-28 23:34:53,960 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557550 2023-11-28 23:34:57,356 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4450, loss[loss=0.06128, simple_loss=0.08646, pruned_loss=0.008588, audio_tagging_loss=0.00946, over 15839.00 frames. ], tot_loss[loss=0.06623, simple_loss=0.09131, pruned_loss=0.01219, audio_tagging_loss=0.008384, over 3057077.62 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:35:03,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3717006.6666666665, ans=0.0 2023-11-28 23:35:04,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3717006.6666666665, ans=0.0 2023-11-28 23:35:32,654 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2023-11-28 23:35:36,331 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=15.0 2023-11-28 23:35:36,541 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.72 vs. limit=15.0 2023-11-28 23:35:55,801 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557600 2023-11-28 23:36:00,214 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4500, loss[loss=0.09421, simple_loss=0.1397, pruned_loss=0.01965, audio_tagging_loss=0.004731, over 15805.00 frames. ], tot_loss[loss=0.0664, simple_loss=0.09157, pruned_loss=0.01234, audio_tagging_loss=0.008266, over 3059616.51 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:36:19,789 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 8.979e+01 9.760e+01 1.042e+02 1.445e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-28 23:36:24,934 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:36:30,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3717473.3333333335, ans=0.0 2023-11-28 23:36:34,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3717473.3333333335, ans=0.1 2023-11-28 23:36:44,020 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.40 vs. limit=15.0 2023-11-28 23:36:58,515 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557650 2023-11-28 23:37:01,989 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4550, loss[loss=0.07651, simple_loss=0.1058, pruned_loss=0.01849, audio_tagging_loss=0.005145, over 15333.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.0902, pruned_loss=0.01202, audio_tagging_loss=0.008413, over 3058704.20 frames. 
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:37:20,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3717740.0, ans=15.0 2023-11-28 23:37:31,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3717806.6666666665, ans=0.125 2023-11-28 23:37:32,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3717806.6666666665, ans=0.125 2023-11-28 23:37:38,751 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:37:48,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3717873.3333333335, ans=0.125 2023-11-28 23:37:50,811 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:37:58,116 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:37:58,992 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557700 2023-11-28 23:38:00,234 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:38:02,398 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4600, loss[loss=0.06906, simple_loss=0.09543, pruned_loss=0.01462, audio_tagging_loss=0.006723, over 13750.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08965, pruned_loss=0.01211, audio_tagging_loss=0.008556, over 3050086.63 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:38:16,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3718073.3333333335, ans=0.0 2023-11-28 23:38:22,948 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.797e+01 9.011e+01 9.487e+01 1.017e+02 1.254e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-28 23:38:26,008 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.20 vs. limit=22.5 2023-11-28 23:38:39,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3718206.6666666665, ans=0.2 2023-11-28 23:38:57,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3718273.3333333335, ans=0.2 2023-11-28 23:39:01,155 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557750 2023-11-28 23:39:04,621 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4650, loss[loss=0.07258, simple_loss=0.1067, pruned_loss=0.01232, audio_tagging_loss=0.006933, over 15279.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08931, pruned_loss=0.0119, audio_tagging_loss=0.0087, over 3051966.33 frames. 
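The WARNING above drops an AudioSet cut whose 100 input frames subsample to 23 encoder frames while its dummy transcript tokenizes to 24 symbols; a transducer cannot align more output tokens than it has encoder frames, so such cuts are filtered out before the loss. A minimal sketch of that kind of filter (the exact shrinkage formula and helper are assumptions, not the icefall code):

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Assumed sketch of the filter behind the 'Exclude cut' warnings:
    drop a cut when its post-subsampling length is shorter than its token
    sequence, since the transducer could never emit all the symbols.
    The (T - 7) // 4 shrinkage reproduces 100 -> 23 but is an assumption."""
    frames_after_subsampling = (num_frames - 7) // 4
    return frames_after_subsampling >= num_tokens

print(keep_cut(100, 24))  # False: 23 encoder frames < 24 tokens, so excluded
```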
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:39:05,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.68 vs. limit=22.5 2023-11-28 23:39:06,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3718340.0, ans=0.125 2023-11-28 23:39:41,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3718540.0, ans=0.125 2023-11-28 23:39:46,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3718540.0, ans=0.125 2023-11-28 23:39:49,828 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:39:55,476 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=22.5 2023-11-28 23:40:00,378 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=15.43 vs. limit=15.0 2023-11-28 23:40:03,747 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557800 2023-11-28 23:40:05,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3718606.6666666665, ans=0.2 2023-11-28 23:40:07,626 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4700, loss[loss=0.06414, simple_loss=0.0872, pruned_loss=0.01167, audio_tagging_loss=0.008873, over 15185.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.0892, pruned_loss=0.0121, audio_tagging_loss=0.008714, over 3051808.41 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:40:15,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3718673.3333333335, ans=0.0 2023-11-28 23:40:25,992 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.766e+01 9.230e+01 9.778e+01 1.067e+02 1.457e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-28 23:40:55,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3718940.0, ans=0.125 2023-11-28 23:41:04,891 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557850 2023-11-28 23:41:05,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3718940.0, ans=0.0 2023-11-28 23:41:08,264 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4750, loss[loss=0.06763, simple_loss=0.08479, pruned_loss=0.01359, audio_tagging_loss=0.01164, over 15661.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08835, pruned_loss=0.01204, audio_tagging_loss=0.008838, over 3042469.41 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:41:15,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3719006.6666666665, ans=0.125 2023-11-28 23:41:23,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3719073.3333333335, ans=0.0 2023-11-28 23:42:06,281 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557900 2023-11-28 23:42:10,509 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4800, loss[loss=0.07715, simple_loss=0.09738, pruned_loss=0.01524, audio_tagging_loss=0.01322, over 15120.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08872, pruned_loss=0.01209, audio_tagging_loss=0.008878, over 3040046.51 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:42:25,024 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.61 vs. limit=15.0 2023-11-28 23:42:28,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3719406.6666666665, ans=0.2 2023-11-28 23:42:29,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0 2023-11-28 23:42:30,259 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.346e+01 9.011e+01 9.522e+01 1.013e+02 1.336e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-28 23:42:35,898 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.93 vs. limit=15.0 2023-11-28 23:42:55,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3719540.0, ans=0.1 2023-11-28 23:43:09,173 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 557950 2023-11-28 23:43:12,627 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4850, loss[loss=0.07459, simple_loss=0.1112, pruned_loss=0.01183, audio_tagging_loss=0.007136, over 15634.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08779, pruned_loss=0.01177, audio_tagging_loss=0.008981, over 3042010.05 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:43:17,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3719673.3333333335, ans=0.2 2023-11-28 23:43:39,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2023-11-28 23:43:44,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3719806.6666666665, ans=0.2 2023-11-28 23:43:52,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3719873.3333333335, ans=0.125 2023-11-28 23:43:54,022 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.48 vs. 
limit=15.0 2023-11-28 23:44:10,653 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558000 2023-11-28 23:44:14,555 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4900, loss[loss=0.06787, simple_loss=0.09541, pruned_loss=0.01227, audio_tagging_loss=0.007896, over 16973.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08758, pruned_loss=0.01159, audio_tagging_loss=0.008938, over 3043591.35 frames. ], batch size: 63, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:44:19,847 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.69 vs. limit=15.0 2023-11-28 23:44:27,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3720073.3333333335, ans=0.125 2023-11-28 23:44:30,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0 2023-11-28 23:44:35,707 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.818e+01 9.390e+01 1.021e+02 1.310e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-28 23:44:46,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3720140.0, ans=0.125 2023-11-28 23:45:12,781 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558050 2023-11-28 23:45:12,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3720273.3333333335, ans=0.125 2023-11-28 23:45:12,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3720273.3333333335, ans=0.125 2023-11-28 23:45:16,103 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 4950, loss[loss=0.05031, simple_loss=0.06413, pruned_loss=0.006776, audio_tagging_loss=0.01147, over 14297.00 frames. ], tot_loss[loss=0.06424, simple_loss=0.08787, pruned_loss=0.01154, audio_tagging_loss=0.008767, over 3041862.65 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:45:22,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3720340.0, ans=0.125 2023-11-28 23:45:32,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3720406.6666666665, ans=0.125 2023-11-28 23:46:00,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3720540.0, ans=0.125 2023-11-28 23:46:04,303 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.32 vs. limit=6.0 2023-11-28 23:46:14,348 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558100 2023-11-28 23:46:18,276 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5000, loss[loss=0.06583, simple_loss=0.09124, pruned_loss=0.009586, audio_tagging_loss=0.01063, over 15463.00 frames. ], tot_loss[loss=0.06425, simple_loss=0.08811, pruned_loss=0.0116, audio_tagging_loss=0.008597, over 3039559.18 frames. 
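The recurring optim.py:476 lines summarize recent gradient norms as a five-number summary (min, 25%, 50%, 75%, max); in every record here the printed threshold equals Clipping_scale times the logged median (e.g. 2.0 * 9.416e+01 = 1.883e+02), and percent-clipped reports how often a norm exceeded it. A sketch of that bookkeeping, with the windowing details assumed:

```python
import numpy as np

def clipping_report(recent_grad_norms, clipping_scale=2.0):
    """Assumed semantics of the optim.py:476 line: five-number summary
    of a window of recent gradient norms, a clipping threshold of
    clipping_scale * median, and the share of norms above the threshold."""
    norms = np.asarray(recent_grad_norms, dtype=float)
    quartiles = np.percentile(norms, [0, 25, 50, 75, 100])
    threshold = clipping_scale * quartiles[2]  # e.g. 2.0 * 9.416e+01 = 1.883e+02
    percent_clipped = 100.0 * float(np.mean(norms > threshold))
    return quartiles, threshold, percent_clipped
```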
], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:46:19,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3720673.3333333335, ans=0.1 2023-11-28 23:46:35,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3720740.0, ans=0.1 2023-11-28 23:46:38,210 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 8.963e+01 9.566e+01 1.007e+02 2.358e+02, threshold=1.913e+02, percent-clipped=1.0 2023-11-28 23:46:38,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3720740.0, ans=0.05 2023-11-28 23:46:46,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3720806.6666666665, ans=0.0 2023-11-28 23:46:58,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3720873.3333333335, ans=0.1 2023-11-28 23:47:15,724 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558150 2023-11-28 23:47:19,183 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5050, loss[loss=0.0665, simple_loss=0.08978, pruned_loss=0.01515, audio_tagging_loss=0.006461, over 13951.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08861, pruned_loss=0.01167, audio_tagging_loss=0.008501, over 3039419.96 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:47:28,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3721006.6666666665, ans=0.125 2023-11-28 23:47:30,359 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0 2023-11-28 23:47:38,155 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2023-11-28 23:48:00,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3721206.6666666665, ans=0.2 2023-11-28 23:48:12,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3721273.3333333335, ans=0.125 2023-11-28 23:48:16,738 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558200 2023-11-28 23:48:21,055 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5100, loss[loss=0.04166, simple_loss=0.05279, pruned_loss=0.00559, audio_tagging_loss=0.009673, over 15114.00 frames. ], tot_loss[loss=0.064, simple_loss=0.0878, pruned_loss=0.01156, audio_tagging_loss=0.00853, over 3039714.87 frames. 
], batch size: 58, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:48:21,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3721340.0, ans=0.0 2023-11-28 23:48:23,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3721340.0, ans=0.1 2023-11-28 23:48:41,933 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:48:44,096 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.897e+01 9.648e+01 1.044e+02 1.353e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-28 23:48:47,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3721473.3333333335, ans=0.125 2023-11-28 23:48:55,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3721473.3333333335, ans=0.125 2023-11-28 23:49:18,569 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558250 2023-11-28 23:49:21,928 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5150, loss[loss=0.06209, simple_loss=0.09331, pruned_loss=0.008252, audio_tagging_loss=0.007182, over 16154.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08827, pruned_loss=0.01161, audio_tagging_loss=0.008472, over 3035938.12 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 8.0 2023-11-28 23:49:25,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3721673.3333333335, ans=0.125 2023-11-28 23:49:34,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3721740.0, ans=0.0 2023-11-28 23:49:40,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3721740.0, ans=0.04949747468305833 2023-11-28 23:49:50,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3721806.6666666665, ans=0.035 2023-11-28 23:49:53,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3721806.6666666665, ans=0.125 2023-11-28 23:49:56,089 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2023-11-28 23:50:08,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3721873.3333333335, ans=0.125 2023-11-28 23:50:17,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3721940.0, ans=0.125 2023-11-28 23:50:21,676 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558300 2023-11-28 23:50:25,146 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5200, loss[loss=0.05762, simple_loss=0.07254, pruned_loss=0.01351, audio_tagging_loss=0.007839, over 14661.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08841, pruned_loss=0.01168, audio_tagging_loss=0.00839, over 3043287.39 frames. 
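The lr: 1.44e-03 printed with each batch is consistent with icefall's Eden schedule given the configured base_lr=0.045, lr_batches=7500 and lr_epochs=3.5; a sketch of that schedule (the exact epoch and batch inputs fed to the scheduler are approximated here):

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500, lr_epochs: float = 3.5) -> float:
    """Eden-style schedule (sketch): the lr decays smoothly with both
    the global batch count and the epoch count."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 557350, 47):.2e}")  # ~1.42e-03, near the logged 1.44e-03
```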
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:50:32,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3722006.6666666665, ans=0.0 2023-11-28 23:50:46,814 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 9.041e+01 9.653e+01 1.034e+02 1.419e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-28 23:50:50,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3722140.0, ans=0.125 2023-11-28 23:51:00,261 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.19 vs. limit=15.0 2023-11-28 23:51:07,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3722206.6666666665, ans=0.125 2023-11-28 23:51:20,852 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.82 vs. limit=15.0 2023-11-28 23:51:22,532 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558350 2023-11-28 23:51:26,647 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5250, loss[loss=0.05018, simple_loss=0.06617, pruned_loss=0.00931, audio_tagging_loss=0.007786, over 15587.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08866, pruned_loss=0.01187, audio_tagging_loss=0.008399, over 3048469.09 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:51:26,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3722340.0, ans=0.1 2023-11-28 23:51:54,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3722473.3333333335, ans=0.2 2023-11-28 23:52:03,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3722540.0, ans=0.125 2023-11-28 23:52:18,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3722606.6666666665, ans=0.125 2023-11-28 23:52:23,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3722606.6666666665, ans=0.0 2023-11-28 23:52:24,359 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558400 2023-11-28 23:52:28,342 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5300, loss[loss=0.06305, simple_loss=0.08577, pruned_loss=0.01148, audio_tagging_loss=0.008684, over 15475.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08929, pruned_loss=0.01192, audio_tagging_loss=0.008342, over 3045759.80 frames. 
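Each scaling.py:213 line reports the current value (ans) of a ScheduledFloat, a scalar hyperparameter that follows a piecewise-linear schedule over batch_count; some 3.7M batches into this run, essentially all of them sit at their final values (balancer probs at 0.125, skip rates at 0.0, bypass scale_min at 0.2). A simplified stand-in to illustrate the idea, not the icefall class:

```python
class PiecewiseLinearFloat:
    """Simplified stand-in for a ScheduledFloat: a float whose value is
    linearly interpolated between (batch_count, value) breakpoints and
    held constant beyond the last one."""
    def __init__(self, *points):
        self.points = sorted(points)  # e.g. (0.0, 0.3), (4000.0, 0.125)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]

prob = PiecewiseLinearFloat((0.0, 0.3), (4000.0, 0.125))
print(prob.value(3722006.0))  # 0.125, long past the final breakpoint
```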
], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:52:29,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3722673.3333333335, ans=0.0 2023-11-28 23:52:43,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3722740.0, ans=0.0 2023-11-28 23:52:45,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3722740.0, ans=0.2 2023-11-28 23:52:50,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 9.120e+01 9.836e+01 1.047e+02 1.238e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-28 23:53:03,254 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=15.0 2023-11-28 23:53:05,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3722873.3333333335, ans=0.0 2023-11-28 23:53:26,169 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558450 2023-11-28 23:53:28,233 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:53:30,159 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5350, loss[loss=0.08011, simple_loss=0.1178, pruned_loss=0.01577, audio_tagging_loss=0.005444, over 15342.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.09039, pruned_loss=0.01205, audio_tagging_loss=0.008315, over 3047074.57 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:53:38,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=3723006.6666666665, ans=0.2 2023-11-28 23:54:05,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3723140.0, ans=0.125 2023-11-28 23:54:27,989 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558500 2023-11-28 23:54:30,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3723340.0, ans=0.125 2023-11-28 23:54:31,506 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5400, loss[loss=0.05558, simple_loss=0.08029, pruned_loss=0.007101, audio_tagging_loss=0.008335, over 14858.00 frames. ], tot_loss[loss=0.06602, simple_loss=0.09107, pruned_loss=0.01208, audio_tagging_loss=0.008402, over 3050140.11 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:54:31,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3723340.0, ans=0.0 2023-11-28 23:54:47,230 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=22.5 2023-11-28 23:54:54,637 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.310e+01 8.983e+01 9.673e+01 1.019e+02 1.246e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-28 23:54:56,580 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.82 vs. limit=15.0 2023-11-28 23:55:12,206 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.89 vs. 
limit=10.0 2023-11-28 23:55:20,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3723606.6666666665, ans=0.1 2023-11-28 23:55:29,954 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558550 2023-11-28 23:55:32,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3723673.3333333335, ans=0.125 2023-11-28 23:55:33,312 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5450, loss[loss=0.04689, simple_loss=0.05624, pruned_loss=0.008201, audio_tagging_loss=0.01057, over 14401.00 frames. ], tot_loss[loss=0.0662, simple_loss=0.09109, pruned_loss=0.01223, audio_tagging_loss=0.008424, over 3046572.15 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:55:36,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3723673.3333333335, ans=0.125 2023-11-28 23:55:53,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3723740.0, ans=0.125 2023-11-28 23:55:59,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3723806.6666666665, ans=0.125 2023-11-28 23:56:00,033 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.93 vs. limit=15.0 2023-11-28 23:56:12,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3723873.3333333335, ans=0.0 2023-11-28 23:56:12,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3723873.3333333335, ans=0.1 2023-11-28 23:56:13,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3723873.3333333335, ans=0.125 2023-11-28 23:56:18,196 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0 2023-11-28 23:56:31,902 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558600 2023-11-28 23:56:35,654 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5500, loss[loss=0.0562, simple_loss=0.06373, pruned_loss=0.0104, audio_tagging_loss=0.01393, over 15175.00 frames. ], tot_loss[loss=0.06666, simple_loss=0.09181, pruned_loss=0.01235, audio_tagging_loss=0.008407, over 3052983.07 frames. 
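grad_scale in the per-batch records (moving among 8.0, 16.0 and 32.0 in this stretch) is the dynamic fp16 loss scale: it is halved when a step produces inf/nan gradients and grows back after a run of clean steps. PyTorch's GradScaler implements exactly this policy; a minimal sketch of an fp16 step, with model and optimizer as placeholders:

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling for fp16

def training_step(model, optimizer, batch):
    """Sketch of an fp16 step: scale the loss up before backward; the
    scaler halves its scale on overflow and doubles it again after a
    stretch of overflow-free steps (hence the logged 8 -> 16 -> 32)."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)  # placeholder: returns a scalar loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if inf/nan gradients found
    scaler.update()          # adjusts the scale reported as grad_scale
```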
], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:56:49,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3724073.3333333335, ans=0.125 2023-11-28 23:56:57,821 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 8.978e+01 9.679e+01 1.033e+02 1.249e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-28 23:57:13,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3724206.6666666665, ans=0.0 2023-11-28 23:57:25,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3724273.3333333335, ans=0.125 2023-11-28 23:57:33,435 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558650 2023-11-28 23:57:36,769 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5550, loss[loss=0.04571, simple_loss=0.0573, pruned_loss=0.006829, audio_tagging_loss=0.01023, over 17018.00 frames. ], tot_loss[loss=0.06644, simple_loss=0.09126, pruned_loss=0.01223, audio_tagging_loss=0.008582, over 3058585.67 frames. ], batch size: 66, lr: 1.44e-03, grad_scale: 16.0 2023-11-28 23:57:59,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3724406.6666666665, ans=10.0 2023-11-28 23:58:29,346 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:58:35,085 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558700 2023-11-28 23:58:37,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3724673.3333333335, ans=0.0 2023-11-28 23:58:38,524 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5600, loss[loss=0.09172, simple_loss=0.1235, pruned_loss=0.02066, audio_tagging_loss=0.009337, over 15889.00 frames. ], tot_loss[loss=0.06668, simple_loss=0.09153, pruned_loss=0.01228, audio_tagging_loss=0.008633, over 3053874.50 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:58:45,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3724673.3333333335, ans=0.125 2023-11-28 23:58:50,850 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=15.0 2023-11-28 23:58:54,376 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.75 vs. 
limit=15.0 2023-11-28 23:59:00,843 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 9.219e+01 9.778e+01 1.037e+02 1.295e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-28 23:59:05,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3724806.6666666665, ans=0.1 2023-11-28 23:59:08,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3724806.6666666665, ans=0.125 2023-11-28 23:59:14,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3724873.3333333335, ans=0.1 2023-11-28 23:59:18,436 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-28 23:59:19,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3724873.3333333335, ans=0.125 2023-11-28 23:59:24,208 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-28 23:59:36,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3724940.0, ans=0.125 2023-11-28 23:59:36,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=15.0 2023-11-28 23:59:37,047 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558750 2023-11-28 23:59:40,484 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5650, loss[loss=0.05477, simple_loss=0.066, pruned_loss=0.01224, audio_tagging_loss=0.00953, over 14965.00 frames. ], tot_loss[loss=0.06643, simple_loss=0.09124, pruned_loss=0.01225, audio_tagging_loss=0.008561, over 3056963.10 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-28 23:59:59,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3725073.3333333335, ans=0.125 2023-11-29 00:00:21,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3725206.6666666665, ans=0.125 2023-11-29 00:00:37,906 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558800 2023-11-29 00:00:41,891 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5700, loss[loss=0.07534, simple_loss=0.1014, pruned_loss=0.01488, audio_tagging_loss=0.009765, over 14591.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09015, pruned_loss=0.01214, audio_tagging_loss=0.008673, over 3061735.30 frames. ], batch size: 54, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:00:54,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.95 vs. 
limit=15.0 2023-11-29 00:01:04,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.325e+01 8.841e+01 9.405e+01 1.014e+02 1.366e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-29 00:01:18,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3725540.0, ans=0.1 2023-11-29 00:01:29,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3725540.0, ans=0.125 2023-11-29 00:01:29,556 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=15.0 2023-11-29 00:01:41,165 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558850 2023-11-29 00:01:44,644 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5750, loss[loss=0.06444, simple_loss=0.08963, pruned_loss=0.01097, audio_tagging_loss=0.008651, over 15201.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08945, pruned_loss=0.01197, audio_tagging_loss=0.008646, over 3058638.44 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:02:09,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2023-11-29 00:02:10,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3725806.6666666665, ans=0.5 2023-11-29 00:02:26,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3725873.3333333335, ans=0.0 2023-11-29 00:02:27,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3725873.3333333335, ans=0.125 2023-11-29 00:02:29,384 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=12.0 2023-11-29 00:02:42,725 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558900 2023-11-29 00:02:46,190 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5800, loss[loss=0.05703, simple_loss=0.07265, pruned_loss=0.01021, audio_tagging_loss=0.01049, over 15246.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08876, pruned_loss=0.0119, audio_tagging_loss=0.008634, over 3050808.88 frames. ], batch size: 58, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:03:06,666 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=15.0 2023-11-29 00:03:08,463 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.900e+01 8.867e+01 9.470e+01 1.000e+02 1.681e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-29 00:03:15,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3726140.0, ans=0.1 2023-11-29 00:03:20,269 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0 2023-11-29 00:03:43,103 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 558950 2023-11-29 00:03:46,493 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5850, loss[loss=0.06013, simple_loss=0.08867, pruned_loss=0.009215, audio_tagging_loss=0.00658, over 15249.00 frames. 
], tot_loss[loss=0.06513, simple_loss=0.08944, pruned_loss=0.0119, audio_tagging_loss=0.008505, over 3050496.65 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:03:46,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3726340.0, ans=0.1 2023-11-29 00:04:04,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3726406.6666666665, ans=0.0 2023-11-29 00:04:24,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3726540.0, ans=0.125 2023-11-29 00:04:44,546 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559000 2023-11-29 00:04:49,139 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5900, loss[loss=0.0621, simple_loss=0.09236, pruned_loss=0.01007, audio_tagging_loss=0.005855, over 15712.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08922, pruned_loss=0.01197, audio_tagging_loss=0.008428, over 3047891.84 frames. ], batch size: 59, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:04:56,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3726673.3333333335, ans=0.1 2023-11-29 00:04:58,839 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=15.0 2023-11-29 00:05:02,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3726740.0, ans=0.125 2023-11-29 00:05:03,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3726740.0, ans=0.125 2023-11-29 00:05:10,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.16 vs. limit=22.5 2023-11-29 00:05:12,418 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 8.979e+01 9.571e+01 1.024e+02 1.288e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 00:05:15,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3726806.6666666665, ans=0.0 2023-11-29 00:05:30,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3726873.3333333335, ans=0.125 2023-11-29 00:05:47,310 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559050 2023-11-29 00:05:51,228 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 5950, loss[loss=0.05902, simple_loss=0.08667, pruned_loss=0.007329, audio_tagging_loss=0.008358, over 15524.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08946, pruned_loss=0.01199, audio_tagging_loss=0.008426, over 3051829.62 frames. ], batch size: 61, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:05:51,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3727006.6666666665, ans=0.125 2023-11-29 00:06:02,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.71 vs. 
limit=15.0 2023-11-29 00:06:11,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3727073.3333333335, ans=0.125 2023-11-29 00:06:40,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3727273.3333333335, ans=0.125 2023-11-29 00:06:48,331 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559100 2023-11-29 00:06:51,725 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6000, loss[loss=0.08701, simple_loss=0.1304, pruned_loss=0.01692, audio_tagging_loss=0.004905, over 15178.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08969, pruned_loss=0.01201, audio_tagging_loss=0.008394, over 3050318.66 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:06:51,726 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 00:07:31,879 INFO [train_asr.py:1267] (2/4) Epoch 47, validation: loss=0.05752, simple_loss=0.05049, pruned_loss=0.005333, audio_tagging_loss=0.02694, over 4681554.00 frames. 2023-11-29 00:07:31,879 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 00:07:40,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3727340.0, ans=0.125 2023-11-29 00:07:56,051 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.777e+01 9.062e+01 9.671e+01 1.050e+02 2.392e+02, threshold=1.934e+02, percent-clipped=1.0 2023-11-29 00:08:16,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3727540.0, ans=0.125 2023-11-29 00:08:16,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3727540.0, ans=0.2 2023-11-29 00:08:17,545 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 00:08:31,005 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559150 2023-11-29 00:08:34,400 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6050, loss[loss=0.07689, simple_loss=0.1093, pruned_loss=0.01516, audio_tagging_loss=0.0071, over 16192.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08957, pruned_loss=0.01194, audio_tagging_loss=0.008456, over 3056025.26 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:08:35,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3727673.3333333335, ans=0.125 2023-11-29 00:09:23,461 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.11 vs. limit=22.5 2023-11-29 00:09:30,595 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.11 vs. 
limit=12.0 2023-11-29 00:09:31,120 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559200 2023-11-29 00:09:34,933 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6100, loss[loss=0.05324, simple_loss=0.06571, pruned_loss=0.009905, audio_tagging_loss=0.01048, over 15545.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08944, pruned_loss=0.01192, audio_tagging_loss=0.008462, over 3051999.94 frames. ], batch size: 60, lr: 1.44e-03, grad_scale: 32.0 2023-11-29 00:09:37,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3728006.6666666665, ans=0.125 2023-11-29 00:09:37,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3728006.6666666665, ans=0.125 2023-11-29 00:09:42,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3728006.6666666665, ans=0.0 2023-11-29 00:09:48,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3728073.3333333335, ans=0.125 2023-11-29 00:09:51,565 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=22.5 2023-11-29 00:09:58,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.989e+01 9.555e+01 1.035e+02 1.326e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-29 00:09:58,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3728140.0, ans=0.2 2023-11-29 00:10:06,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3728140.0, ans=0.2 2023-11-29 00:10:17,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3728206.6666666665, ans=0.2 2023-11-29 00:10:31,606 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559250 2023-11-29 00:10:35,648 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6150, loss[loss=0.05458, simple_loss=0.07211, pruned_loss=0.008245, audio_tagging_loss=0.01028, over 13862.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.0892, pruned_loss=0.0119, audio_tagging_loss=0.008443, over 3047745.11 frames. ], batch size: 53, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:10:54,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3728406.6666666665, ans=0.2 2023-11-29 00:11:20,875 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.53 vs. limit=15.0 2023-11-29 00:11:23,628 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.94 vs. limit=15.0 2023-11-29 00:11:33,889 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559300 2023-11-29 00:11:37,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3728673.3333333335, ans=0.1 2023-11-29 00:11:38,032 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6200, loss[loss=0.04948, simple_loss=0.06395, pruned_loss=0.007544, audio_tagging_loss=0.00996, over 14772.00 frames. 
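tot_loss is reported "over ~3.0M frames" batch after batch rather than over an ever-growing count, so it behaves like a frame-weighted average over a bounded window of recent batches; with reset_interval=200 in the configuration and batches of roughly 15k frames, a 200-batch window works out to about 3M frames, matching the logged totals. One simple realization of that idea (assumed semantics, not the icefall MetricsTracker):

```python
from collections import deque

class WindowedFrameAverage:
    """Assumed semantics of tot_loss: a frame-weighted mean of per-batch
    losses over a bounded window of recent batches, so the reported
    'over N frames' hovers around a few million instead of growing."""
    def __init__(self, max_batches: int = 200):
        self.window = deque(maxlen=max_batches)  # (loss_sum, num_frames)

    def update(self, batch_loss: float, num_frames: int) -> float:
        self.window.append((batch_loss * num_frames, num_frames))
        loss_sum = sum(s for s, _ in self.window)
        frames = sum(f for _, f in self.window)
        return loss_sum / frames  # the value printed as tot_loss
```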
], tot_loss[loss=0.0646, simple_loss=0.08849, pruned_loss=0.01178, audio_tagging_loss=0.008569, over 3041006.44 frames. ], batch size: 56, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:11:39,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3728673.3333333335, ans=0.125 2023-11-29 00:11:56,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3728740.0, ans=0.1 2023-11-29 00:12:01,265 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.926e+01 8.945e+01 9.631e+01 1.031e+02 1.323e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 00:12:15,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3728873.3333333335, ans=0.125 2023-11-29 00:12:35,568 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559350 2023-11-29 00:12:39,046 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6250, loss[loss=0.07099, simple_loss=0.08581, pruned_loss=0.0168, audio_tagging_loss=0.01129, over 16381.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08796, pruned_loss=0.01179, audio_tagging_loss=0.008705, over 3041322.19 frames. ], batch size: 66, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:12:40,509 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:12:44,279 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.55 vs. limit=15.0 2023-11-29 00:12:46,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3729006.6666666665, ans=0.125 2023-11-29 00:13:02,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3729140.0, ans=0.0 2023-11-29 00:13:12,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3729140.0, ans=0.0 2023-11-29 00:13:22,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3729206.6666666665, ans=0.2 2023-11-29 00:13:22,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3729206.6666666665, ans=0.0 2023-11-29 00:13:29,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3729273.3333333335, ans=0.1 2023-11-29 00:13:32,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3729273.3333333335, ans=0.0 2023-11-29 00:13:36,034 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559400 2023-11-29 00:13:39,767 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6300, loss[loss=0.04515, simple_loss=0.06225, pruned_loss=0.004614, audio_tagging_loss=0.009408, over 14364.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08784, pruned_loss=0.01171, audio_tagging_loss=0.008689, over 3039788.18 frames. ], batch size: 55, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:13:40,165 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.40 vs. 
limit=15.0 2023-11-29 00:13:44,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3729340.0, ans=0.125 2023-11-29 00:14:06,086 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 8.910e+01 9.740e+01 1.040e+02 1.205e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 00:14:12,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-11-29 00:14:19,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3729540.0, ans=0.125 2023-11-29 00:14:25,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3729540.0, ans=0.5 2023-11-29 00:14:29,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3729606.6666666665, ans=0.0 2023-11-29 00:14:30,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3729606.6666666665, ans=0.1 2023-11-29 00:14:39,781 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559450 2023-11-29 00:14:43,940 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6350, loss[loss=0.07253, simple_loss=0.1043, pruned_loss=0.0128, audio_tagging_loss=0.007578, over 15369.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08763, pruned_loss=0.0118, audio_tagging_loss=0.008708, over 3038720.01 frames. ], batch size: 57, lr: 1.44e-03, grad_scale: 16.0 2023-11-29 00:14:46,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.41 vs. limit=8.0 2023-11-29 00:14:49,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3729673.3333333335, ans=0.125 2023-11-29 00:15:20,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3729873.3333333335, ans=0.0 2023-11-29 00:15:23,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3729873.3333333335, ans=0.0 2023-11-29 00:15:30,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3729873.3333333335, ans=0.2 2023-11-29 00:15:42,266 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559500 2023-11-29 00:15:42,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3729940.0, ans=0.125 2023-11-29 00:15:45,778 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6400, loss[loss=0.0644, simple_loss=0.09041, pruned_loss=0.01097, audio_tagging_loss=0.00823, over 14662.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08781, pruned_loss=0.01178, audio_tagging_loss=0.008766, over 3038466.88 frames. 
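For offline monitoring of a run like this, the per-batch summary records are easy to scrape; a small hypothetical parser written against the exact format of the lines above:

```python
import re

# Hypothetical scraper for the per-batch summary records in this log.
RECORD = re.compile(
    r"Epoch (\d+), batch (\d+), .*?"
    r"tot_loss\[loss=([0-9.]+).*?audio_tagging_loss=([0-9.]+)"
)

def iter_tot_losses(path):
    """Yield (epoch, batch, tot_loss, audio_tagging_loss) for every
    'Epoch N, batch M' record, e.g. to plot loss curves offline."""
    with open(path) as f:
        for line in f:
            for m in RECORD.finditer(line):
                yield (int(m.group(1)), int(m.group(2)),
                       float(m.group(3)), float(m.group(4)))
```

Plotting the yielded tuples makes the slow drift of tot_loss (roughly 0.064 to 0.067 across this stretch) much easier to see than scanning the raw records.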
], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:15:57,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3730073.3333333335, ans=0.0 2023-11-29 00:16:08,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3730140.0, ans=0.0 2023-11-29 00:16:10,787 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 8.936e+01 9.646e+01 1.038e+02 1.369e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 00:16:43,598 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559550 2023-11-29 00:16:47,000 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6450, loss[loss=0.07284, simple_loss=0.09327, pruned_loss=0.01621, audio_tagging_loss=0.009996, over 14427.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08818, pruned_loss=0.01186, audio_tagging_loss=0.008868, over 3036250.15 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:16:55,087 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.50 vs. limit=22.5 2023-11-29 00:17:16,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3730473.3333333335, ans=0.2 2023-11-29 00:17:16,658 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=12.0 2023-11-29 00:17:18,310 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2023-11-29 00:17:23,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3730473.3333333335, ans=10.0 2023-11-29 00:17:31,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3730540.0, ans=0.0 2023-11-29 00:17:38,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3730606.6666666665, ans=0.125 2023-11-29 00:17:45,626 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.22 vs. limit=10.0 2023-11-29 00:17:46,894 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559600 2023-11-29 00:17:50,655 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6500, loss[loss=0.05931, simple_loss=0.08137, pruned_loss=0.01115, audio_tagging_loss=0.00748, over 15046.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.0877, pruned_loss=0.01175, audio_tagging_loss=0.008936, over 3044072.09 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:17:55,241 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.39 vs. 
limit=22.5 2023-11-29 00:18:16,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.806e+01 9.149e+01 9.988e+01 1.072e+02 1.426e+02, threshold=1.998e+02, percent-clipped=0.0 2023-11-29 00:18:19,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3730806.6666666665, ans=0.125 2023-11-29 00:18:30,529 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.85 vs. limit=10.0 2023-11-29 00:18:36,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3730873.3333333335, ans=0.125 2023-11-29 00:18:49,058 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559650 2023-11-29 00:18:52,567 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6550, loss[loss=0.0518, simple_loss=0.07921, pruned_loss=0.004649, audio_tagging_loss=0.00755, over 15586.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08844, pruned_loss=0.01196, audio_tagging_loss=0.008737, over 3040445.59 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:19:05,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3731073.3333333335, ans=0.04949747468305833 2023-11-29 00:19:13,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3731073.3333333335, ans=0.125 2023-11-29 00:19:30,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3731206.6666666665, ans=0.125 2023-11-29 00:19:36,819 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.29 vs. limit=12.0 2023-11-29 00:19:38,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3731206.6666666665, ans=0.0 2023-11-29 00:19:51,037 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559700 2023-11-29 00:19:54,482 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6600, loss[loss=0.08256, simple_loss=0.1245, pruned_loss=0.01496, audio_tagging_loss=0.005346, over 16238.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08923, pruned_loss=0.01205, audio_tagging_loss=0.008567, over 3043111.20 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:20:14,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3731406.6666666665, ans=0.125 2023-11-29 00:20:20,707 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.919e+01 9.465e+01 1.014e+02 1.286e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-29 00:20:52,597 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559750 2023-11-29 00:20:56,682 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6650, loss[loss=0.0647, simple_loss=0.09476, pruned_loss=0.009642, audio_tagging_loss=0.007675, over 14951.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08911, pruned_loss=0.01215, audio_tagging_loss=0.008535, over 3040774.33 frames. 
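The learning rate in these summaries decays very slowly (1.44e-03 earlier in the epoch, 1.43e-03 from here on), which is consistent with icefall's Eden schedule, where a base LR is discounted by both the global batch count and the epoch. A sketch under assumed hyperparameters (base_lr=0.045, lr_batches=7500, lr_epochs=3.5 are illustrative values, not read from this excerpt):

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Each factor is ~1 early in training and decays as batch/epoch grow
        # past lr_batches/lr_epochs; the -0.25 exponent gives a slow decay.
        batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
        epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(eden_lr(0.045, 560000, 47))  # ~1.42e-03, close to the logged 1.43e-03

At this depth into training the epoch term dominates the remaining decay, which is why the logged lr barely moves across thousands of batches.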
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:20:59,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3731673.3333333335, ans=0.1 2023-11-29 00:21:11,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3731740.0, ans=0.125 2023-11-29 00:21:16,603 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.46 vs. limit=12.0 2023-11-29 00:21:19,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3731806.6666666665, ans=0.125 2023-11-29 00:21:23,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3731806.6666666665, ans=0.0 2023-11-29 00:21:27,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3731806.6666666665, ans=0.2 2023-11-29 00:21:38,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.40 vs. limit=22.5 2023-11-29 00:21:52,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3731940.0, ans=0.125 2023-11-29 00:21:52,937 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.21 vs. limit=22.5 2023-11-29 00:21:54,819 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559800 2023-11-29 00:21:58,732 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6700, loss[loss=0.05453, simple_loss=0.07258, pruned_loss=0.007819, audio_tagging_loss=0.01042, over 15672.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.08978, pruned_loss=0.0122, audio_tagging_loss=0.008504, over 3040593.92 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:22:03,976 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.08 vs. limit=22.5 2023-11-29 00:22:08,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3732006.6666666665, ans=0.125 2023-11-29 00:22:24,737 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.843e+01 9.131e+01 9.664e+01 1.036e+02 1.396e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-29 00:22:36,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3732206.6666666665, ans=10.0 2023-11-29 00:22:56,144 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559850 2023-11-29 00:22:56,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3732273.3333333335, ans=0.2 2023-11-29 00:22:59,585 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6750, loss[loss=0.06171, simple_loss=0.0863, pruned_loss=0.006618, audio_tagging_loss=0.01195, over 15941.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08922, pruned_loss=0.01205, audio_tagging_loss=0.008633, over 3040477.22 frames. 
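grad_scale in the batch summaries oscillates between 16.0 and 32.0, the usual dynamic loss-scale behavior of fp16 mixed-precision training: the scaler doubles the scale after a fixed number of overflow-free steps and halves it whenever an inf/nan gradient appears. A generic sketch of the mechanism using the standard torch.cuda.amp API (not code from this repo):

    import torch

    def train_step(model, optimizer, scaler, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():   # half-precision forward pass
            loss = model(batch)
        scaler.scale(loss).backward()     # backward on the scaled loss
        scaler.step(optimizer)            # silently skips the step on inf/nan
        scaler.update()                   # grows/shrinks the scale; this is
                                          # the value logged as grad_scale
        return loss.detach()

    # e.g. scaler = torch.cuda.amp.GradScaler(init_scale=16.0,
    #                                         growth_interval=2000)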
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:23:13,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3732406.6666666665, ans=0.0 2023-11-29 00:23:31,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3732473.3333333335, ans=0.125 2023-11-29 00:23:38,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3732540.0, ans=0.125 2023-11-29 00:23:50,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3732606.6666666665, ans=0.1 2023-11-29 00:23:51,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3732606.6666666665, ans=0.125 2023-11-29 00:23:57,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3732606.6666666665, ans=0.0 2023-11-29 00:23:58,369 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559900 2023-11-29 00:24:01,786 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6800, loss[loss=0.06202, simple_loss=0.08874, pruned_loss=0.008207, audio_tagging_loss=0.009446, over 14343.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.08989, pruned_loss=0.0124, audio_tagging_loss=0.008597, over 3036295.70 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:24:15,090 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=22.5 2023-11-29 00:24:24,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3732740.0, ans=0.0 2023-11-29 00:24:26,969 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.13 vs. limit=12.0 2023-11-29 00:24:27,571 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.798e+01 9.091e+01 9.725e+01 1.038e+02 3.036e+02, threshold=1.945e+02, percent-clipped=1.0 2023-11-29 00:24:40,357 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-29 00:24:44,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3732873.3333333335, ans=0.125 2023-11-29 00:24:46,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3732873.3333333335, ans=0.125 2023-11-29 00:24:48,245 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.39 vs. 
limit=15.0 2023-11-29 00:24:50,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3732940.0, ans=0.0 2023-11-29 00:24:52,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3732940.0, ans=0.1 2023-11-29 00:25:00,660 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 559950 2023-11-29 00:25:02,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3732940.0, ans=0.125 2023-11-29 00:25:04,115 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6850, loss[loss=0.05009, simple_loss=0.06333, pruned_loss=0.009794, audio_tagging_loss=0.008627, over 15844.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08981, pruned_loss=0.01226, audio_tagging_loss=0.008509, over 3035870.68 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:25:09,782 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.17 vs. limit=15.0 2023-11-29 00:25:19,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3733073.3333333335, ans=0.125 2023-11-29 00:25:23,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3733073.3333333335, ans=0.05 2023-11-29 00:25:39,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3733140.0, ans=0.125 2023-11-29 00:25:44,900 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=12.0 2023-11-29 00:25:55,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3733273.3333333335, ans=0.0 2023-11-29 00:26:02,175 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560000 2023-11-29 00:26:02,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3733273.3333333335, ans=0.0 2023-11-29 00:26:08,419 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6900, loss[loss=0.0619, simple_loss=0.09378, pruned_loss=0.00664, audio_tagging_loss=0.008367, over 15428.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08942, pruned_loss=0.01202, audio_tagging_loss=0.008433, over 3037503.61 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:26:36,700 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.618e+01 8.734e+01 9.536e+01 1.026e+02 1.241e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 00:26:37,451 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.27 vs. 
limit=15.0 2023-11-29 00:26:47,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3733540.0, ans=0.09899494936611666 2023-11-29 00:26:51,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3733540.0, ans=0.0 2023-11-29 00:26:56,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3733606.6666666665, ans=0.0 2023-11-29 00:26:57,714 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 00:27:06,839 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560050 2023-11-29 00:27:10,752 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 6950, loss[loss=0.06603, simple_loss=0.08622, pruned_loss=0.01311, audio_tagging_loss=0.009807, over 14493.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08883, pruned_loss=0.01186, audio_tagging_loss=0.008472, over 3030939.16 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:27:47,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3733873.3333333335, ans=0.2 2023-11-29 00:28:09,807 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560100 2023-11-29 00:28:13,152 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7000, loss[loss=0.06718, simple_loss=0.08708, pruned_loss=0.01375, audio_tagging_loss=0.009884, over 16199.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08818, pruned_loss=0.01187, audio_tagging_loss=0.008527, over 3032603.36 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:28:32,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3734073.3333333335, ans=0.2 2023-11-29 00:28:38,963 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 8.937e+01 9.480e+01 1.049e+02 1.230e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-29 00:28:40,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3734140.0, ans=0.0 2023-11-29 00:28:50,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3734206.6666666665, ans=0.0 2023-11-29 00:28:53,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3734206.6666666665, ans=0.0 2023-11-29 00:29:06,353 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0 2023-11-29 00:29:10,483 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560150 2023-11-29 00:29:13,828 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7050, loss[loss=0.06777, simple_loss=0.09037, pruned_loss=0.01395, audio_tagging_loss=0.008634, over 15505.00 frames. 
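The WARNING above shows the transducer feasibility filter in action: the excluded AudioSet cut is 1.0 s long, giving 100 feature frames but only 23 frames after subsampling, fewer than its 24 placeholder tokens, so no RNN-T alignment exists and the cut is dropped from training. A minimal sketch of such a filter (the exact frame-shrinkage formula of the encoder frontend is an assumption here):

    def is_feasible(num_frames: int, num_tokens: int) -> bool:
        # Assumed conv frontend: ~7 frames of context lost, then 4x subsampling.
        frames_after = (num_frames - 7) // 4
        # A transducer must emit every token, so it needs at least as many
        # encoder frames as there are tokens.
        return frames_after >= num_tokens

    print(is_feasible(100, 24))  # False -> excluded, as in the WARNING above

The "Dummy text added as a place holder" transcripts appear to mark audio-tagging-only cuts, which is why such very short cuts keep turning up and being filtered.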
], tot_loss[loss=0.06496, simple_loss=0.08883, pruned_loss=0.01196, audio_tagging_loss=0.008583, over 3033469.63 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:29:17,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3734340.0, ans=0.09899494936611666 2023-11-29 00:29:19,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3734340.0, ans=0.125 2023-11-29 00:29:23,586 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=15.0 2023-11-29 00:29:43,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3734473.3333333335, ans=0.0 2023-11-29 00:29:47,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3734473.3333333335, ans=0.0 2023-11-29 00:30:02,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3734606.6666666665, ans=0.2 2023-11-29 00:30:11,686 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560200 2023-11-29 00:30:11,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3734606.6666666665, ans=0.125 2023-11-29 00:30:16,165 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7100, loss[loss=0.05218, simple_loss=0.06949, pruned_loss=0.007546, audio_tagging_loss=0.009886, over 15514.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08836, pruned_loss=0.01192, audio_tagging_loss=0.008701, over 3035854.27 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:30:32,600 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=12.0 2023-11-29 00:30:43,658 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.839e+01 8.863e+01 9.578e+01 1.032e+02 1.275e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-29 00:30:49,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3734806.6666666665, ans=0.1 2023-11-29 00:30:49,921 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=12.0 2023-11-29 00:30:56,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3734873.3333333335, ans=0.125 2023-11-29 00:31:01,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2023-11-29 00:31:14,661 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560250 2023-11-29 00:31:18,592 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7150, loss[loss=0.05728, simple_loss=0.07445, pruned_loss=0.01325, audio_tagging_loss=0.006793, over 15348.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08801, pruned_loss=0.01178, audio_tagging_loss=0.008732, over 3038514.65 frames. 
], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:31:20,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3735006.6666666665, ans=0.125 2023-11-29 00:31:21,572 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.37 vs. limit=22.5 2023-11-29 00:32:16,474 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560300 2023-11-29 00:32:19,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3735340.0, ans=0.125 2023-11-29 00:32:19,860 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7200, loss[loss=0.04843, simple_loss=0.06097, pruned_loss=0.007698, audio_tagging_loss=0.01025, over 14693.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08842, pruned_loss=0.0118, audio_tagging_loss=0.008726, over 3044215.68 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:32:40,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3735406.6666666665, ans=0.125 2023-11-29 00:32:42,375 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.15 vs. limit=22.5 2023-11-29 00:32:47,171 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 8.965e+01 9.449e+01 1.037e+02 1.518e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-29 00:32:58,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3735540.0, ans=0.09899494936611666 2023-11-29 00:33:03,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3735540.0, ans=0.125 2023-11-29 00:33:09,778 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.79 vs. limit=15.0 2023-11-29 00:33:11,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3735606.6666666665, ans=0.1 2023-11-29 00:33:13,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3735606.6666666665, ans=0.1 2023-11-29 00:33:17,220 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560350 2023-11-29 00:33:20,699 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7250, loss[loss=0.06887, simple_loss=0.0915, pruned_loss=0.01126, audio_tagging_loss=0.01185, over 15390.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08886, pruned_loss=0.01182, audio_tagging_loss=0.008715, over 3035413.61 frames. 
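The periodic optim.py:476 lines summarize recent gradient norms as five quantiles (min, 25%, median, 75%, max) plus the derived clipping threshold and the fraction of recently clipped batches. Throughout this section the threshold is consistently twice the logged median (e.g. 1.890e+02 = 2.0 × 9.449e+01 just above), matching Clipping_scale=2.0. A rough sketch of this bookkeeping (assumed mechanics, not the exact ScaledAdam code):

    import torch

    def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
        # grad_norms: 1-D tensor of per-batch gradient norms from a recent window.
        q = torch.quantile(grad_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]            # scale the median
        percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
        return q, threshold, percent_clipped

percent-clipped=0.0 throughout this stretch simply means no batch in the window exceeded twice the median gradient norm.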
], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:33:24,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3735673.3333333335, ans=0.125 2023-11-29 00:33:38,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3735740.0, ans=0.1 2023-11-29 00:33:46,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3735806.6666666665, ans=0.0 2023-11-29 00:33:46,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3735806.6666666665, ans=0.0 2023-11-29 00:33:54,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3735806.6666666665, ans=0.1 2023-11-29 00:33:57,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3735873.3333333335, ans=0.04949747468305833 2023-11-29 00:34:04,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3735873.3333333335, ans=0.1 2023-11-29 00:34:05,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3735873.3333333335, ans=0.125 2023-11-29 00:34:19,897 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560400 2023-11-29 00:34:23,691 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7300, loss[loss=0.0732, simple_loss=0.09815, pruned_loss=0.01571, audio_tagging_loss=0.008423, over 15132.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.0881, pruned_loss=0.01185, audio_tagging_loss=0.008791, over 3034289.39 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:34:37,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3736073.3333333335, ans=0.0 2023-11-29 00:34:41,477 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=22.5 2023-11-29 00:34:42,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3736073.3333333335, ans=0.125 2023-11-29 00:34:51,205 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.804e+01 9.526e+01 1.009e+02 1.275e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-29 00:34:55,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3736140.0, ans=0.2 2023-11-29 00:35:18,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3736273.3333333335, ans=0.0 2023-11-29 00:35:18,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3736273.3333333335, ans=0.125 2023-11-29 00:35:21,706 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560450 2023-11-29 00:35:25,210 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7350, loss[loss=0.06989, simple_loss=0.09996, pruned_loss=0.01207, audio_tagging_loss=0.007837, over 16004.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08876, pruned_loss=0.01208, audio_tagging_loss=0.008583, over 3032391.07 frames. 
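The occasional scaling.py:1118 lines earlier in this section (WithLoss: name=...self_attn_weights, loss-sum=0.000e+00) report auxiliary penalties attached directly to intermediate tensors such as attention weights; loss-sum=0.000e+00 indicates the penalty is currently inactive. A plausible reconstruction of the pass-through mechanism (a sketch, not the repo's exact implementation):

    import torch

    class WithLoss(torch.autograd.Function):
        # Return x unchanged, but give aux_loss a gradient of 1 in backward,
        # as if aux_loss had been added directly to the training loss.
        @staticmethod
        def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor):
            ctx.aux_shape = aux_loss.shape
            return x

        @staticmethod
        def backward(ctx, x_grad: torch.Tensor):
            aux_grad = torch.ones(ctx.aux_shape, device=x_grad.device,
                                  dtype=x_grad.dtype)
            return x_grad, aux_grad

    # Hypothetical usage: attn = WithLoss.apply(attn, penalty(attn))

This lets a module penalize, say, overly large attention logits without changing the values seen by the rest of the network.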
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:35:47,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3736406.6666666665, ans=0.125 2023-11-29 00:35:49,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3736473.3333333335, ans=0.125 2023-11-29 00:35:54,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3736473.3333333335, ans=0.125 2023-11-29 00:36:12,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3736540.0, ans=0.0 2023-11-29 00:36:23,110 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560500 2023-11-29 00:36:26,684 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7400, loss[loss=0.06021, simple_loss=0.08367, pruned_loss=0.007292, audio_tagging_loss=0.01108, over 14946.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08866, pruned_loss=0.01194, audio_tagging_loss=0.00847, over 3030848.72 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:36:32,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3736673.3333333335, ans=0.025 2023-11-29 00:36:46,092 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=15.0 2023-11-29 00:36:46,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3736740.0, ans=0.125 2023-11-29 00:36:56,333 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.891e+01 9.052e+01 9.894e+01 1.069e+02 1.258e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 00:37:02,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3736806.6666666665, ans=0.125 2023-11-29 00:37:25,104 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560550 2023-11-29 00:37:29,076 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7450, loss[loss=0.07581, simple_loss=0.1061, pruned_loss=0.01183, audio_tagging_loss=0.01096, over 15252.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.0883, pruned_loss=0.01188, audio_tagging_loss=0.008517, over 3035678.87 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:37:30,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3737006.6666666665, ans=0.05 2023-11-29 00:37:30,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3737006.6666666665, ans=0.125 2023-11-29 00:37:59,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3737140.0, ans=0.125 2023-11-29 00:38:04,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3737206.6666666665, ans=0.2 2023-11-29 00:38:21,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.91 vs. 
limit=10.0 2023-11-29 00:38:26,403 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560600 2023-11-29 00:38:30,337 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7500, loss[loss=0.08237, simple_loss=0.1162, pruned_loss=0.01722, audio_tagging_loss=0.00703, over 15195.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.0893, pruned_loss=0.01194, audio_tagging_loss=0.008426, over 3042872.94 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:38:43,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3737406.6666666665, ans=0.0 2023-11-29 00:38:50,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3737406.6666666665, ans=0.125 2023-11-29 00:38:56,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3737473.3333333335, ans=0.125 2023-11-29 00:38:58,469 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 8.888e+01 9.650e+01 1.039e+02 1.258e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 00:39:11,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3737540.0, ans=0.1 2023-11-29 00:39:28,985 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560650 2023-11-29 00:39:32,460 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7550, loss[loss=0.04802, simple_loss=0.06505, pruned_loss=0.009134, audio_tagging_loss=0.006361, over 15261.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08929, pruned_loss=0.01197, audio_tagging_loss=0.008406, over 3042774.33 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:39:50,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3737740.0, ans=0.125 2023-11-29 00:39:52,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3737740.0, ans=0.125 2023-11-29 00:40:01,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=3737806.6666666665, ans=0.025 2023-11-29 00:40:05,013 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.37 vs. limit=15.0 2023-11-29 00:40:06,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3737806.6666666665, ans=0.125 2023-11-29 00:40:06,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3737806.6666666665, ans=0.125 2023-11-29 00:40:08,479 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2023-11-29 00:40:13,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.90 vs. 
limit=15.0 2023-11-29 00:40:19,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3737873.3333333335, ans=0.2 2023-11-29 00:40:25,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3737940.0, ans=0.0 2023-11-29 00:40:30,926 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560700 2023-11-29 00:40:34,516 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7600, loss[loss=0.06551, simple_loss=0.08982, pruned_loss=0.01357, audio_tagging_loss=0.007026, over 14511.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08801, pruned_loss=0.01197, audio_tagging_loss=0.008508, over 3048731.31 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:40:52,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3738073.3333333335, ans=0.95 2023-11-29 00:40:58,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3738140.0, ans=0.125 2023-11-29 00:41:03,012 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.868e+01 9.023e+01 9.623e+01 1.078e+02 1.517e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-29 00:41:03,545 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.30 vs. limit=22.5 2023-11-29 00:41:05,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3738140.0, ans=0.125 2023-11-29 00:41:06,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3738140.0, ans=0.0 2023-11-29 00:41:08,442 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.53 vs. limit=15.0 2023-11-29 00:41:11,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3738206.6666666665, ans=0.0 2023-11-29 00:41:11,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2023-11-29 00:41:12,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3738206.6666666665, ans=0.125 2023-11-29 00:41:18,402 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0 2023-11-29 00:41:20,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3738206.6666666665, ans=0.125 2023-11-29 00:41:33,042 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560750 2023-11-29 00:41:37,285 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7650, loss[loss=0.07568, simple_loss=0.1109, pruned_loss=0.01097, audio_tagging_loss=0.009272, over 15301.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08878, pruned_loss=0.0121, audio_tagging_loss=0.008527, over 3049675.36 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:42:09,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.99 vs. 
limit=22.5 2023-11-29 00:42:18,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3738540.0, ans=0.1 2023-11-29 00:42:26,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3738606.6666666665, ans=0.125 2023-11-29 00:42:28,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3738606.6666666665, ans=0.2 2023-11-29 00:42:30,638 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2023-11-29 00:42:32,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3738606.6666666665, ans=0.0 2023-11-29 00:42:34,816 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560800 2023-11-29 00:42:38,567 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7700, loss[loss=0.07518, simple_loss=0.09849, pruned_loss=0.01855, audio_tagging_loss=0.00739, over 15957.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08907, pruned_loss=0.01206, audio_tagging_loss=0.008547, over 3050515.31 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:42:43,773 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=12.0 2023-11-29 00:43:06,681 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2023-11-29 00:43:08,239 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 9.118e+01 9.854e+01 1.042e+02 1.331e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-29 00:43:08,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3738806.6666666665, ans=0.0 2023-11-29 00:43:36,704 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560850 2023-11-29 00:43:40,118 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7750, loss[loss=0.05385, simple_loss=0.07183, pruned_loss=0.011, audio_tagging_loss=0.006937, over 14718.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.089, pruned_loss=0.01219, audio_tagging_loss=0.008564, over 3047127.85 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:43:40,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3739006.6666666665, ans=0.0 2023-11-29 00:44:04,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3739140.0, ans=0.09899494936611666 2023-11-29 00:44:38,681 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560900 2023-11-29 00:44:42,113 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7800, loss[loss=0.07023, simple_loss=0.1024, pruned_loss=0.01195, audio_tagging_loss=0.007095, over 14528.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08911, pruned_loss=0.01209, audio_tagging_loss=0.008629, over 3042082.38 frames. 
], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:44:42,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3739340.0, ans=0.2 2023-11-29 00:44:42,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3739340.0, ans=0.0 2023-11-29 00:45:00,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3739406.6666666665, ans=0.125 2023-11-29 00:45:11,333 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.680e+01 8.840e+01 9.485e+01 1.019e+02 1.348e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-29 00:45:14,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3739473.3333333335, ans=0.125 2023-11-29 00:45:19,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3739540.0, ans=0.0 2023-11-29 00:45:41,167 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 560950 2023-11-29 00:45:44,481 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7850, loss[loss=0.07186, simple_loss=0.09328, pruned_loss=0.01715, audio_tagging_loss=0.00807, over 15782.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.0889, pruned_loss=0.01208, audio_tagging_loss=0.008635, over 3043455.03 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:45:49,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3739673.3333333335, ans=0.125 2023-11-29 00:46:23,154 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=15.0 2023-11-29 00:46:33,128 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.76 vs. limit=6.0 2023-11-29 00:46:36,535 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.22 vs. limit=15.0 2023-11-29 00:46:36,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3739940.0, ans=15.0 2023-11-29 00:46:41,915 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561000 2023-11-29 00:46:46,331 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7900, loss[loss=0.07408, simple_loss=0.1045, pruned_loss=0.01644, audio_tagging_loss=0.005366, over 14836.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08968, pruned_loss=0.01205, audio_tagging_loss=0.008665, over 3050505.67 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:46:48,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3740006.6666666665, ans=0.0 2023-11-29 00:47:16,124 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.755e+01 9.161e+01 9.815e+01 1.045e+02 1.564e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-29 00:47:34,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3740273.3333333335, ans=0.0 2023-11-29 00:47:44,267 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561050 2023-11-29 00:47:48,117 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 7950, loss[loss=0.05105, simple_loss=0.06612, pruned_loss=0.006346, audio_tagging_loss=0.01165, over 14537.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08825, pruned_loss=0.0119, audio_tagging_loss=0.008787, over 3047382.95 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:48:00,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3740406.6666666665, ans=0.0 2023-11-29 00:48:05,048 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 00:48:17,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=15.0 2023-11-29 00:48:40,075 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=15.0 2023-11-29 00:48:45,452 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561100 2023-11-29 00:48:48,853 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8000, loss[loss=0.05906, simple_loss=0.07399, pruned_loss=0.01305, audio_tagging_loss=0.009008, over 16498.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08836, pruned_loss=0.01196, audio_tagging_loss=0.008841, over 3039913.71 frames. ], batch size: 63, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:48:54,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3740673.3333333335, ans=0.1 2023-11-29 00:48:54,362 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:48:55,989 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0 2023-11-29 00:49:01,682 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.20 vs. 
limit=15.0 2023-11-29 00:49:02,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3740740.0, ans=0.2 2023-11-29 00:49:09,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3740740.0, ans=0.1 2023-11-29 00:49:11,973 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.18 vs. limit=22.5 2023-11-29 00:49:14,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=22.5 2023-11-29 00:49:18,853 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.940e+01 9.510e+01 1.008e+02 1.278e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-29 00:49:32,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3740873.3333333335, ans=0.125 2023-11-29 00:49:39,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3740940.0, ans=0.0 2023-11-29 00:49:46,724 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561150 2023-11-29 00:49:51,186 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8050, loss[loss=0.07535, simple_loss=0.0941, pruned_loss=0.0179, audio_tagging_loss=0.0104, over 15450.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08823, pruned_loss=0.01188, audio_tagging_loss=0.008896, over 3039636.58 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:50:44,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3741273.3333333335, ans=0.0 2023-11-29 00:50:44,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3741273.3333333335, ans=0.5 2023-11-29 00:50:48,712 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561200 2023-11-29 00:50:52,567 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8100, loss[loss=0.07145, simple_loss=0.1042, pruned_loss=0.009599, audio_tagging_loss=0.009757, over 15241.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08827, pruned_loss=0.01177, audio_tagging_loss=0.008813, over 3036399.59 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:51:08,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3741406.6666666665, ans=0.1 2023-11-29 00:51:08,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2023-11-29 00:51:13,328 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.61 vs. 
limit=15.0 2023-11-29 00:51:18,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3741473.3333333335, ans=0.07 2023-11-29 00:51:22,824 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.038e+01 9.545e+01 1.047e+02 1.336e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-29 00:51:50,131 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561250 2023-11-29 00:51:51,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3741606.6666666665, ans=0.1 2023-11-29 00:51:53,571 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8150, loss[loss=0.06005, simple_loss=0.08318, pruned_loss=0.009056, audio_tagging_loss=0.009401, over 13932.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08857, pruned_loss=0.01186, audio_tagging_loss=0.00865, over 3028876.04 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:52:15,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3741740.0, ans=0.0 2023-11-29 00:52:17,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3741806.6666666665, ans=0.1 2023-11-29 00:52:17,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3741806.6666666665, ans=0.125 2023-11-29 00:52:38,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3741873.3333333335, ans=0.0 2023-11-29 00:52:46,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3741940.0, ans=0.1 2023-11-29 00:52:51,069 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561300 2023-11-29 00:52:55,176 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8200, loss[loss=0.06128, simple_loss=0.07954, pruned_loss=0.01378, audio_tagging_loss=0.00773, over 15081.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08934, pruned_loss=0.01205, audio_tagging_loss=0.008567, over 3035194.62 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:52:58,133 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 00:52:58,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=15.0 2023-11-29 00:53:25,822 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.695e+01 9.120e+01 9.757e+01 1.054e+02 1.290e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 00:53:27,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.24 vs. 
limit=15.0 2023-11-29 00:53:28,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3742140.0, ans=0.0 2023-11-29 00:53:48,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3742273.3333333335, ans=0.125 2023-11-29 00:53:49,983 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 00:53:53,489 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561350 2023-11-29 00:53:53,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3742273.3333333335, ans=0.1 2023-11-29 00:53:57,503 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8250, loss[loss=0.05754, simple_loss=0.08008, pruned_loss=0.008319, audio_tagging_loss=0.009185, over 14815.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.0889, pruned_loss=0.01169, audio_tagging_loss=0.008499, over 3034695.66 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:54:04,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3742340.0, ans=0.0 2023-11-29 00:54:04,393 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2023-11-29 00:54:14,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3742406.6666666665, ans=0.2 2023-11-29 00:54:23,219 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0 2023-11-29 00:54:38,552 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.76 vs. limit=10.0 2023-11-29 00:54:51,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3742606.6666666665, ans=0.2 2023-11-29 00:54:55,589 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561400 2023-11-29 00:54:59,459 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8300, loss[loss=0.05516, simple_loss=0.07029, pruned_loss=0.01153, audio_tagging_loss=0.008487, over 16430.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.0892, pruned_loss=0.01174, audio_tagging_loss=0.008477, over 3035766.45 frames. ], batch size: 64, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:55:08,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.20 vs. 
limit=22.5 2023-11-29 00:55:23,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3742806.6666666665, ans=0.1 2023-11-29 00:55:29,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 9.057e+01 9.720e+01 1.032e+02 1.351e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-29 00:55:34,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3742873.3333333335, ans=0.2 2023-11-29 00:55:56,145 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561450 2023-11-29 00:55:59,605 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8350, loss[loss=0.06296, simple_loss=0.07968, pruned_loss=0.01113, audio_tagging_loss=0.01199, over 15269.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08863, pruned_loss=0.01158, audio_tagging_loss=0.008413, over 3038009.50 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:56:06,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3743006.6666666665, ans=0.1 2023-11-29 00:56:12,992 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2023-11-29 00:56:24,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3743140.0, ans=0.125 2023-11-29 00:56:37,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3743206.6666666665, ans=0.125 2023-11-29 00:56:57,982 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561500 2023-11-29 00:57:01,971 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8400, loss[loss=0.05866, simple_loss=0.07897, pruned_loss=0.00796, audio_tagging_loss=0.01122, over 15295.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08853, pruned_loss=0.01186, audio_tagging_loss=0.008416, over 3041898.61 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:57:07,444 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=12.0 2023-11-29 00:57:08,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3743340.0, ans=0.2 2023-11-29 00:57:26,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3743473.3333333335, ans=0.125 2023-11-29 00:57:31,747 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.851e+01 9.361e+01 9.943e+01 1.259e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-29 00:57:40,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3743540.0, ans=0.2 2023-11-29 00:57:42,323 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.45 vs. 
limit=22.5 2023-11-29 00:57:43,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3743540.0, ans=0.2 2023-11-29 00:57:45,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3743540.0, ans=0.125 2023-11-29 00:57:52,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3743606.6666666665, ans=0.0 2023-11-29 00:58:00,024 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561550 2023-11-29 00:58:03,560 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8450, loss[loss=0.06536, simple_loss=0.0912, pruned_loss=0.01212, audio_tagging_loss=0.00764, over 15369.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08864, pruned_loss=0.01192, audio_tagging_loss=0.008503, over 3041322.18 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 00:58:28,638 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=12.0 2023-11-29 00:58:29,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.71 vs. limit=22.5 2023-11-29 00:58:55,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3743940.0, ans=0.125 2023-11-29 00:59:01,228 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561600 2023-11-29 00:59:04,918 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8500, loss[loss=0.05232, simple_loss=0.07414, pruned_loss=0.00821, audio_tagging_loss=0.007038, over 15420.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08794, pruned_loss=0.01166, audio_tagging_loss=0.008494, over 3046021.01 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 00:59:06,831 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.11 vs. limit=15.0 2023-11-29 00:59:32,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3744140.0, ans=0.125 2023-11-29 00:59:38,004 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.884e+01 8.998e+01 9.683e+01 1.039e+02 1.237e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-29 01:00:03,090 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561650 2023-11-29 01:00:06,540 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8550, loss[loss=0.07057, simple_loss=0.1013, pruned_loss=0.01298, audio_tagging_loss=0.006938, over 15705.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.0887, pruned_loss=0.01178, audio_tagging_loss=0.008492, over 3052630.01 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:00:15,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3744340.0, ans=0.125 2023-11-29 01:00:17,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3744340.0, ans=0.1 2023-11-29 01:00:20,442 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.15 vs. 
limit=22.5 2023-11-29 01:00:57,561 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:01:05,778 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561700 2023-11-29 01:01:05,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3744606.6666666665, ans=10.0 2023-11-29 01:01:09,160 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8600, loss[loss=0.05747, simple_loss=0.07787, pruned_loss=0.01024, audio_tagging_loss=0.008285, over 15224.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08841, pruned_loss=0.01177, audio_tagging_loss=0.008602, over 3046321.41 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:01:20,647 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.36 vs. limit=15.0 2023-11-29 01:01:21,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3744740.0, ans=0.2 2023-11-29 01:01:30,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3744740.0, ans=0.125 2023-11-29 01:01:37,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2023-11-29 01:01:40,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.679e+01 8.937e+01 9.624e+01 1.044e+02 4.545e+02, threshold=1.925e+02, percent-clipped=1.0 2023-11-29 01:02:06,587 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561750 2023-11-29 01:02:09,994 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8650, loss[loss=0.08293, simple_loss=0.1245, pruned_loss=0.01454, audio_tagging_loss=0.006152, over 15068.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.0887, pruned_loss=0.01182, audio_tagging_loss=0.008699, over 3041992.71 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:02:52,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3745206.6666666665, ans=0.07 2023-11-29 01:03:06,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3745273.3333333335, ans=0.0 2023-11-29 01:03:07,235 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561800 2023-11-29 01:03:11,069 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8700, loss[loss=0.06261, simple_loss=0.08604, pruned_loss=0.01159, audio_tagging_loss=0.007999, over 16970.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08888, pruned_loss=0.01173, audio_tagging_loss=0.008693, over 3047814.47 frames. 
], batch size: 64, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:03:11,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3745340.0, ans=0.125 2023-11-29 01:03:11,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3745340.0, ans=0.1 2023-11-29 01:03:12,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3745340.0, ans=0.125 2023-11-29 01:03:17,004 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=22.5 2023-11-29 01:03:36,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3745473.3333333335, ans=15.0 2023-11-29 01:03:38,457 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.87 vs. limit=15.0 2023-11-29 01:03:40,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3745473.3333333335, ans=0.035 2023-11-29 01:03:44,729 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 8.978e+01 9.587e+01 1.045e+02 1.358e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 01:04:04,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3745606.6666666665, ans=0.125 2023-11-29 01:04:10,033 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561850 2023-11-29 01:04:14,715 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8750, loss[loss=0.05922, simple_loss=0.08453, pruned_loss=0.008014, audio_tagging_loss=0.008941, over 14335.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08939, pruned_loss=0.01185, audio_tagging_loss=0.008773, over 3039172.61 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 8.0 2023-11-29 01:04:25,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3745740.0, ans=0.0 2023-11-29 01:04:32,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3745740.0, ans=0.1 2023-11-29 01:04:53,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3745873.3333333335, ans=0.0 2023-11-29 01:05:09,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3745940.0, ans=0.125 2023-11-29 01:05:11,942 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561900 2023-11-29 01:05:15,320 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8800, loss[loss=0.06172, simple_loss=0.08531, pruned_loss=0.01078, audio_tagging_loss=0.008283, over 15160.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.09005, pruned_loss=0.01188, audio_tagging_loss=0.008784, over 3043250.15 frames. 
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:05:26,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3746073.3333333335, ans=0.1 2023-11-29 01:05:42,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3746140.0, ans=0.0 2023-11-29 01:05:46,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3746140.0, ans=0.0 2023-11-29 01:05:49,572 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.820e+01 9.250e+01 9.769e+01 1.064e+02 1.773e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 01:05:50,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3746140.0, ans=0.95 2023-11-29 01:05:54,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3746206.6666666665, ans=0.125 2023-11-29 01:06:07,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3746273.3333333335, ans=0.025 2023-11-29 01:06:12,802 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 561950 2023-11-29 01:06:15,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3746340.0, ans=0.125 2023-11-29 01:06:16,800 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8850, loss[loss=0.0925, simple_loss=0.125, pruned_loss=0.0207, audio_tagging_loss=0.009323, over 15074.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0905, pruned_loss=0.01195, audio_tagging_loss=0.008783, over 3046688.72 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:06:30,484 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:06:49,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3746473.3333333335, ans=0.2 2023-11-29 01:07:14,931 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562000 2023-11-29 01:07:19,278 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8900, loss[loss=0.06498, simple_loss=0.08828, pruned_loss=0.008667, audio_tagging_loss=0.01217, over 14925.00 frames. ], tot_loss[loss=0.06605, simple_loss=0.09094, pruned_loss=0.01194, audio_tagging_loss=0.008645, over 3047746.15 frames. 
], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:07:20,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3746673.3333333335, ans=0.1 2023-11-29 01:07:24,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3746673.3333333335, ans=0.0 2023-11-29 01:07:52,081 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.613e+01 8.904e+01 9.438e+01 1.016e+02 1.537e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-29 01:07:53,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3746806.6666666665, ans=0.0 2023-11-29 01:08:02,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3746873.3333333335, ans=0.05 2023-11-29 01:08:07,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3746940.0, ans=0.2 2023-11-29 01:08:16,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3746940.0, ans=0.125 2023-11-29 01:08:17,320 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562050 2023-11-29 01:08:17,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3746940.0, ans=0.125 2023-11-29 01:08:20,636 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 8950, loss[loss=0.05998, simple_loss=0.08199, pruned_loss=0.009793, audio_tagging_loss=0.009194, over 15137.00 frames. ], tot_loss[loss=0.06586, simple_loss=0.09072, pruned_loss=0.01196, audio_tagging_loss=0.00854, over 3042010.16 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:08:50,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3747140.0, ans=0.0 2023-11-29 01:08:56,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3747206.6666666665, ans=0.1 2023-11-29 01:09:08,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3747273.3333333335, ans=0.125 2023-11-29 01:09:17,841 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562100 2023-11-29 01:09:21,837 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9000, loss[loss=0.06926, simple_loss=0.09328, pruned_loss=0.01382, audio_tagging_loss=0.008792, over 15355.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09114, pruned_loss=0.01188, audio_tagging_loss=0.008493, over 3046913.69 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:09:21,838 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 01:10:02,101 INFO [train_asr.py:1267] (2/4) Epoch 47, validation: loss=0.05855, simple_loss=0.05046, pruned_loss=0.005347, audio_tagging_loss=0.02798, over 4681554.00 frames. 
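The optim.py entries throughout this log expose a simple relationship: the five grad-norm statistics appear to be the min/25%/median/75%/max of recent gradient norms, and in every entry here the printed threshold matches clipping_scale times the reported median (for example 2.0 * 9.438e+01 = 1.888e+02 in the 01:07:52 entry). Likewise, the recurring train_asr.py WARNINGs drop the one-second AudioSet placeholder cuts because their 100 input frames subsample to 23, shorter than the 24-token dummy transcript, so the transducer loss would be undefined. Below is a minimal Python sketch of both mechanisms, assuming a 100-step norm window and a ((T - 7) // 2 + 1) // 2 convolutional front end (which maps 100 frames to 23, matching the WARNINGs); the names QuartileGradClipper and keep_cut are illustrative, not icefall's actual API.

from collections import deque

import torch


class QuartileGradClipper:
    """Sketch of median-based gradient clipping (assumed, not icefall's optim.py).

    Keeps a window of recent total gradient norms, reports their
    min/25%/50%/75%/max, and clips against clipping_scale * median,
    which reproduces the threshold values printed in this log.
    """

    def __init__(self, clipping_scale: float = 2.0, window: int = 100):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent total grad norms

    def clip_(self, parameters) -> float:
        """Clip gradients in place; return percent of window above threshold."""
        params = [p for p in parameters if p.grad is not None]
        if not params:
            return 0.0
        total_norm = torch.norm(
            torch.stack([p.grad.detach().norm(2) for p in params]), 2
        ).item()
        self.norms.append(total_norm)

        norms = torch.tensor(list(self.norms))
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        percent_clipped = 100.0 * (norms > threshold).float().mean().item()

        if total_norm > threshold:  # rescale all grads down to the threshold
            for p in params:
                p.grad.mul_(threshold / total_norm)
        return percent_clipped


def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Mirror the WARNING lines: drop cuts too short for the transducer loss.

    Assuming a conv front end with T' = ((T - 7) // 2 + 1) // 2, the
    1-second placeholder cuts (100 frames) subsample to 23 frames, fewer
    than their 24 BPE tokens, so T < U and the cut must be excluded.
    """
    subsampled = ((num_frames - 7) // 2 + 1) // 2
    return subsampled >= num_tokens


# keep_cut(100, 24) -> False, matching the excluded unbalanced/*.wav cuts above.

Under these assumptions, the isolated percent-clipped=1.0 at 01:01:40 corresponds to exactly one norm in the window (the 4.545e+02 maximum) exceeding its threshold of 2.0 * 9.624e+01 = 1.925e+02.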
2023-11-29 01:10:02,102 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 01:10:06,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3747340.0, ans=0.5 2023-11-29 01:10:13,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3747406.6666666665, ans=0.2 2023-11-29 01:10:15,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3747406.6666666665, ans=0.0 2023-11-29 01:10:25,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3747473.3333333335, ans=0.125 2023-11-29 01:10:34,514 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.906e+01 9.620e+01 1.037e+02 1.250e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 01:10:49,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3747606.6666666665, ans=0.0 2023-11-29 01:10:59,232 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562150 2023-11-29 01:11:03,479 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9050, loss[loss=0.08151, simple_loss=0.1138, pruned_loss=0.01674, audio_tagging_loss=0.007884, over 14952.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.09056, pruned_loss=0.01192, audio_tagging_loss=0.008439, over 3043751.38 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:11:18,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3747740.0, ans=0.1 2023-11-29 01:11:34,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-11-29 01:11:52,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3747940.0, ans=0.125 2023-11-29 01:11:59,224 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=12.0 2023-11-29 01:12:01,681 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562200 2023-11-29 01:12:05,348 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9100, loss[loss=0.05408, simple_loss=0.07282, pruned_loss=0.009077, audio_tagging_loss=0.008599, over 15108.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.09027, pruned_loss=0.01184, audio_tagging_loss=0.00843, over 3045753.82 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:12:19,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3748073.3333333335, ans=0.125 2023-11-29 01:12:34,717 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. 
limit=15.0 2023-11-29 01:12:38,333 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.957e+01 8.933e+01 9.563e+01 1.020e+02 1.667e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 01:12:42,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3748206.6666666665, ans=0.125 2023-11-29 01:12:43,884 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0 2023-11-29 01:12:52,229 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:12:55,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.40 vs. limit=5.0 2023-11-29 01:13:02,987 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562250 2023-11-29 01:13:06,501 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9150, loss[loss=0.05812, simple_loss=0.08176, pruned_loss=0.00969, audio_tagging_loss=0.00755, over 15045.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.0889, pruned_loss=0.01171, audio_tagging_loss=0.008482, over 3047639.40 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:13:08,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0 2023-11-29 01:13:17,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.62 vs. limit=22.5 2023-11-29 01:13:36,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3748473.3333333335, ans=0.0 2023-11-29 01:13:40,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3748473.3333333335, ans=0.0 2023-11-29 01:13:45,407 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.01 vs. limit=12.0 2023-11-29 01:13:47,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3748540.0, ans=0.125 2023-11-29 01:13:52,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3748540.0, ans=0.95 2023-11-29 01:14:04,744 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562300 2023-11-29 01:14:08,047 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9200, loss[loss=0.06987, simple_loss=0.08782, pruned_loss=0.01932, audio_tagging_loss=0.006636, over 14996.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08855, pruned_loss=0.01178, audio_tagging_loss=0.008505, over 3050883.23 frames. 
], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:14:17,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3748673.3333333335, ans=0.1 2023-11-29 01:14:40,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3748806.6666666665, ans=15.0 2023-11-29 01:14:41,263 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 9.164e+01 9.710e+01 1.033e+02 1.295e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-29 01:14:43,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3748806.6666666665, ans=0.125 2023-11-29 01:15:04,429 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0 2023-11-29 01:15:06,302 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562350 2023-11-29 01:15:10,316 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9250, loss[loss=0.07527, simple_loss=0.1012, pruned_loss=0.01673, audio_tagging_loss=0.007919, over 15568.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08805, pruned_loss=0.01168, audio_tagging_loss=0.008455, over 3043997.81 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:15:21,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3749073.3333333335, ans=0.0 2023-11-29 01:15:22,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3749073.3333333335, ans=0.0 2023-11-29 01:16:07,659 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562400 2023-11-29 01:16:11,602 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9300, loss[loss=0.0643, simple_loss=0.09168, pruned_loss=0.01318, audio_tagging_loss=0.005275, over 16442.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.08789, pruned_loss=0.01177, audio_tagging_loss=0.008508, over 3044669.56 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:16:17,564 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.99 vs. limit=15.0 2023-11-29 01:16:32,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3749406.6666666665, ans=0.0 2023-11-29 01:16:45,404 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.732e+01 9.046e+01 9.645e+01 1.038e+02 1.624e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 01:16:50,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3749540.0, ans=0.2 2023-11-29 01:17:09,969 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562450 2023-11-29 01:17:13,311 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9350, loss[loss=0.07075, simple_loss=0.09442, pruned_loss=0.01562, audio_tagging_loss=0.007918, over 14996.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08876, pruned_loss=0.01202, audio_tagging_loss=0.00853, over 3051647.15 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:17:18,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3749673.3333333335, ans=0.2 2023-11-29 01:17:42,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3749806.6666666665, ans=0.0 2023-11-29 01:17:53,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3749873.3333333335, ans=0.125 2023-11-29 01:17:56,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3749873.3333333335, ans=0.125 2023-11-29 01:18:03,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3749940.0, ans=0.125 2023-11-29 01:18:10,410 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562500 2023-11-29 01:18:15,243 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9400, loss[loss=0.05915, simple_loss=0.08666, pruned_loss=0.008289, audio_tagging_loss=0.007528, over 14393.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08904, pruned_loss=0.01191, audio_tagging_loss=0.008538, over 3052471.75 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:18:15,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3750006.6666666665, ans=0.125 2023-11-29 01:18:19,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3750006.6666666665, ans=0.125 2023-11-29 01:18:35,209 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.93 vs. limit=12.0 2023-11-29 01:18:48,144 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.900e+01 9.214e+01 9.788e+01 1.040e+02 1.202e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-29 01:18:54,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3750206.6666666665, ans=0.125 2023-11-29 01:19:12,668 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562550 2023-11-29 01:19:16,057 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9450, loss[loss=0.04867, simple_loss=0.05889, pruned_loss=0.008358, audio_tagging_loss=0.01087, over 15081.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09031, pruned_loss=0.0121, audio_tagging_loss=0.008524, over 3052917.61 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:19:17,783 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:19:32,174 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.88 vs. 
limit=6.0 2023-11-29 01:19:40,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3750473.3333333335, ans=0.125 2023-11-29 01:19:46,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3750473.3333333335, ans=0.125 2023-11-29 01:19:50,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3750473.3333333335, ans=0.1 2023-11-29 01:19:51,842 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:20:02,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3750540.0, ans=0.1 2023-11-29 01:20:07,036 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2023-11-29 01:20:15,313 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562600 2023-11-29 01:20:18,989 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9500, loss[loss=0.0641, simple_loss=0.07977, pruned_loss=0.01252, audio_tagging_loss=0.0117, over 14859.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08963, pruned_loss=0.01202, audio_tagging_loss=0.008676, over 3050549.09 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:20:33,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3750740.0, ans=0.05 2023-11-29 01:20:36,612 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2023-11-29 01:20:39,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3750740.0, ans=0.05 2023-11-29 01:20:52,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3750806.6666666665, ans=0.05 2023-11-29 01:20:53,736 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.404e+01 8.884e+01 9.617e+01 1.043e+02 1.271e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-29 01:21:17,093 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562650 2023-11-29 01:21:20,484 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9550, loss[loss=0.0598, simple_loss=0.08651, pruned_loss=0.007781, audio_tagging_loss=0.008767, over 15013.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08958, pruned_loss=0.01177, audio_tagging_loss=0.008679, over 3053644.71 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:21:39,515 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2023-11-29 01:21:58,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3751206.6666666665, ans=0.1 2023-11-29 01:22:19,369 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562700 2023-11-29 01:22:22,943 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9600, loss[loss=0.05737, simple_loss=0.08018, pruned_loss=0.009933, audio_tagging_loss=0.007344, over 15019.00 frames. 
], tot_loss[loss=0.06561, simple_loss=0.09015, pruned_loss=0.01189, audio_tagging_loss=0.008647, over 3056753.08 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:22:57,404 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 8.993e+01 9.667e+01 1.037e+02 1.328e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-29 01:23:14,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3751606.6666666665, ans=0.0 2023-11-29 01:23:21,934 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562750 2023-11-29 01:23:25,380 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9650, loss[loss=0.05439, simple_loss=0.06726, pruned_loss=0.009467, audio_tagging_loss=0.0113, over 14570.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08992, pruned_loss=0.01191, audio_tagging_loss=0.008694, over 3053137.61 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:23:26,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3751673.3333333335, ans=0.2 2023-11-29 01:23:26,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3751673.3333333335, ans=0.07 2023-11-29 01:23:30,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3751673.3333333335, ans=0.0 2023-11-29 01:24:01,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3751873.3333333335, ans=0.125 2023-11-29 01:24:07,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3751873.3333333335, ans=0.0 2023-11-29 01:24:11,867 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.28 vs. limit=6.0 2023-11-29 01:24:20,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3751940.0, ans=0.0 2023-11-29 01:24:22,976 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562800 2023-11-29 01:24:23,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3751940.0, ans=0.125 2023-11-29 01:24:23,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3751940.0, ans=0.125 2023-11-29 01:24:23,361 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.28 vs. limit=10.0 2023-11-29 01:24:25,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3752006.6666666665, ans=0.1 2023-11-29 01:24:26,711 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9700, loss[loss=0.066, simple_loss=0.09467, pruned_loss=0.01091, audio_tagging_loss=0.007762, over 15438.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.0889, pruned_loss=0.01178, audio_tagging_loss=0.008614, over 3042645.85 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:25:01,797 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.640e+01 9.050e+01 9.542e+01 1.032e+02 1.533e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-29 01:25:10,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3752206.6666666665, ans=0.025 2023-11-29 01:25:22,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3752273.3333333335, ans=0.125 2023-11-29 01:25:24,958 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562850 2023-11-29 01:25:25,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3752273.3333333335, ans=0.125 2023-11-29 01:25:28,347 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9750, loss[loss=0.04751, simple_loss=0.06461, pruned_loss=0.00616, audio_tagging_loss=0.009043, over 14692.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.0889, pruned_loss=0.01186, audio_tagging_loss=0.00858, over 3045375.73 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:25:31,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3752340.0, ans=0.0 2023-11-29 01:25:34,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3752340.0, ans=0.0 2023-11-29 01:25:40,133 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2023-11-29 01:25:47,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3752406.6666666665, ans=0.1 2023-11-29 01:26:01,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2023-11-29 01:26:13,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3752540.0, ans=0.1 2023-11-29 01:26:18,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3752606.6666666665, ans=0.1 2023-11-29 01:26:28,023 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562900 2023-11-29 01:26:31,444 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9800, loss[loss=0.06206, simple_loss=0.09003, pruned_loss=0.00849, audio_tagging_loss=0.008556, over 14873.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08943, pruned_loss=0.01189, audio_tagging_loss=0.008453, over 3042027.42 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:26:39,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3752673.3333333335, ans=0.125 2023-11-29 01:26:44,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3752740.0, ans=0.125 2023-11-29 01:26:51,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3752740.0, ans=0.125 2023-11-29 01:26:58,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3752806.6666666665, ans=0.2 2023-11-29 01:27:03,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3752806.6666666665, ans=0.125 2023-11-29 01:27:04,745 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 8.999e+01 9.540e+01 1.035e+02 1.290e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-29 01:27:27,925 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:27:27,995 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 562950 2023-11-29 01:27:31,214 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9850, loss[loss=0.07404, simple_loss=0.1082, pruned_loss=0.0119, audio_tagging_loss=0.008057, over 17561.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.09008, pruned_loss=0.01209, audio_tagging_loss=0.008378, over 3041281.08 frames. ], batch size: 66, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:27:52,647 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0 2023-11-29 01:27:58,390 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=8.0 2023-11-29 01:28:08,376 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2023-11-29 01:28:12,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.66 vs. limit=15.0 2023-11-29 01:28:16,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3753206.6666666665, ans=0.125 2023-11-29 01:28:18,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3753206.6666666665, ans=0.0 2023-11-29 01:28:29,706 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563000 2023-11-29 01:28:33,639 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9900, loss[loss=0.05267, simple_loss=0.07481, pruned_loss=0.007827, audio_tagging_loss=0.007438, over 15922.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09037, pruned_loss=0.0122, audio_tagging_loss=0.008377, over 3044793.30 frames. 
], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:28:35,359 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=15.0 2023-11-29 01:28:45,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3753406.6666666665, ans=0.0 2023-11-29 01:29:09,347 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 8.970e+01 9.713e+01 1.037e+02 1.358e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-29 01:29:14,634 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2023-11-29 01:29:18,669 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.60 vs. limit=15.0 2023-11-29 01:29:31,925 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563050 2023-11-29 01:29:35,958 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 9950, loss[loss=0.05681, simple_loss=0.08121, pruned_loss=0.008061, audio_tagging_loss=0.008147, over 15282.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09077, pruned_loss=0.01221, audio_tagging_loss=0.008299, over 3047028.16 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:29:36,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3753673.3333333335, ans=0.125 2023-11-29 01:29:41,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3753673.3333333335, ans=0.0 2023-11-29 01:29:51,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3753740.0, ans=0.2 2023-11-29 01:30:00,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3753806.6666666665, ans=0.125 2023-11-29 01:30:00,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3753806.6666666665, ans=0.1 2023-11-29 01:30:13,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3753873.3333333335, ans=0.1 2023-11-29 01:30:31,929 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.36 vs. limit=6.0 2023-11-29 01:30:33,870 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563100 2023-11-29 01:30:37,319 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10000, loss[loss=0.06387, simple_loss=0.08087, pruned_loss=0.01429, audio_tagging_loss=0.009149, over 14764.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08967, pruned_loss=0.01194, audio_tagging_loss=0.00832, over 3046455.80 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:30:58,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3754073.3333333335, ans=0.0 2023-11-29 01:30:59,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.13 vs. 
limit=22.5 2023-11-29 01:31:07,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3754140.0, ans=0.125 2023-11-29 01:31:13,958 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.380e+01 9.037e+01 9.619e+01 1.035e+02 1.339e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 01:31:18,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3754206.6666666665, ans=0.0 2023-11-29 01:31:26,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3754273.3333333335, ans=0.0 2023-11-29 01:31:29,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2023-11-29 01:31:35,288 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563150 2023-11-29 01:31:39,211 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10050, loss[loss=0.07281, simple_loss=0.1038, pruned_loss=0.01373, audio_tagging_loss=0.007165, over 15530.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08977, pruned_loss=0.01193, audio_tagging_loss=0.008318, over 3053513.64 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:31:58,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3754406.6666666665, ans=0.125 2023-11-29 01:31:59,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3754406.6666666665, ans=0.1 2023-11-29 01:32:03,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3754473.3333333335, ans=0.125 2023-11-29 01:32:04,905 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.39 vs. limit=15.0 2023-11-29 01:32:12,690 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:32:22,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3754540.0, ans=0.1 2023-11-29 01:32:34,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3754606.6666666665, ans=0.09899494936611666 2023-11-29 01:32:37,208 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563200 2023-11-29 01:32:41,625 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10100, loss[loss=0.06784, simple_loss=0.09562, pruned_loss=0.01209, audio_tagging_loss=0.007942, over 14945.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08924, pruned_loss=0.01183, audio_tagging_loss=0.008426, over 3054746.18 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:33:11,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3754806.6666666665, ans=0.0 2023-11-29 01:33:17,886 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 9.162e+01 9.791e+01 1.075e+02 1.682e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-29 01:33:33,290 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:33:38,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3754940.0, ans=10.0 2023-11-29 01:33:39,899 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563250 2023-11-29 01:33:43,344 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10150, loss[loss=0.059, simple_loss=0.07896, pruned_loss=0.009815, audio_tagging_loss=0.00971, over 14744.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.0898, pruned_loss=0.01201, audio_tagging_loss=0.008504, over 3050503.92 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:33:45,208 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.13 vs. limit=10.0 2023-11-29 01:34:05,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.74 vs. limit=15.0 2023-11-29 01:34:13,482 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:34:16,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3755140.0, ans=0.125 2023-11-29 01:34:29,620 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=22.5 2023-11-29 01:34:38,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3755273.3333333335, ans=0.1 2023-11-29 01:34:40,690 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563300 2023-11-29 01:34:44,794 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10200, loss[loss=0.07288, simple_loss=0.09861, pruned_loss=0.01409, audio_tagging_loss=0.009485, over 14613.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08996, pruned_loss=0.01194, audio_tagging_loss=0.008627, over 3049131.04 frames. 
], batch size: 56, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:34:45,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3755340.0, ans=0.125 2023-11-29 01:34:53,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3755340.0, ans=0.1 2023-11-29 01:34:53,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3755340.0, ans=0.0 2023-11-29 01:34:58,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3755406.6666666665, ans=0.125 2023-11-29 01:34:59,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3755406.6666666665, ans=0.125 2023-11-29 01:35:05,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3755406.6666666665, ans=0.2 2023-11-29 01:35:09,592 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:35:13,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3755473.3333333335, ans=10.0 2023-11-29 01:35:21,762 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.706e+01 9.157e+01 9.642e+01 1.029e+02 1.501e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-29 01:35:36,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3755606.6666666665, ans=0.125 2023-11-29 01:35:40,291 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0 2023-11-29 01:35:42,253 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563350 2023-11-29 01:35:46,268 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10250, loss[loss=0.05994, simple_loss=0.08311, pruned_loss=0.01093, audio_tagging_loss=0.007455, over 14521.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09039, pruned_loss=0.01202, audio_tagging_loss=0.0086, over 3049850.12 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:36:38,256 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:36:39,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3755940.0, ans=0.125 2023-11-29 01:36:43,820 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563400 2023-11-29 01:36:47,534 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10300, loss[loss=0.05914, simple_loss=0.06571, pruned_loss=0.01262, audio_tagging_loss=0.01367, over 15950.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08976, pruned_loss=0.01199, audio_tagging_loss=0.008662, over 3053516.55 frames. 
], batch size: 62, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:36:53,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3756006.6666666665, ans=0.125 2023-11-29 01:36:57,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3756006.6666666665, ans=0.1 2023-11-29 01:37:02,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3756073.3333333335, ans=0.09899494936611666 2023-11-29 01:37:20,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3756140.0, ans=0.125 2023-11-29 01:37:25,109 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 9.072e+01 9.694e+01 1.048e+02 1.558e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-29 01:37:32,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3756206.6666666665, ans=0.0 2023-11-29 01:37:32,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3756206.6666666665, ans=10.0 2023-11-29 01:37:35,367 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.27 vs. limit=22.5 2023-11-29 01:37:36,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3756273.3333333335, ans=0.125 2023-11-29 01:37:44,226 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-29 01:37:45,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3756273.3333333335, ans=0.0 2023-11-29 01:37:46,186 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563450 2023-11-29 01:37:49,656 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10350, loss[loss=0.04727, simple_loss=0.06488, pruned_loss=0.006886, audio_tagging_loss=0.007946, over 14120.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08909, pruned_loss=0.01184, audio_tagging_loss=0.008722, over 3050446.66 frames. 
], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:37:55,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3756340.0, ans=0.1 2023-11-29 01:37:58,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3756340.0, ans=0.2 2023-11-29 01:38:10,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3756406.6666666665, ans=0.125 2023-11-29 01:38:21,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3756473.3333333335, ans=0.0 2023-11-29 01:38:47,886 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563500 2023-11-29 01:38:49,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3756606.6666666665, ans=0.125 2023-11-29 01:38:51,251 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10400, loss[loss=0.05205, simple_loss=0.06036, pruned_loss=0.01063, audio_tagging_loss=0.01124, over 15711.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08853, pruned_loss=0.01184, audio_tagging_loss=0.008828, over 3047088.02 frames. ], batch size: 61, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:39:00,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3756673.3333333335, ans=0.125 2023-11-29 01:39:04,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3756740.0, ans=0.125 2023-11-29 01:39:04,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2023-11-29 01:39:16,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3756806.6666666665, ans=0.125 2023-11-29 01:39:28,488 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 9.225e+01 9.627e+01 1.037e+02 1.431e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-29 01:39:45,657 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.15 vs. limit=15.0 2023-11-29 01:39:49,675 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563550 2023-11-29 01:39:53,059 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10450, loss[loss=0.07842, simple_loss=0.1093, pruned_loss=0.01515, audio_tagging_loss=0.008626, over 16254.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08867, pruned_loss=0.01182, audio_tagging_loss=0.008838, over 3041623.47 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:40:03,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3757006.6666666665, ans=0.0 2023-11-29 01:40:09,431 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.85 vs. 
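limit=8.0

The ScheduledFloat lines that dominate this log record hyperparameters that are functions of the global batch count rather than constants: dropout probabilities, skip rates, balancer bounds and bypass scales are annealed as training proceeds, and each line reports the value ("ans") in force at the given batch_count. A minimal sketch of such a piecewise-linear schedule, with made-up breakpoints (the recipe's actual schedules live in scaling.py and are not reproduced here):

def scheduled_float(batch_count: float, points: list) -> float:
    """points: [(batch_count, value), ...] in ascending order; the value
    is interpolated linearly between breakpoints and held constant
    outside them."""
    b0, v0 = points[0]
    if batch_count <= b0:
        return v0
    for b1, v1 in points[1:]:
        if batch_count <= b1:
            return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
        b0, v0 = b1, v1
    return v0

# e.g. a dropout rate decaying from 0.3 to 0.1 over the first 20k batches:
print(scheduled_float(3757140.0, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1

At batch_count ~3.76M every schedule is long past its final breakpoint, which is why each name keeps reporting the same ans value from entry to entry.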
2023-11-29 01:40:16,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3757140.0, ans=0.2 2023-11-29 01:40:21,158 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0 2023-11-29 01:40:21,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3757140.0, ans=0.0 2023-11-29 01:40:23,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3757140.0, ans=0.125 2023-11-29 01:40:40,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3757273.3333333335, ans=0.125 2023-11-29 01:40:49,987 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563600 2023-11-29 01:40:53,888 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.87 vs. limit=22.5 2023-11-29 01:40:54,486 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10500, loss[loss=0.07156, simple_loss=0.1045, pruned_loss=0.01415, audio_tagging_loss=0.005131, over 15537.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08845, pruned_loss=0.01179, audio_tagging_loss=0.008681, over 3041909.44 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:41:11,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3757406.6666666665, ans=0.125 2023-11-29 01:41:11,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3757406.6666666665, ans=0.125 2023-11-29 01:41:31,331 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 8.902e+01 9.602e+01 1.050e+02 1.360e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 01:41:37,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3757540.0, ans=0.0 2023-11-29 01:41:45,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3757606.6666666665, ans=0.125 2023-11-29 01:41:52,587 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563650 2023-11-29 01:41:55,920 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10550, loss[loss=0.07371, simple_loss=0.1023, pruned_loss=0.0127, audio_tagging_loss=0.009849, over 16182.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08835, pruned_loss=0.01159, audio_tagging_loss=0.008693, over 3045747.73 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:41:56,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3757673.3333333335, ans=0.5 2023-11-29 01:42:11,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3757740.0, ans=0.0 2023-11-29 01:42:17,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3757740.0, ans=0.0 2023-11-29 01:42:17,642 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.33 vs.
limit=15.0 2023-11-29 01:42:19,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3757806.6666666665, ans=0.2 2023-11-29 01:42:35,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3757873.3333333335, ans=0.0 2023-11-29 01:42:37,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3757873.3333333335, ans=0.2 2023-11-29 01:42:43,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3757940.0, ans=0.1 2023-11-29 01:42:54,225 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563700 2023-11-29 01:42:57,612 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10600, loss[loss=0.04938, simple_loss=0.06888, pruned_loss=0.007678, audio_tagging_loss=0.007262, over 14781.00 frames. ], tot_loss[loss=0.06369, simple_loss=0.08728, pruned_loss=0.01142, audio_tagging_loss=0.008628, over 3046279.01 frames. ], batch size: 56, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:43:14,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3758073.3333333335, ans=0.0 2023-11-29 01:43:34,207 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 9.127e+01 9.716e+01 1.043e+02 1.257e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-29 01:43:53,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3758273.3333333335, ans=0.125 2023-11-29 01:43:54,665 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563750 2023-11-29 01:43:58,025 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10650, loss[loss=0.06643, simple_loss=0.09931, pruned_loss=0.009122, audio_tagging_loss=0.007656, over 15542.00 frames. ], tot_loss[loss=0.0641, simple_loss=0.08792, pruned_loss=0.01152, audio_tagging_loss=0.008617, over 3047109.04 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:44:08,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3758340.0, ans=0.125 2023-11-29 01:44:30,131 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:44:33,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3758473.3333333335, ans=0.125 2023-11-29 01:44:40,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3758540.0, ans=0.0 2023-11-29 01:44:41,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3758540.0, ans=0.07 2023-11-29 01:44:56,320 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563800 2023-11-29 01:45:00,050 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10700, loss[loss=0.06423, simple_loss=0.08702, pruned_loss=0.01259, audio_tagging_loss=0.008132, over 15749.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08871, pruned_loss=0.01152, audio_tagging_loss=0.00854, over 3052126.26 frames. 
], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:45:00,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3758673.3333333335, ans=0.125 2023-11-29 01:45:04,157 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2023-11-29 01:45:05,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3758673.3333333335, ans=0.0 2023-11-29 01:45:06,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3758673.3333333335, ans=0.2 2023-11-29 01:45:16,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3758740.0, ans=0.0 2023-11-29 01:45:22,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3758740.0, ans=0.0 2023-11-29 01:45:37,224 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 8.610e+01 9.369e+01 1.025e+02 1.277e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-29 01:45:58,447 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563850 2023-11-29 01:46:01,876 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10750, loss[loss=0.06467, simple_loss=0.08799, pruned_loss=0.01045, audio_tagging_loss=0.01023, over 15251.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.08847, pruned_loss=0.01147, audio_tagging_loss=0.008528, over 3048258.34 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:46:03,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3759006.6666666665, ans=0.05 2023-11-29 01:46:07,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3759006.6666666665, ans=0.0 2023-11-29 01:46:13,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3759073.3333333335, ans=0.1 2023-11-29 01:46:32,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3759140.0, ans=0.1 2023-11-29 01:46:37,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3759206.6666666665, ans=0.125 2023-11-29 01:46:52,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3759273.3333333335, ans=0.1 2023-11-29 01:46:59,027 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563900 2023-11-29 01:47:02,469 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10800, loss[loss=0.07108, simple_loss=0.1002, pruned_loss=0.01545, audio_tagging_loss=0.005514, over 15674.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08911, pruned_loss=0.01157, audio_tagging_loss=0.008384, over 3046593.92 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:47:35,922 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.93 vs. 
limit=15.0 2023-11-29 01:47:41,318 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 9.091e+01 9.540e+01 1.017e+02 1.841e+02, threshold=1.908e+02, percent-clipped=0.0 2023-11-29 01:47:54,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3759606.6666666665, ans=0.1 2023-11-29 01:48:00,133 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 563950 2023-11-29 01:48:04,373 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10850, loss[loss=0.06366, simple_loss=0.08649, pruned_loss=0.01189, audio_tagging_loss=0.008524, over 16931.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.09018, pruned_loss=0.01165, audio_tagging_loss=0.008435, over 3052662.62 frames. ], batch size: 62, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:48:35,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3759806.6666666665, ans=10.0 2023-11-29 01:48:38,813 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.83 vs. limit=15.0 2023-11-29 01:48:42,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3759873.3333333335, ans=0.1 2023-11-29 01:48:43,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3759873.3333333335, ans=0.1 2023-11-29 01:49:03,690 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564000 2023-11-29 01:49:09,313 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 01:49:10,540 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10900, loss[loss=0.05521, simple_loss=0.07131, pruned_loss=0.007015, audio_tagging_loss=0.01254, over 15344.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.09, pruned_loss=0.01167, audio_tagging_loss=0.008404, over 3055326.27 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:49:14,621 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.63 vs. limit=15.0 2023-11-29 01:49:17,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3760006.6666666665, ans=0.1 2023-11-29 01:49:23,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3760073.3333333335, ans=0.125 2023-11-29 01:49:27,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3760073.3333333335, ans=0.125 2023-11-29 01:49:36,529 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.14 vs. 
limit=15.0 2023-11-29 01:49:48,905 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.916e+01 9.098e+01 9.720e+01 1.048e+02 1.470e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-29 01:50:04,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3760273.3333333335, ans=0.05 2023-11-29 01:50:07,939 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564050 2023-11-29 01:50:08,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3760273.3333333335, ans=0.2 2023-11-29 01:50:11,354 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 10950, loss[loss=0.06897, simple_loss=0.09155, pruned_loss=0.01524, audio_tagging_loss=0.007954, over 14022.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08892, pruned_loss=0.01165, audio_tagging_loss=0.008505, over 3051572.73 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:50:13,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3760340.0, ans=0.125 2023-11-29 01:50:21,303 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.99 vs. limit=22.5 2023-11-29 01:50:38,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3760473.3333333335, ans=0.0 2023-11-29 01:51:03,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3760606.6666666665, ans=0.0 2023-11-29 01:51:08,309 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564100 2023-11-29 01:51:12,305 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11000, loss[loss=0.05432, simple_loss=0.07035, pruned_loss=0.008237, audio_tagging_loss=0.01091, over 14803.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.09004, pruned_loss=0.01181, audio_tagging_loss=0.008473, over 3054817.82 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:51:22,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.54 vs. limit=6.0 2023-11-29 01:51:23,471 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
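The token list in these warnings is the SentencePiece BPE segmentation of the dummy transcript that AudioSet cuts carry in place of a real one ('▁' marks a word start, which is why "place holder" splits the way it does). A sketch of reproducing such a list, assuming the run's 500-piece BPE model is on disk at the usual recipe location (the exact pieces depend on that model):

import sentencepiece as spm

# Model path assumed from the recipe layout; adjust to the actual bpe.model.
sp = spm.SentencePieceProcessor(model_file="data/lang_bpe_500/bpe.model")
pieces = sp.encode("Dummy text added as a place holder. "
                   "Please ignore this if possible.", out_type=str)
print(pieces, len(pieces))   # 24 pieces for the model used in this run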
Number of tokens: 24 2023-11-29 01:51:26,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3760740.0, ans=0.125 2023-11-29 01:51:36,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3760806.6666666665, ans=0.125 2023-11-29 01:51:43,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3760806.6666666665, ans=0.5 2023-11-29 01:51:51,424 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.918e+01 9.080e+01 9.821e+01 1.045e+02 1.365e+02, threshold=1.964e+02, percent-clipped=0.0 2023-11-29 01:51:51,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3760873.3333333335, ans=0.1 2023-11-29 01:51:55,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3760873.3333333335, ans=0.0 2023-11-29 01:52:03,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3760940.0, ans=0.0 2023-11-29 01:52:06,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3760940.0, ans=0.125 2023-11-29 01:52:09,335 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564150 2023-11-29 01:52:14,028 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11050, loss[loss=0.06789, simple_loss=0.09247, pruned_loss=0.01413, audio_tagging_loss=0.00753, over 14535.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09095, pruned_loss=0.01205, audio_tagging_loss=0.00856, over 3053631.85 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:52:32,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3761073.3333333335, ans=0.125 2023-11-29 01:52:51,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3761206.6666666665, ans=0.125 2023-11-29 01:53:12,839 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564200 2023-11-29 01:53:16,664 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11100, loss[loss=0.06207, simple_loss=0.07919, pruned_loss=0.0156, audio_tagging_loss=0.006876, over 14963.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08949, pruned_loss=0.01194, audio_tagging_loss=0.008722, over 3044215.77 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:53:46,432 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.19 vs. 
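limit=15.0

The Whitening lines (such as metric=7.19 vs. limit=15.0 just above) are diagnostics from the Whiten modules in scaling.py: each compares a whiteness statistic of a layer's activations against that module's limit, and a corrective gradient is applied only when the limit is exceeded. A hypothetical re-implementation of the statistic, under the assumption that it is mean(λ²)/mean(λ)² over the eigenvalues λ of the feature covariance, which equals 1.0 exactly when the covariance is a multiple of the identity (perfectly white features) and grows as a few directions dominate:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (..., num_channels); assumed statistic, not the scaling.py code.
    x = x.reshape(-1, x.shape[-1]).float()
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)        # covariance eigenvalues
    return (eigs ** 2).mean() / eigs.mean() ** 2

x = torch.randn(2000, 256)                   # already-white features
print(float(whitening_metric(x)))            # ~1.1, far below limit=15.0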
2023-11-29 01:53:54,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3761540.0, ans=0.09899494936611666 2023-11-29 01:53:56,285 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.539e+01 9.017e+01 9.701e+01 1.046e+02 1.396e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 01:54:02,420 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:54:03,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3761540.0, ans=0.125 2023-11-29 01:54:13,909 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564250 2023-11-29 01:54:17,352 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11150, loss[loss=0.05306, simple_loss=0.07356, pruned_loss=0.00798, audio_tagging_loss=0.0083, over 15532.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08929, pruned_loss=0.01188, audio_tagging_loss=0.008716, over 3037473.83 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:54:39,170 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.78 vs. limit=10.0 2023-11-29 01:54:51,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3761806.6666666665, ans=0.2 2023-11-29 01:54:51,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3761806.6666666665, ans=0.07 2023-11-29 01:54:55,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3761873.3333333335, ans=0.125 2023-11-29 01:55:00,253 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2023-11-29 01:55:15,802 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564300 2023-11-29 01:55:19,852 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11200, loss[loss=0.08568, simple_loss=0.115, pruned_loss=0.02004, audio_tagging_loss=0.008128, over 15757.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.0902, pruned_loss=0.0121, audio_tagging_loss=0.008696, over 3040738.40 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:55:24,588 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.87 vs.
limit=12.0 2023-11-29 01:55:27,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3762006.6666666665, ans=0.0 2023-11-29 01:55:54,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3762140.0, ans=0.2 2023-11-29 01:55:58,724 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.132e+01 8.938e+01 9.585e+01 1.042e+02 1.236e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 01:56:08,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3762273.3333333335, ans=0.125 2023-11-29 01:56:11,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3762273.3333333335, ans=0.2 2023-11-29 01:56:12,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3762273.3333333335, ans=0.125 2023-11-29 01:56:14,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3762273.3333333335, ans=0.0 2023-11-29 01:56:17,912 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564350 2023-11-29 01:56:21,338 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11250, loss[loss=0.06611, simple_loss=0.08519, pruned_loss=0.01356, audio_tagging_loss=0.00996, over 14907.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08949, pruned_loss=0.01211, audio_tagging_loss=0.008828, over 3037971.02 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:56:25,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3762340.0, ans=0.125 2023-11-29 01:56:37,768 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:56:44,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3762473.3333333335, ans=0.5 2023-11-29 01:56:51,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3762473.3333333335, ans=0.1 2023-11-29 01:57:02,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3762540.0, ans=0.0 2023-11-29 01:57:19,347 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564400 2023-11-29 01:57:23,244 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11300, loss[loss=0.05707, simple_loss=0.08224, pruned_loss=0.00748, audio_tagging_loss=0.008468, over 16005.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08903, pruned_loss=0.01202, audio_tagging_loss=0.008749, over 3032861.38 frames. 
], batch size: 61, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 01:57:23,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3762673.3333333335, ans=0.0 2023-11-29 01:57:37,074 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:57:48,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3762806.6666666665, ans=0.1 2023-11-29 01:58:01,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3762873.3333333335, ans=0.2 2023-11-29 01:58:04,516 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 9.082e+01 9.505e+01 1.023e+02 1.248e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-29 01:58:10,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3762873.3333333335, ans=0.0 2023-11-29 01:58:21,584 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564450 2023-11-29 01:58:24,963 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11350, loss[loss=0.07116, simple_loss=0.09937, pruned_loss=0.01294, audio_tagging_loss=0.008529, over 13406.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08937, pruned_loss=0.01207, audio_tagging_loss=0.008591, over 3033874.62 frames. ], batch size: 53, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:58:29,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3763006.6666666665, ans=0.5 2023-11-29 01:58:41,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3763073.3333333335, ans=0.0 2023-11-29 01:58:49,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3763140.0, ans=0.125 2023-11-29 01:58:52,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3763140.0, ans=0.125 2023-11-29 01:58:53,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3763140.0, ans=0.125 2023-11-29 01:59:13,569 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=22.5 2023-11-29 01:59:15,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3763273.3333333335, ans=0.125 2023-11-29 01:59:22,720 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564500 2023-11-29 01:59:22,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3763273.3333333335, ans=0.0 2023-11-29 01:59:26,244 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11400, loss[loss=0.04701, simple_loss=0.06053, pruned_loss=0.007071, audio_tagging_loss=0.009678, over 15636.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08869, pruned_loss=0.01199, audio_tagging_loss=0.00852, over 3037145.39 frames. 
], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 01:59:31,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3763340.0, ans=0.0 2023-11-29 01:59:31,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3763340.0, ans=0.125 2023-11-29 01:59:33,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3763340.0, ans=0.125 2023-11-29 01:59:39,958 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 01:59:47,655 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=22.5 2023-11-29 01:59:48,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3763406.6666666665, ans=15.0 2023-11-29 02:00:06,799 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.845e+01 9.645e+01 1.031e+02 1.502e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 02:00:13,184 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.78 vs. limit=15.0 2023-11-29 02:00:22,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3763606.6666666665, ans=0.2 2023-11-29 02:00:23,910 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564550 2023-11-29 02:00:27,281 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11450, loss[loss=0.07627, simple_loss=0.098, pruned_loss=0.01628, audio_tagging_loss=0.01099, over 15472.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08902, pruned_loss=0.01208, audio_tagging_loss=0.00848, over 3037082.01 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:00:32,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3763673.3333333335, ans=0.125 2023-11-29 02:00:44,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3763740.0, ans=0.0 2023-11-29 02:00:44,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3763740.0, ans=0.0 2023-11-29 02:01:04,403 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.35 vs. limit=22.5 2023-11-29 02:01:06,006 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.32 vs. 
limit=15.0 2023-11-29 02:01:09,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3763873.3333333335, ans=0.0 2023-11-29 02:01:11,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3763873.3333333335, ans=0.125 2023-11-29 02:01:24,789 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564600 2023-11-29 02:01:28,644 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11500, loss[loss=0.07779, simple_loss=0.1116, pruned_loss=0.01628, audio_tagging_loss=0.0057, over 15442.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08878, pruned_loss=0.01188, audio_tagging_loss=0.008503, over 3042542.84 frames. ], batch size: 55, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:01:39,964 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.90 vs. limit=15.0 2023-11-29 02:02:09,659 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 8.942e+01 9.581e+01 1.022e+02 1.259e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-29 02:02:09,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3764206.6666666665, ans=0.035 2023-11-29 02:02:14,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3764206.6666666665, ans=0.125 2023-11-29 02:02:27,419 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564650 2023-11-29 02:02:30,882 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11550, loss[loss=0.06348, simple_loss=0.08292, pruned_loss=0.01172, audio_tagging_loss=0.01029, over 15094.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08934, pruned_loss=0.01205, audio_tagging_loss=0.008451, over 3045891.73 frames. ], batch size: 59, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:02:37,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3764340.0, ans=0.1 2023-11-29 02:02:39,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3764340.0, ans=0.125 2023-11-29 02:02:46,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3764406.6666666665, ans=0.1 2023-11-29 02:02:50,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3764406.6666666665, ans=0.0 2023-11-29 02:02:51,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3764406.6666666665, ans=0.125 2023-11-29 02:02:54,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3764473.3333333335, ans=0.2 2023-11-29 02:02:57,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3764473.3333333335, ans=0.125 2023-11-29 02:03:03,186 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0 2023-11-29 02:03:10,440 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. 
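These recurring exclusions are a length sanity check rather than a data problem: each flagged cut is an apparently one-second AudioSet clip (the _0.000_1.000 suffix; 100 feature frames at 10 ms) whose placeholder transcript tokenizes to 24 BPE pieces, and after the encoder's roughly 4x subsampling only 23 frames remain, too few for a transducer alignment that must emit 24 symbols. A minimal sketch of such a filter, assuming the usual icefall convolutional-subsampling arithmetic (the helper name is illustrative):

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Frames after subsampling, assuming T' = ((T - 7) // 2 + 1) // 2,
    # which maps the logged 100 frames to the logged 23.
    t = ((num_frames - 7) // 2 + 1) // 2
    # A transducer needs at least one frame per emitted token.
    return t >= num_tokens

assert ((100 - 7) // 2 + 1) // 2 == 23
assert keep_cut(100, 24) is False   # the cuts excluded in these warnings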
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:03:27,306 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.76 vs. limit=6.0 2023-11-29 02:03:28,024 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564700 2023-11-29 02:03:29,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3764606.6666666665, ans=0.0 2023-11-29 02:03:32,102 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11600, loss[loss=0.03813, simple_loss=0.04489, pruned_loss=0.004096, audio_tagging_loss=0.01159, over 14808.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.0889, pruned_loss=0.01186, audio_tagging_loss=0.008504, over 3042446.35 frames. ], batch size: 57, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 02:04:13,365 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.962e+01 8.912e+01 9.637e+01 1.036e+02 1.418e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-29 02:04:13,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3764873.3333333335, ans=10.0 2023-11-29 02:04:20,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3764940.0, ans=0.0 2023-11-29 02:04:29,762 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564750 2023-11-29 02:04:32,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2023-11-29 02:04:33,232 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11650, loss[loss=0.07211, simple_loss=0.1036, pruned_loss=0.01101, audio_tagging_loss=0.009301, over 15639.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08944, pruned_loss=0.01201, audio_tagging_loss=0.008503, over 3048581.88 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:04:34,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3765006.6666666665, ans=0.125 2023-11-29 02:04:56,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.71 vs. limit=15.0 2023-11-29 02:05:21,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3765273.3333333335, ans=0.0 2023-11-29 02:05:22,876 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.78 vs. 
limit=15.0 2023-11-29 02:05:23,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3765273.3333333335, ans=0.125 2023-11-29 02:05:29,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3765273.3333333335, ans=0.0 2023-11-29 02:05:31,009 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564800 2023-11-29 02:05:32,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3765273.3333333335, ans=0.125 2023-11-29 02:05:33,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3765340.0, ans=0.1 2023-11-29 02:05:34,758 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11700, loss[loss=0.05195, simple_loss=0.06787, pruned_loss=0.008224, audio_tagging_loss=0.009785, over 15605.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08935, pruned_loss=0.01212, audio_tagging_loss=0.008494, over 3050862.47 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:05:38,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3765340.0, ans=0.2 2023-11-29 02:05:51,612 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2023-11-29 02:05:56,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3765406.6666666665, ans=0.125 2023-11-29 02:06:01,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3765473.3333333335, ans=0.2 2023-11-29 02:06:01,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3765473.3333333335, ans=0.125 2023-11-29 02:06:09,011 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:06:16,826 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.400e+01 8.764e+01 9.502e+01 1.022e+02 1.260e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 02:06:20,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3765540.0, ans=0.5 2023-11-29 02:06:30,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3765606.6666666665, ans=0.2 2023-11-29 02:06:32,074 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564850 2023-11-29 02:06:33,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3765606.6666666665, ans=0.0 2023-11-29 02:06:35,541 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11750, loss[loss=0.09113, simple_loss=0.1307, pruned_loss=0.01768, audio_tagging_loss=0.008113, over 16548.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08961, pruned_loss=0.01211, audio_tagging_loss=0.0086, over 3054999.49 frames. 
], batch size: 60, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:06:37,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3765673.3333333335, ans=0.0 2023-11-29 02:06:37,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2023-11-29 02:07:18,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3765873.3333333335, ans=0.0 2023-11-29 02:07:19,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3765873.3333333335, ans=0.0 2023-11-29 02:07:33,465 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564900 2023-11-29 02:07:37,678 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11800, loss[loss=0.07157, simple_loss=0.1083, pruned_loss=0.01113, audio_tagging_loss=0.00627, over 15994.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08976, pruned_loss=0.01205, audio_tagging_loss=0.00871, over 3055314.02 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:08:03,135 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2023-11-29 02:08:12,778 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0 2023-11-29 02:08:17,927 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.655e+01 9.243e+01 9.839e+01 1.045e+02 1.336e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-29 02:08:20,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3766206.6666666665, ans=0.0 2023-11-29 02:08:24,634 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=15.0 2023-11-29 02:08:28,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3766273.3333333335, ans=0.1 2023-11-29 02:08:31,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3766273.3333333335, ans=0.125 2023-11-29 02:08:35,170 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 564950 2023-11-29 02:08:38,656 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11850, loss[loss=0.07183, simple_loss=0.09292, pruned_loss=0.0152, audio_tagging_loss=0.01017, over 14002.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08916, pruned_loss=0.01191, audio_tagging_loss=0.008766, over 3047397.71 frames. ], batch size: 54, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:08:38,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3766340.0, ans=0.125 2023-11-29 02:09:04,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3766473.3333333335, ans=0.125 2023-11-29 02:09:21,212 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.91 vs. 
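limit=22.5

The periodic optim.py lines summarize the optimizer's adaptive gradient clipping: it keeps a history of recent gradient norms, prints their five-number summary (min, 25%, median, 75%, max), and derives the clipping threshold from the median scaled by Clipping_scale; in the entry above, 2.0 x 9.839e+01 ~= 1.968e+02, and percent-clipped=0.0 says no recent batch exceeded it. An illustrative version of that bookkeeping (window size and details assumed, not the optimizer's actual internals):

import torch

def clip_stats(norms, clipping_scale=2.0):
    # norms: recent gradient norms, one per optimizer step (window assumed).
    window = torch.tensor(norms[-128:])
    q = torch.quantile(window, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()   # 2.0 * median, as logged
    pct = 100.0 * (window > threshold).float().mean().item()
    return q.tolist(), threshold, pct

print(clip_stats([98.4] * 100))   # threshold 196.8, percent-clipped 0.0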
2023-11-29 02:09:34,989 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565000 2023-11-29 02:09:36,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=22.5 2023-11-29 02:09:38,742 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11900, loss[loss=0.06329, simple_loss=0.08478, pruned_loss=0.01274, audio_tagging_loss=0.008155, over 15596.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08917, pruned_loss=0.01197, audio_tagging_loss=0.00886, over 3049789.03 frames. ], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:09:41,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3766673.3333333335, ans=0.5 2023-11-29 02:09:44,209 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.80 vs. limit=22.5 2023-11-29 02:09:50,949 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.01 vs. limit=12.0 2023-11-29 02:10:19,334 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 8.853e+01 9.543e+01 1.009e+02 1.340e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-29 02:10:19,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3766873.3333333335, ans=0.125 2023-11-29 02:10:24,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3766873.3333333335, ans=0.125 2023-11-29 02:10:34,500 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565050 2023-11-29 02:10:36,089 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2023-11-29 02:10:37,871 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 11950, loss[loss=0.04553, simple_loss=0.06236, pruned_loss=0.004911, audio_tagging_loss=0.009443, over 15340.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.08838, pruned_loss=0.01194, audio_tagging_loss=0.008911, over 3046882.03 frames.
], batch size: 58, lr: 1.43e-03, grad_scale: 16.0 2023-11-29 02:10:38,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3767006.6666666665, ans=10.0 2023-11-29 02:10:51,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3767073.3333333335, ans=0.0 2023-11-29 02:11:02,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3767140.0, ans=0.125 2023-11-29 02:11:09,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3767140.0, ans=0.125 2023-11-29 02:11:11,366 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:11:14,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3767206.6666666665, ans=0.1 2023-11-29 02:11:24,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3767273.3333333335, ans=0.125 2023-11-29 02:11:24,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3767273.3333333335, ans=0.5 2023-11-29 02:11:24,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3767273.3333333335, ans=0.09899494936611666 2023-11-29 02:11:33,268 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565100 2023-11-29 02:11:36,068 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.71 vs. limit=15.0 2023-11-29 02:11:36,580 INFO [train_asr.py:1235] (2/4) Epoch 47, batch 12000, loss[loss=0.0839, simple_loss=0.1159, pruned_loss=0.01835, audio_tagging_loss=0.007593, over 16574.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08954, pruned_loss=0.01215, audio_tagging_loss=0.008959, over 3049538.57 frames. ], batch size: 60, lr: 1.43e-03, grad_scale: 32.0 2023-11-29 02:11:36,581 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 02:12:00,205 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4909, 3.5044, 3.9102, 3.7191], device='cuda:2') 2023-11-29 02:12:14,243 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.9316, 1.7583, 3.5063, 2.9114, 3.0140, 3.0359, 2.9278, 3.0923], device='cuda:2') 2023-11-29 02:12:16,766 INFO [train_asr.py:1267] (2/4) Epoch 47, validation: loss=0.05799, simple_loss=0.0505, pruned_loss=0.005391, audio_tagging_loss=0.02735, over 4681554.00 frames. 2023-11-29 02:12:16,767 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 02:13:00,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3767493.3333333335, ans=0.125 2023-11-29 02:13:01,511 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 0, loss[loss=0.08964, simple_loss=0.1133, pruned_loss=0.01532, audio_tagging_loss=0.01765, over 16248.00 frames. ], tot_loss[loss=0.08964, simple_loss=0.1133, pruned_loss=0.01532, audio_tagging_loss=0.01765, over 16248.00 frames. 
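], batch size: 62, lr: 1.41e-03, grad_scale: 32.0

Each validation pass also dumps one entropy value per attention head (the attn_weights_entropy tensors above), a quick collapse check: a head with entropy near zero attends to single frames, while a head near the uniform bound log(seq_len) spreads evenly. A sketch of that diagnostic, assuming it is the standard Shannon entropy of each head's attention distribution averaged over queries:

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (batch, heads, query, key), rows summing to 1 after softmax;
    # assumed formulation, not the zipformer.py code.
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)
    return ent.mean(dim=(0, 2))              # one value per head

attn = torch.softmax(torch.randn(2, 4, 50, 50), dim=-1)
print(attn_weights_entropy(attn))   # ~3.4 per head; uniform bound log(50)~3.9

Note also how the running tot_loss resets at the epoch boundary above: the epoch 48, batch 0 totals equal that single batch's own losses, then settle back toward their steady values over the first few hundred batches.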
2023-11-29 02:13:01,512 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 02:13:22,555 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1689, 2.4381, 5.0412, 3.0609], device='cuda:2') 2023-11-29 02:13:28,289 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4648, 3.8493, 3.2039, 3.8549], device='cuda:2') 2023-11-29 02:13:34,041 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9515, 3.1261, 2.9559, 3.0788, 3.3471, 2.7812, 3.3957, 2.5964], device='cuda:2') 2023-11-29 02:13:36,860 INFO [train_asr.py:1267] (2/4) Epoch 48, validation: loss=0.05814, simple_loss=0.05045, pruned_loss=0.005317, audio_tagging_loss=0.02759, over 4681554.00 frames. 2023-11-29 02:13:36,860 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 02:13:43,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3767493.3333333335, ans=0.2 2023-11-29 02:13:50,680 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.137e+01 9.361e+01 1.012e+02 1.115e+02 1.422e+02, threshold=2.023e+02, percent-clipped=0.0 2023-11-29 02:14:08,487 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565150 2023-11-29 02:14:09,032 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=22.5 2023-11-29 02:14:23,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3767693.3333333335, ans=0.2 2023-11-29 02:14:32,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3767760.0, ans=0.2 2023-11-29 02:14:40,287 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 50, loss[loss=0.06453, simple_loss=0.08141, pruned_loss=0.009876, audio_tagging_loss=0.01395, over 15452.00 frames. ], tot_loss[loss=0.07177, simple_loss=0.08707, pruned_loss=0.01159, audio_tagging_loss=0.01665, over 683402.56 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:14:48,461 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.01 vs. limit=15.0 2023-11-29 02:14:52,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3767893.3333333335, ans=0.2 2023-11-29 02:15:03,417 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=12.0 2023-11-29 02:15:09,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3767960.0, ans=0.0 2023-11-29 02:15:10,135 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565200 2023-11-29 02:15:16,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3768026.6666666665, ans=0.125 2023-11-29 02:15:21,139 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs.
limit=6.0 2023-11-29 02:15:43,430 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 100, loss[loss=0.0695, simple_loss=0.09191, pruned_loss=0.01135, audio_tagging_loss=0.01219, over 14832.00 frames. ], tot_loss[loss=0.07079, simple_loss=0.08701, pruned_loss=0.01142, audio_tagging_loss=0.01587, over 1199254.72 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:15:47,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3768160.0, ans=0.125 2023-11-29 02:15:50,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3768160.0, ans=0.125 2023-11-29 02:15:56,406 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.236e+01 9.896e+01 1.062e+02 1.155e+02 1.316e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-29 02:16:02,893 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.80 vs. limit=15.0 2023-11-29 02:16:11,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3768293.3333333335, ans=0.125 2023-11-29 02:16:11,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3768293.3333333335, ans=0.0 2023-11-29 02:16:12,196 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565250 2023-11-29 02:16:33,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768426.6666666665, ans=0.1 2023-11-29 02:16:33,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3768426.6666666665, ans=0.0 2023-11-29 02:16:37,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768426.6666666665, ans=0.1 2023-11-29 02:16:43,669 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 150, loss[loss=0.06713, simple_loss=0.09742, pruned_loss=0.01179, audio_tagging_loss=0.006632, over 16374.00 frames. ], tot_loss[loss=0.06965, simple_loss=0.08737, pruned_loss=0.01156, audio_tagging_loss=0.0144, over 1609502.27 frames. ], batch size: 63, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:16:47,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3768493.3333333335, ans=0.0 2023-11-29 02:17:03,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3768560.0, ans=0.0 2023-11-29 02:17:08,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3768626.6666666665, ans=0.09899494936611666 2023-11-29 02:17:14,161 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565300 2023-11-29 02:17:22,505 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.75 vs. 
limit=15.0 2023-11-29 02:17:26,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3768693.3333333335, ans=0.125 2023-11-29 02:17:32,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3768760.0, ans=0.0 2023-11-29 02:17:40,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2023-11-29 02:17:46,577 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 200, loss[loss=0.0775, simple_loss=0.1019, pruned_loss=0.01652, audio_tagging_loss=0.01003, over 15360.00 frames. ], tot_loss[loss=0.06952, simple_loss=0.08987, pruned_loss=0.012, audio_tagging_loss=0.01258, over 1931902.48 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:18:02,096 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.150e+01 9.879e+01 1.074e+02 1.273e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 02:18:03,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3768893.3333333335, ans=0.125 2023-11-29 02:18:11,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3768960.0, ans=0.95 2023-11-29 02:18:16,749 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565350 2023-11-29 02:18:35,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3769093.3333333335, ans=0.125 2023-11-29 02:18:39,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3769093.3333333335, ans=0.125 2023-11-29 02:18:49,017 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 250, loss[loss=0.06969, simple_loss=0.09644, pruned_loss=0.01388, audio_tagging_loss=0.007588, over 15077.00 frames. ], tot_loss[loss=0.06823, simple_loss=0.08995, pruned_loss=0.01197, audio_tagging_loss=0.01128, over 2183761.22 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:18:52,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3769160.0, ans=0.125 2023-11-29 02:18:52,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3769160.0, ans=0.125 2023-11-29 02:19:08,929 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.70 vs. limit=22.5 2023-11-29 02:19:18,396 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565400 2023-11-29 02:19:22,515 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:19:24,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3769360.0, ans=0.0 2023-11-29 02:19:51,084 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 300, loss[loss=0.04672, simple_loss=0.06445, pruned_loss=0.006726, audio_tagging_loss=0.007767, over 14356.00 frames. ], tot_loss[loss=0.06788, simple_loss=0.09055, pruned_loss=0.01213, audio_tagging_loss=0.01048, over 2375963.61 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:19:59,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3769493.3333333335, ans=0.0 2023-11-29 02:20:05,685 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.057e+01 9.202e+01 9.824e+01 1.066e+02 1.297e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-29 02:20:19,701 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565450 2023-11-29 02:20:52,707 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 350, loss[loss=0.05423, simple_loss=0.07738, pruned_loss=0.008794, audio_tagging_loss=0.00675, over 15085.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09083, pruned_loss=0.01218, audio_tagging_loss=0.009984, over 2532062.01 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:20:52,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3769826.6666666665, ans=0.125 2023-11-29 02:21:03,661 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.20 vs. limit=15.0 2023-11-29 02:21:22,213 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565500 2023-11-29 02:21:45,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3770093.3333333335, ans=0.0 2023-11-29 02:21:53,366 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 400, loss[loss=0.05642, simple_loss=0.07194, pruned_loss=0.01148, audio_tagging_loss=0.00897, over 14776.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09092, pruned_loss=0.01209, audio_tagging_loss=0.009607, over 2644638.62 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:22:09,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.365e+01 8.956e+01 9.429e+01 1.009e+02 1.369e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-29 02:22:10,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3770226.6666666665, ans=0.2 2023-11-29 02:22:10,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3770226.6666666665, ans=0.125 2023-11-29 02:22:23,929 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565550 2023-11-29 02:22:40,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3770360.0, ans=0.1 2023-11-29 02:22:56,246 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 450, loss[loss=0.06385, simple_loss=0.08632, pruned_loss=0.0135, audio_tagging_loss=0.00718, over 14740.00 frames. ], tot_loss[loss=0.06653, simple_loss=0.09029, pruned_loss=0.01211, audio_tagging_loss=0.009272, over 2729009.47 frames. 
], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:23:14,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3770560.0, ans=0.125 2023-11-29 02:23:24,849 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565600 2023-11-29 02:23:25,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3770626.6666666665, ans=0.125 2023-11-29 02:23:57,788 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 500, loss[loss=0.05665, simple_loss=0.07753, pruned_loss=0.008554, audio_tagging_loss=0.00933, over 14441.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.08971, pruned_loss=0.01197, audio_tagging_loss=0.00909, over 2797458.07 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:24:00,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3770826.6666666665, ans=0.125 2023-11-29 02:24:05,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3770826.6666666665, ans=0.09899494936611666 2023-11-29 02:24:12,844 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.944e+01 9.480e+01 1.026e+02 1.531e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-29 02:24:13,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3770893.3333333335, ans=0.1 2023-11-29 02:24:26,892 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565650 2023-11-29 02:24:26,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3770960.0, ans=0.2 2023-11-29 02:24:29,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3770960.0, ans=0.125 2023-11-29 02:24:31,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3770960.0, ans=0.0 2023-11-29 02:24:50,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3771093.3333333335, ans=0.125 2023-11-29 02:24:56,843 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.06 vs. limit=15.0 2023-11-29 02:24:58,592 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 550, loss[loss=0.07245, simple_loss=0.09924, pruned_loss=0.01311, audio_tagging_loss=0.009716, over 16057.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08963, pruned_loss=0.01197, audio_tagging_loss=0.00896, over 2849288.36 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:25:28,899 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565700 2023-11-29 02:25:43,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3771360.0, ans=0.125 2023-11-29 02:25:59,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3771493.3333333335, ans=0.2 2023-11-29 02:26:00,425 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 600, loss[loss=0.06795, simple_loss=0.1012, pruned_loss=0.01076, audio_tagging_loss=0.006588, over 14831.00 frames. 
], tot_loss[loss=0.06623, simple_loss=0.09049, pruned_loss=0.0122, audio_tagging_loss=0.008781, over 2894571.54 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:26:08,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3771493.3333333335, ans=0.0 2023-11-29 02:26:16,978 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 8.960e+01 9.657e+01 1.065e+02 1.783e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 02:26:30,066 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565750 2023-11-29 02:26:34,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.11 vs. limit=22.5 2023-11-29 02:26:34,259 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.64 vs. limit=22.5 2023-11-29 02:27:02,370 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 650, loss[loss=0.06067, simple_loss=0.08267, pruned_loss=0.01003, audio_tagging_loss=0.009306, over 15016.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09007, pruned_loss=0.01212, audio_tagging_loss=0.008785, over 2930364.34 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:27:13,571 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.44 vs. limit=10.0 2023-11-29 02:27:31,197 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565800 2023-11-29 02:27:41,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3772026.6666666665, ans=0.125 2023-11-29 02:27:53,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3772093.3333333335, ans=0.125 2023-11-29 02:28:03,563 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 700, loss[loss=0.05059, simple_loss=0.06748, pruned_loss=0.009673, audio_tagging_loss=0.007178, over 15071.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.08972, pruned_loss=0.01215, audio_tagging_loss=0.008778, over 2960680.12 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:28:11,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3772160.0, ans=0.1 2023-11-29 02:28:19,004 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.15 vs. 
limit=15.0 2023-11-29 02:28:19,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.910e+01 9.498e+01 1.006e+02 1.347e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 02:28:32,945 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565850 2023-11-29 02:28:34,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3772293.3333333335, ans=0.1 2023-11-29 02:28:36,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3772293.3333333335, ans=0.125 2023-11-29 02:28:45,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3772360.0, ans=0.1 2023-11-29 02:29:00,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3772426.6666666665, ans=0.04949747468305833 2023-11-29 02:29:04,566 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 750, loss[loss=0.0675, simple_loss=0.09746, pruned_loss=0.01055, audio_tagging_loss=0.008225, over 15217.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08971, pruned_loss=0.0119, audio_tagging_loss=0.008767, over 2976746.37 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:29:19,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3772560.0, ans=0.1 2023-11-29 02:29:22,910 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2023-11-29 02:29:23,870 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2023-11-29 02:29:30,781 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.24 vs. limit=15.0 2023-11-29 02:29:33,921 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565900 2023-11-29 02:29:34,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3772626.6666666665, ans=0.0 2023-11-29 02:29:40,416 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.47 vs. limit=22.5 2023-11-29 02:29:52,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3772760.0, ans=0.125 2023-11-29 02:30:04,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.78 vs. limit=10.0 2023-11-29 02:30:05,610 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 800, loss[loss=0.05222, simple_loss=0.07453, pruned_loss=0.00572, audio_tagging_loss=0.009238, over 14388.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08952, pruned_loss=0.01173, audio_tagging_loss=0.00872, over 2991947.73 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:30:09,246 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. 
limit=6.0 2023-11-29 02:30:13,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3772826.6666666665, ans=0.125 2023-11-29 02:30:21,571 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 9.034e+01 9.772e+01 1.029e+02 1.331e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 02:30:35,161 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 565950 2023-11-29 02:30:53,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3773093.3333333335, ans=0.125 2023-11-29 02:31:06,985 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 850, loss[loss=0.05642, simple_loss=0.0722, pruned_loss=0.01028, audio_tagging_loss=0.01004, over 14989.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08945, pruned_loss=0.01173, audio_tagging_loss=0.008862, over 3003338.57 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:31:15,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3773160.0, ans=0.125 2023-11-29 02:31:36,517 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566000 2023-11-29 02:31:41,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3773293.3333333335, ans=0.125 2023-11-29 02:31:50,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3773360.0, ans=0.125 2023-11-29 02:31:51,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3773360.0, ans=0.0 2023-11-29 02:31:51,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3773360.0, ans=0.125 2023-11-29 02:31:51,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.47 vs. limit=15.0 2023-11-29 02:31:55,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3773426.6666666665, ans=0.0 2023-11-29 02:31:57,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3773426.6666666665, ans=0.0 2023-11-29 02:32:06,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3773426.6666666665, ans=10.0 2023-11-29 02:32:07,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3773426.6666666665, ans=0.05 2023-11-29 02:32:09,879 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 900, loss[loss=0.0834, simple_loss=0.1195, pruned_loss=0.01704, audio_tagging_loss=0.00661, over 16162.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08964, pruned_loss=0.01181, audio_tagging_loss=0.008844, over 3018690.23 frames. 
], batch size: 62, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:32:12,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3773493.3333333335, ans=0.0 2023-11-29 02:32:18,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3773493.3333333335, ans=0.125 2023-11-29 02:32:26,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.248e+01 9.124e+01 9.810e+01 1.032e+02 1.259e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 02:32:36,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3773626.6666666665, ans=0.05 2023-11-29 02:32:39,591 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566050 2023-11-29 02:32:51,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3773693.3333333335, ans=0.125 2023-11-29 02:32:51,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3773693.3333333335, ans=0.2 2023-11-29 02:33:05,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3773760.0, ans=0.2 2023-11-29 02:33:11,553 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 950, loss[loss=0.0666, simple_loss=0.09053, pruned_loss=0.01226, audio_tagging_loss=0.009076, over 15482.00 frames. ], tot_loss[loss=0.065, simple_loss=0.0895, pruned_loss=0.01156, audio_tagging_loss=0.008687, over 3029880.65 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:33:24,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3773893.3333333335, ans=0.125 2023-11-29 02:33:27,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3773893.3333333335, ans=0.0 2023-11-29 02:33:42,059 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566100 2023-11-29 02:33:42,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3773960.0, ans=0.125 2023-11-29 02:33:44,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3773960.0, ans=0.125 2023-11-29 02:33:46,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3773960.0, ans=0.125 2023-11-29 02:34:07,597 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.99 vs. limit=6.0 2023-11-29 02:34:13,541 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1000, loss[loss=0.06289, simple_loss=0.08789, pruned_loss=0.01002, audio_tagging_loss=0.008919, over 14336.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08911, pruned_loss=0.01149, audio_tagging_loss=0.008575, over 3030892.72 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:34:24,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3774226.6666666665, ans=0.125 2023-11-29 02:34:30,797 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 9.000e+01 9.678e+01 1.023e+02 1.395e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 02:34:33,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3774226.6666666665, ans=0.125 2023-11-29 02:34:39,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3774293.3333333335, ans=0.0 2023-11-29 02:34:41,576 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:34:42,748 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566150 2023-11-29 02:34:55,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3774360.0, ans=0.125 2023-11-29 02:35:10,867 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.33 vs. limit=22.5 2023-11-29 02:35:15,286 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1050, loss[loss=0.05819, simple_loss=0.08237, pruned_loss=0.008598, audio_tagging_loss=0.008413, over 15065.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08943, pruned_loss=0.01157, audio_tagging_loss=0.00847, over 3035570.47 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:35:16,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3774493.3333333335, ans=0.2 2023-11-29 02:35:29,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3774560.0, ans=0.0 2023-11-29 02:35:33,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3774560.0, ans=0.1 2023-11-29 02:35:34,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3774560.0, ans=0.125 2023-11-29 02:35:39,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3774626.6666666665, ans=0.125 2023-11-29 02:35:44,119 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566200 2023-11-29 02:35:53,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3774693.3333333335, ans=0.125 2023-11-29 02:36:13,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3774760.0, ans=0.07 2023-11-29 02:36:13,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3774760.0, ans=0.0 2023-11-29 02:36:16,930 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1100, loss[loss=0.06164, simple_loss=0.08611, pruned_loss=0.01215, audio_tagging_loss=0.006433, over 14579.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.0886, pruned_loss=0.01154, audio_tagging_loss=0.008425, over 3037331.20 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:36:20,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3774826.6666666665, ans=0.05 2023-11-29 02:36:21,666 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:36:34,686 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.921e+01 9.429e+01 9.964e+01 1.346e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-29 02:36:47,023 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566250 2023-11-29 02:36:54,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3775026.6666666665, ans=0.035 2023-11-29 02:37:19,252 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1150, loss[loss=0.05767, simple_loss=0.08069, pruned_loss=0.009442, audio_tagging_loss=0.007889, over 15264.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08924, pruned_loss=0.01162, audio_tagging_loss=0.008325, over 3039577.31 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:37:23,044 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.35 vs. 
limit=22.5 2023-11-29 02:37:26,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3775160.0, ans=0.125 2023-11-29 02:37:29,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3775160.0, ans=0.125 2023-11-29 02:37:49,337 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566300 2023-11-29 02:38:21,966 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1200, loss[loss=0.07406, simple_loss=0.1012, pruned_loss=0.01534, audio_tagging_loss=0.008128, over 15819.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08941, pruned_loss=0.01169, audio_tagging_loss=0.008303, over 3041048.33 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:38:28,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3775493.3333333335, ans=0.125 2023-11-29 02:38:35,031 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0 2023-11-29 02:38:39,027 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.292e+01 9.106e+01 9.655e+01 1.032e+02 1.347e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 02:38:45,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3775626.6666666665, ans=0.125 2023-11-29 02:38:51,445 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566350 2023-11-29 02:39:03,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3775693.3333333335, ans=0.1 2023-11-29 02:39:05,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3775693.3333333335, ans=10.0 2023-11-29 02:39:05,371 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2023-11-29 02:39:08,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3775693.3333333335, ans=0.0 2023-11-29 02:39:13,438 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2023-11-29 02:39:16,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3775760.0, ans=0.0 2023-11-29 02:39:23,529 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1250, loss[loss=0.07035, simple_loss=0.1046, pruned_loss=0.009615, audio_tagging_loss=0.008446, over 15223.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.09011, pruned_loss=0.0119, audio_tagging_loss=0.008313, over 3039563.52 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:39:50,131 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.47 vs. 
limit=10.0 2023-11-29 02:39:50,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3775960.0, ans=0.0 2023-11-29 02:39:53,687 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566400 2023-11-29 02:40:01,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3776026.6666666665, ans=0.125 2023-11-29 02:40:09,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3776026.6666666665, ans=0.0 2023-11-29 02:40:12,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3776093.3333333335, ans=0.125 2023-11-29 02:40:15,099 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2023-11-29 02:40:16,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3776093.3333333335, ans=0.2 2023-11-29 02:40:21,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3776093.3333333335, ans=0.125 2023-11-29 02:40:23,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3776093.3333333335, ans=0.0 2023-11-29 02:40:25,321 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1300, loss[loss=0.06984, simple_loss=0.09758, pruned_loss=0.01264, audio_tagging_loss=0.008416, over 14524.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08961, pruned_loss=0.01174, audio_tagging_loss=0.00834, over 3039271.67 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:40:43,991 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 8.895e+01 9.443e+01 1.023e+02 1.246e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-29 02:40:48,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3776226.6666666665, ans=0.1 2023-11-29 02:40:52,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3776293.3333333335, ans=0.1 2023-11-29 02:40:55,063 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566450 2023-11-29 02:41:11,086 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.84 vs. limit=12.0 2023-11-29 02:41:26,031 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1350, loss[loss=0.06809, simple_loss=0.08298, pruned_loss=0.01607, audio_tagging_loss=0.01053, over 16636.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.0894, pruned_loss=0.0117, audio_tagging_loss=0.008415, over 3043029.12 frames. ], batch size: 65, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:41:31,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3776493.3333333335, ans=0.125 2023-11-29 02:41:49,776 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.04 vs. 
limit=22.5 2023-11-29 02:41:56,979 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566500 2023-11-29 02:42:05,865 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2023-11-29 02:42:13,875 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 02:42:17,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3776760.0, ans=0.2 2023-11-29 02:42:17,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.69 vs. limit=6.0 2023-11-29 02:42:21,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3776760.0, ans=0.1 2023-11-29 02:42:29,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3776826.6666666665, ans=0.0 2023-11-29 02:42:29,817 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1400, loss[loss=0.06367, simple_loss=0.09158, pruned_loss=0.01032, audio_tagging_loss=0.007558, over 15263.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08969, pruned_loss=0.0118, audio_tagging_loss=0.008425, over 3049773.56 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:42:44,584 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:42:46,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3776893.3333333335, ans=0.95 2023-11-29 02:42:47,820 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 8.922e+01 9.372e+01 1.016e+02 1.403e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-29 02:42:58,365 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566550 2023-11-29 02:43:04,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3777026.6666666665, ans=0.125 2023-11-29 02:43:09,105 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:43:09,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3777026.6666666665, ans=0.0 2023-11-29 02:43:09,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3777026.6666666665, ans=0.1 2023-11-29 02:43:13,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3777026.6666666665, ans=0.125 2023-11-29 02:43:24,610 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.09 vs. 
limit=15.0 2023-11-29 02:43:30,430 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1450, loss[loss=0.06445, simple_loss=0.0843, pruned_loss=0.01434, audio_tagging_loss=0.007959, over 14661.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.09001, pruned_loss=0.01187, audio_tagging_loss=0.00849, over 3049023.00 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:43:31,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3777160.0, ans=0.0 2023-11-29 02:43:37,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3777160.0, ans=0.0 2023-11-29 02:43:50,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3777226.6666666665, ans=0.125 2023-11-29 02:43:58,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3777293.3333333335, ans=0.0 2023-11-29 02:44:00,589 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566600 2023-11-29 02:44:08,595 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2023-11-29 02:44:32,070 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1500, loss[loss=0.09199, simple_loss=0.1271, pruned_loss=0.01956, audio_tagging_loss=0.008879, over 16042.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09041, pruned_loss=0.01198, audio_tagging_loss=0.008608, over 3048642.99 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:44:33,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3777493.3333333335, ans=0.04949747468305833 2023-11-29 02:44:51,117 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 9.184e+01 9.950e+01 1.078e+02 1.281e+02, threshold=1.990e+02, percent-clipped=0.0 2023-11-29 02:45:01,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3777626.6666666665, ans=0.125 2023-11-29 02:45:02,372 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566650 2023-11-29 02:45:04,251 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.37 vs. limit=12.0 2023-11-29 02:45:24,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3777760.0, ans=0.05 2023-11-29 02:45:27,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3777760.0, ans=0.0 2023-11-29 02:45:34,471 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1550, loss[loss=0.07219, simple_loss=0.09554, pruned_loss=0.01557, audio_tagging_loss=0.008844, over 15748.00 frames. ], tot_loss[loss=0.06591, simple_loss=0.09044, pruned_loss=0.01203, audio_tagging_loss=0.008659, over 3046786.53 frames. 
], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:46:03,625 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566700 2023-11-29 02:46:10,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3778026.6666666665, ans=0.2 2023-11-29 02:46:21,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3778026.6666666665, ans=0.2 2023-11-29 02:46:36,493 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1600, loss[loss=0.04781, simple_loss=0.05331, pruned_loss=0.006164, audio_tagging_loss=0.01499, over 14717.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.08961, pruned_loss=0.01193, audio_tagging_loss=0.00878, over 3046534.48 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:46:40,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3778160.0, ans=0.015 2023-11-29 02:46:40,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3778160.0, ans=0.125 2023-11-29 02:46:54,282 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.477e+01 9.129e+01 9.735e+01 1.042e+02 2.046e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-29 02:47:01,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3778293.3333333335, ans=0.2 2023-11-29 02:47:05,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3778293.3333333335, ans=0.1 2023-11-29 02:47:06,787 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566750 2023-11-29 02:47:24,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3778426.6666666665, ans=0.125 2023-11-29 02:47:33,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3778426.6666666665, ans=0.2 2023-11-29 02:47:37,578 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1650, loss[loss=0.09391, simple_loss=0.132, pruned_loss=0.02184, audio_tagging_loss=0.006061, over 15892.00 frames. ], tot_loss[loss=0.06598, simple_loss=0.0901, pruned_loss=0.01216, audio_tagging_loss=0.008768, over 3047061.69 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:47:57,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3778560.0, ans=0.125 2023-11-29 02:48:07,775 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566800 2023-11-29 02:48:12,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3778626.6666666665, ans=0.125 2023-11-29 02:48:40,039 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1700, loss[loss=0.05886, simple_loss=0.07988, pruned_loss=0.009183, audio_tagging_loss=0.009737, over 15498.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.08979, pruned_loss=0.01213, audio_tagging_loss=0.008829, over 3047342.33 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:48:53,073 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.23 vs. 
limit=15.0 2023-11-29 02:48:58,817 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.604e+01 9.116e+01 9.697e+01 1.043e+02 1.617e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-29 02:49:00,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3778893.3333333335, ans=0.0 2023-11-29 02:49:09,452 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566850 2023-11-29 02:49:41,221 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1750, loss[loss=0.08362, simple_loss=0.1205, pruned_loss=0.01602, audio_tagging_loss=0.007327, over 15498.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.0896, pruned_loss=0.01206, audio_tagging_loss=0.008799, over 3051526.80 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:49:57,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3779226.6666666665, ans=0.125 2023-11-29 02:50:11,324 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566900 2023-11-29 02:50:15,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3779293.3333333335, ans=0.2 2023-11-29 02:50:19,835 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.72 vs. limit=15.0 2023-11-29 02:50:20,544 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:50:39,132 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.10 vs. limit=10.0 2023-11-29 02:50:43,317 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1800, loss[loss=0.05771, simple_loss=0.07855, pruned_loss=0.008688, audio_tagging_loss=0.009742, over 14486.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08979, pruned_loss=0.0121, audio_tagging_loss=0.008692, over 3043531.25 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:50:48,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2023-11-29 02:51:02,037 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.164e+01 9.130e+01 9.771e+01 1.053e+02 1.389e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 02:51:13,248 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 566950 2023-11-29 02:51:20,355 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.11 vs. limit=15.0 2023-11-29 02:51:31,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3779760.0, ans=0.2 2023-11-29 02:51:45,202 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1850, loss[loss=0.06408, simple_loss=0.08942, pruned_loss=0.01072, audio_tagging_loss=0.008647, over 15604.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08935, pruned_loss=0.01193, audio_tagging_loss=0.008628, over 3045235.91 frames. 
], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:52:14,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3779960.0, ans=0.125 2023-11-29 02:52:15,094 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567000 2023-11-29 02:52:20,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3779960.0, ans=0.2 2023-11-29 02:52:39,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3780093.3333333335, ans=0.125 2023-11-29 02:52:47,568 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1900, loss[loss=0.0432, simple_loss=0.05932, pruned_loss=0.004992, audio_tagging_loss=0.008548, over 14840.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.09005, pruned_loss=0.01211, audio_tagging_loss=0.008502, over 3042362.23 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:53:06,762 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 9.033e+01 9.601e+01 1.005e+02 1.271e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 02:53:17,326 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567050 2023-11-29 02:53:23,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3780360.0, ans=0.1 2023-11-29 02:53:48,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3780493.3333333335, ans=0.2 2023-11-29 02:53:49,026 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 1950, loss[loss=0.06902, simple_loss=0.09519, pruned_loss=0.01326, audio_tagging_loss=0.008168, over 15878.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08951, pruned_loss=0.01193, audio_tagging_loss=0.008504, over 3052087.70 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 02:53:50,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3780493.3333333335, ans=0.0 2023-11-29 02:53:51,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3780493.3333333335, ans=0.2 2023-11-29 02:53:58,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3780493.3333333335, ans=0.125 2023-11-29 02:54:18,106 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567100 2023-11-29 02:54:22,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3780626.6666666665, ans=0.125 2023-11-29 02:54:51,068 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2000, loss[loss=0.07214, simple_loss=0.1023, pruned_loss=0.0123, audio_tagging_loss=0.008703, over 16323.00 frames. ], tot_loss[loss=0.06453, simple_loss=0.08853, pruned_loss=0.01175, audio_tagging_loss=0.008513, over 3052962.83 frames. 
], batch size: 61, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:55:10,608 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.467e+01 8.857e+01 9.495e+01 1.044e+02 1.385e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 02:55:13,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3780893.3333333335, ans=0.125 2023-11-29 02:55:15,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3780960.0, ans=0.0 2023-11-29 02:55:20,821 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567150 2023-11-29 02:55:41,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3781093.3333333335, ans=0.1 2023-11-29 02:55:49,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3781093.3333333335, ans=0.0 2023-11-29 02:55:52,031 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2050, loss[loss=0.06885, simple_loss=0.09172, pruned_loss=0.01453, audio_tagging_loss=0.008459, over 16032.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08865, pruned_loss=0.01178, audio_tagging_loss=0.008474, over 3045820.31 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:55:58,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3781160.0, ans=0.0 2023-11-29 02:56:09,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3781226.6666666665, ans=0.1 2023-11-29 02:56:20,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3781293.3333333335, ans=0.125 2023-11-29 02:56:21,412 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567200 2023-11-29 02:56:26,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3781293.3333333335, ans=0.125 2023-11-29 02:56:31,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3781360.0, ans=0.0 2023-11-29 02:56:35,059 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 02:56:39,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3781426.6666666665, ans=0.1 2023-11-29 02:56:53,767 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2100, loss[loss=0.06205, simple_loss=0.09006, pruned_loss=0.008195, audio_tagging_loss=0.008827, over 14319.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08938, pruned_loss=0.0119, audio_tagging_loss=0.008364, over 3047012.27 frames. 
], batch size: 53, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:56:56,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3781493.3333333335, ans=0.2 2023-11-29 02:57:04,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3781560.0, ans=0.0 2023-11-29 02:57:06,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3781560.0, ans=0.125 2023-11-29 02:57:13,809 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.990e+01 9.043e+01 9.646e+01 1.030e+02 1.494e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 02:57:21,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.72 vs. limit=22.5 2023-11-29 02:57:23,109 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567250 2023-11-29 02:57:28,332 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.25 vs. limit=22.5 2023-11-29 02:57:30,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3781693.3333333335, ans=0.0 2023-11-29 02:57:52,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3781760.0, ans=0.0 2023-11-29 02:57:55,481 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2150, loss[loss=0.07619, simple_loss=0.1037, pruned_loss=0.01657, audio_tagging_loss=0.007766, over 15320.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08889, pruned_loss=0.01191, audio_tagging_loss=0.008351, over 3043920.47 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:57:55,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3781826.6666666665, ans=0.125 2023-11-29 02:58:03,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.17 vs. limit=15.0 2023-11-29 02:58:20,586 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2023-11-29 02:58:25,425 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567300 2023-11-29 02:58:26,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3781960.0, ans=0.2 2023-11-29 02:58:29,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3781960.0, ans=0.0 2023-11-29 02:58:31,685 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0 2023-11-29 02:58:34,673 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
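The WARNING above (repeated for other placeholder cuts below) shows why these AudioSet cuts are dropped: a 1-second cut has 100 feature frames, the convolutional frontend leaves 23, and 23 encoder frames cannot align the 24 BPE tokens of the dummy transcript under the transducer topology. A sketch of that check, assuming the usual Conv2d-subsampling arithmetic for a subsampling factor of 4, which reproduces the logged 100 -> 23:

    def frames_after_subsampling(num_frames: int) -> int:
        # Conv2dSubsampling-style arithmetic for a subsampling factor of 4;
        # reproduces the logged 100 -> 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least one encoder frame per output token.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the placeholder cuts above are excluded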
Number of tokens: 24 2023-11-29 02:58:37,866 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2023-11-29 02:58:38,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3782026.6666666665, ans=0.07 2023-11-29 02:58:53,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3782093.3333333335, ans=0.2 2023-11-29 02:58:56,789 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2200, loss[loss=0.08766, simple_loss=0.1257, pruned_loss=0.01715, audio_tagging_loss=0.007646, over 15348.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.09003, pruned_loss=0.01217, audio_tagging_loss=0.008413, over 3045591.11 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 02:59:00,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3782160.0, ans=0.1 2023-11-29 02:59:01,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3782160.0, ans=0.1 2023-11-29 02:59:07,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3782160.0, ans=0.125 2023-11-29 02:59:16,869 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 9.088e+01 9.723e+01 1.043e+02 1.467e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-29 02:59:26,406 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567350 2023-11-29 02:59:29,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3782293.3333333335, ans=0.0 2023-11-29 02:59:37,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3782360.0, ans=0.125 2023-11-29 02:59:42,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3782360.0, ans=0.2 2023-11-29 02:59:51,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3782426.6666666665, ans=0.125 2023-11-29 02:59:58,754 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2250, loss[loss=0.1027, simple_loss=0.1326, pruned_loss=0.02879, audio_tagging_loss=0.007575, over 15320.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08983, pruned_loss=0.01224, audio_tagging_loss=0.00855, over 3042664.20 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:00:10,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3782560.0, ans=0.0 2023-11-29 03:00:29,165 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567400 2023-11-29 03:00:29,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3782626.6666666665, ans=0.1 2023-11-29 03:00:36,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3782693.3333333335, ans=0.025 2023-11-29 03:00:51,139 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.16 vs. 
limit=15.0 2023-11-29 03:01:00,989 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2300, loss[loss=0.0509, simple_loss=0.05909, pruned_loss=0.007801, audio_tagging_loss=0.01356, over 14936.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.0893, pruned_loss=0.0122, audio_tagging_loss=0.008621, over 3038448.40 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:01:01,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=3782826.6666666665, ans=8.0 2023-11-29 03:01:20,735 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.062e+01 9.669e+01 1.048e+02 1.317e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 03:01:25,014 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=15.0 2023-11-29 03:01:30,799 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567450 2023-11-29 03:01:49,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3783093.3333333335, ans=0.1 2023-11-29 03:01:58,188 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:02:02,742 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2350, loss[loss=0.0819, simple_loss=0.1096, pruned_loss=0.01939, audio_tagging_loss=0.007711, over 14145.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.0898, pruned_loss=0.01224, audio_tagging_loss=0.008618, over 3040037.41 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:02:05,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3783160.0, ans=0.125 2023-11-29 03:02:29,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3783293.3333333335, ans=0.125 2023-11-29 03:02:32,430 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567500 2023-11-29 03:03:04,478 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2400, loss[loss=0.06361, simple_loss=0.08127, pruned_loss=0.0127, audio_tagging_loss=0.01027, over 14571.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08914, pruned_loss=0.01203, audio_tagging_loss=0.008801, over 3040706.96 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:03:19,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3783560.0, ans=0.125 2023-11-29 03:03:27,143 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 9.160e+01 9.857e+01 1.036e+02 1.512e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-29 03:03:34,335 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567550 2023-11-29 03:04:05,687 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2450, loss[loss=0.07351, simple_loss=0.1013, pruned_loss=0.01423, audio_tagging_loss=0.008631, over 15518.00 frames. 
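Each per-batch record decomposes loss into simple_loss, pruned_loss and audio_tagging_loss, and the numbers are consistent with a fixed weighted sum: for the batch 2300 sample above, 0.5 * 0.05909 + 0.007801 + 1.0 * 0.01356 = 0.0509. A sketch of that combination; the early-training warm-up ramp icefall applies to the pruned term is omitted, since this run is long past warm-up:

    def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                       simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        # Weighted sum consistent with the logged records, e.g. batch 2300:
        # 0.5 * 0.05909 + 0.007801 + 1.0 * 0.01356 ~= 0.0509
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    print(combine_losses(0.05909, 0.007801, 0.01356))  # ~0.05091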
], tot_loss[loss=0.06564, simple_loss=0.08985, pruned_loss=0.01202, audio_tagging_loss=0.008692, over 3048607.99 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:04:19,237 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.12 vs. limit=10.0 2023-11-29 03:04:35,623 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567600 2023-11-29 03:04:55,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3784093.3333333335, ans=0.0 2023-11-29 03:04:55,867 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=15.0 2023-11-29 03:05:08,349 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2500, loss[loss=0.05189, simple_loss=0.07331, pruned_loss=0.007395, audio_tagging_loss=0.007841, over 14316.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.08987, pruned_loss=0.01217, audio_tagging_loss=0.008739, over 3046718.06 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:05:30,129 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.483e+01 8.821e+01 9.659e+01 1.073e+02 1.403e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-29 03:05:31,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3784293.3333333335, ans=0.1 2023-11-29 03:05:31,901 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.91 vs. limit=22.5 2023-11-29 03:05:33,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.49 vs. limit=15.0 2023-11-29 03:05:37,170 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567650 2023-11-29 03:05:42,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3784293.3333333335, ans=0.0 2023-11-29 03:05:55,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3784426.6666666665, ans=0.125 2023-11-29 03:05:59,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3784426.6666666665, ans=0.0 2023-11-29 03:06:09,246 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2550, loss[loss=0.06258, simple_loss=0.08477, pruned_loss=0.01177, audio_tagging_loss=0.008432, over 16204.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.08942, pruned_loss=0.01205, audio_tagging_loss=0.008696, over 3049205.89 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:06:25,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3784560.0, ans=0.0 2023-11-29 03:06:40,443 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567700 2023-11-29 03:06:59,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3784760.0, ans=0.125 2023-11-29 03:07:04,523 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.78 vs. 
limit=22.5 2023-11-29 03:07:12,030 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2600, loss[loss=0.05668, simple_loss=0.07823, pruned_loss=0.009346, audio_tagging_loss=0.00822, over 14769.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08865, pruned_loss=0.01196, audio_tagging_loss=0.008562, over 3049848.60 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:07:12,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3784826.6666666665, ans=0.1 2023-11-29 03:07:34,997 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.646e+01 8.651e+01 9.416e+01 1.044e+02 1.400e+02, threshold=1.883e+02, percent-clipped=0.0 2023-11-29 03:07:42,215 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567750 2023-11-29 03:08:14,955 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2650, loss[loss=0.06523, simple_loss=0.09086, pruned_loss=0.01231, audio_tagging_loss=0.007487, over 16170.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08893, pruned_loss=0.012, audio_tagging_loss=0.008399, over 3051989.60 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:08:38,059 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.73 vs. limit=15.0 2023-11-29 03:08:40,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.41 vs. limit=15.0 2023-11-29 03:08:43,334 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567800 2023-11-29 03:08:44,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3785293.3333333335, ans=0.125 2023-11-29 03:09:01,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3785360.0, ans=0.125 2023-11-29 03:09:13,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3785426.6666666665, ans=0.0 2023-11-29 03:09:15,753 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2700, loss[loss=0.0688, simple_loss=0.1002, pruned_loss=0.01053, audio_tagging_loss=0.008167, over 16306.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08922, pruned_loss=0.0121, audio_tagging_loss=0.008288, over 3052144.96 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:09:21,524 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=15.0 2023-11-29 03:09:38,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3785560.0, ans=0.125 2023-11-29 03:09:38,841 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 9.139e+01 9.804e+01 1.056e+02 1.449e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-29 03:09:44,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.96 vs. 
limit=15.0 2023-11-29 03:09:46,623 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567850 2023-11-29 03:09:52,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3785693.3333333335, ans=0.125 2023-11-29 03:09:52,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3785693.3333333335, ans=0.125 2023-11-29 03:10:17,677 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2750, loss[loss=0.07788, simple_loss=0.1077, pruned_loss=0.0146, audio_tagging_loss=0.009405, over 15849.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08946, pruned_loss=0.01214, audio_tagging_loss=0.008249, over 3051139.79 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:10:22,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.30 vs. limit=10.0 2023-11-29 03:10:47,579 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567900 2023-11-29 03:10:56,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3786026.6666666665, ans=0.125 2023-11-29 03:11:12,708 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:11:19,931 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2800, loss[loss=0.05516, simple_loss=0.07695, pruned_loss=0.00736, audio_tagging_loss=0.009325, over 14229.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08873, pruned_loss=0.01205, audio_tagging_loss=0.0084, over 3042283.97 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:11:30,176 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:11:41,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3786226.6666666665, ans=0.125 2023-11-29 03:11:43,652 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 9.033e+01 9.912e+01 1.050e+02 3.585e+02, threshold=1.982e+02, percent-clipped=1.0 2023-11-29 03:11:44,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3786293.3333333335, ans=0.125 2023-11-29 03:11:49,530 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 567950 2023-11-29 03:12:03,810 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.43 vs. 
limit=12.0 2023-11-29 03:12:12,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3786426.6666666665, ans=0.125 2023-11-29 03:12:16,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3786426.6666666665, ans=0.0 2023-11-29 03:12:21,955 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2850, loss[loss=0.04547, simple_loss=0.05902, pruned_loss=0.007545, audio_tagging_loss=0.008415, over 14235.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08876, pruned_loss=0.01207, audio_tagging_loss=0.008505, over 3038684.98 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:12:32,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=12.0 2023-11-29 03:12:51,694 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568000 2023-11-29 03:13:15,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3786760.0, ans=0.125 2023-11-29 03:13:25,910 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2900, loss[loss=0.07349, simple_loss=0.1026, pruned_loss=0.0148, audio_tagging_loss=0.007377, over 16281.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08838, pruned_loss=0.01199, audio_tagging_loss=0.008483, over 3050359.42 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:13:38,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3786893.3333333335, ans=0.2 2023-11-29 03:13:51,305 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.068e+01 9.075e+01 9.599e+01 1.049e+02 1.799e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 03:13:56,088 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568050 2023-11-29 03:14:11,424 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.52 vs. limit=15.0 2023-11-29 03:14:28,240 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 2950, loss[loss=0.06613, simple_loss=0.08863, pruned_loss=0.01256, audio_tagging_loss=0.009252, over 15180.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08898, pruned_loss=0.012, audio_tagging_loss=0.008516, over 3048037.02 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:14:33,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3787160.0, ans=0.125 2023-11-29 03:14:43,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3787226.6666666665, ans=0.2 2023-11-29 03:14:57,828 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568100 2023-11-29 03:15:07,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3787360.0, ans=0.0 2023-11-29 03:15:07,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3787360.0, ans=0.0 2023-11-29 03:15:08,836 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.76 vs. 
limit=15.0 2023-11-29 03:15:25,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3787426.6666666665, ans=0.125 2023-11-29 03:15:25,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3787426.6666666665, ans=0.1 2023-11-29 03:15:30,034 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3000, loss[loss=0.06519, simple_loss=0.09141, pruned_loss=0.01285, audio_tagging_loss=0.006639, over 14819.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09029, pruned_loss=0.01216, audio_tagging_loss=0.00844, over 3049456.39 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:15:30,035 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 03:16:11,314 INFO [train_asr.py:1267] (2/4) Epoch 48, validation: loss=0.05793, simple_loss=0.05039, pruned_loss=0.005256, audio_tagging_loss=0.02748, over 4681554.00 frames. 2023-11-29 03:16:11,314 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 03:16:16,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3787493.3333333335, ans=0.125 2023-11-29 03:16:17,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3787493.3333333335, ans=0.2 2023-11-29 03:16:35,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 9.350e+01 9.749e+01 1.060e+02 2.355e+02, threshold=1.950e+02, percent-clipped=1.0 2023-11-29 03:16:41,305 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568150 2023-11-29 03:16:59,765 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2023-11-29 03:17:13,320 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3050, loss[loss=0.07547, simple_loss=0.1037, pruned_loss=0.01535, audio_tagging_loss=0.008291, over 15182.00 frames. ], tot_loss[loss=0.06639, simple_loss=0.0913, pruned_loss=0.01223, audio_tagging_loss=0.008515, over 3051167.40 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:17:31,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3787893.3333333335, ans=0.0 2023-11-29 03:17:42,880 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568200 2023-11-29 03:17:48,432 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.95 vs. limit=15.0 2023-11-29 03:17:51,330 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
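Every valid_interval batches the trainer pauses for a validation pass, as in the "Computing validation loss" block above, then reports the loss over the fixed 4681554-frame dev set followed by peak GPU memory. A minimal sketch of that loop; compute_loss is a hypothetical helper with the shape of the training step, not the script's actual function:

    import torch

    def validate(model, dev_loader, compute_loss, device):
        # Periodic validation pass; compute_loss is assumed to return
        # (loss, info dict with a frame count) like the training step.
        model.eval()
        tot, frames = 0.0, 0.0
        with torch.no_grad():
            for batch in dev_loader:
                loss, info = compute_loss(model, batch, is_training=False)
                tot += loss.item() * info["frames"]
                frames += info["frames"]
        model.train()
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot / frames:.5f}; max memory {peak_mb}MB")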
Number of tokens: 24 2023-11-29 03:17:52,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3788026.6666666665, ans=0.2 2023-11-29 03:18:03,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3788093.3333333335, ans=0.1 2023-11-29 03:18:10,926 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2023-11-29 03:18:13,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3788093.3333333335, ans=0.125 2023-11-29 03:18:15,762 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3100, loss[loss=0.0871, simple_loss=0.1265, pruned_loss=0.01756, audio_tagging_loss=0.006284, over 15682.00 frames. ], tot_loss[loss=0.06641, simple_loss=0.09111, pruned_loss=0.01228, audio_tagging_loss=0.008575, over 3047311.71 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:18:16,377 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.83 vs. limit=15.0 2023-11-29 03:18:17,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3788160.0, ans=0.125 2023-11-29 03:18:34,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3788226.6666666665, ans=0.125 2023-11-29 03:18:39,839 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.957e+01 9.617e+01 1.028e+02 1.274e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-29 03:18:45,213 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568250 2023-11-29 03:18:52,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3788360.0, ans=0.2 2023-11-29 03:19:12,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3788426.6666666665, ans=0.125 2023-11-29 03:19:17,466 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3150, loss[loss=0.06295, simple_loss=0.08554, pruned_loss=0.01128, audio_tagging_loss=0.008909, over 15581.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.09136, pruned_loss=0.01232, audio_tagging_loss=0.008631, over 3049217.21 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 03:19:21,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3788493.3333333335, ans=0.2 2023-11-29 03:19:45,220 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:19:47,264 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568300 2023-11-29 03:19:50,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3788626.6666666665, ans=0.09899494936611666 2023-11-29 03:20:19,169 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3200, loss[loss=0.05437, simple_loss=0.07273, pruned_loss=0.007177, audio_tagging_loss=0.01082, over 15653.00 frames. ], tot_loss[loss=0.06606, simple_loss=0.09037, pruned_loss=0.01216, audio_tagging_loss=0.008722, over 3048193.82 frames. 
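The Whitening records compare a per-module statistic against a scheduled limit; a corrective gradient is applied only when metric exceeds limit, so most of these records are pure observations. One plausible form of the metric, measuring how far the eigenvalue spectrum of the (optionally grouped) feature covariance is from uniform; this is a sketch, not necessarily the exact formula in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (N, C). Ratio of the mean squared eigenvalue of the per-group
        # covariance to the squared mean eigenvalue: 1.0 for perfectly
        # white features, larger as the spectrum concentrates.
        N, C = x.shape
        g = C // num_groups
        metrics = []
        for i in range(num_groups):
            xi = x[:, i * g:(i + 1) * g]
            cov = (xi.T @ xi) / N
            eigs = torch.linalg.eigvalsh(cov)
            metrics.append(((eigs ** 2).mean() / eigs.mean() ** 2).item())
        return max(metrics)

    # A penalty would fire only when this exceeds the scheduled limit,
    # i.e. the "metric=... vs. limit=..." comparison in the records above.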
], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:20:21,533 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.60 vs. limit=22.5 2023-11-29 03:20:31,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3788893.3333333335, ans=0.125 2023-11-29 03:20:38,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3788893.3333333335, ans=0.0 2023-11-29 03:20:44,394 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 8.999e+01 9.702e+01 1.062e+02 1.415e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 03:20:49,401 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568350 2023-11-29 03:20:50,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3788960.0, ans=0.1 2023-11-29 03:20:50,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3788960.0, ans=0.1 2023-11-29 03:20:55,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3789026.6666666665, ans=0.125 2023-11-29 03:20:56,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3789026.6666666665, ans=0.125 2023-11-29 03:21:19,426 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=15.0 2023-11-29 03:21:21,240 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3250, loss[loss=0.06844, simple_loss=0.106, pruned_loss=0.01018, audio_tagging_loss=0.005281, over 15270.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08979, pruned_loss=0.01208, audio_tagging_loss=0.008896, over 3052462.24 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:21:22,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3789160.0, ans=0.0 2023-11-29 03:21:40,804 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.45 vs. limit=10.0 2023-11-29 03:21:44,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3789226.6666666665, ans=15.0 2023-11-29 03:21:48,748 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.55 vs. 
limit=12.0 2023-11-29 03:21:51,378 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568400 2023-11-29 03:22:01,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3789360.0, ans=0.0 2023-11-29 03:22:12,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3789426.6666666665, ans=0.125 2023-11-29 03:22:13,681 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:22:22,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3789426.6666666665, ans=0.125 2023-11-29 03:22:24,190 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3300, loss[loss=0.07791, simple_loss=0.1072, pruned_loss=0.0165, audio_tagging_loss=0.007827, over 15082.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08938, pruned_loss=0.01204, audio_tagging_loss=0.008906, over 3052488.66 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:22:25,613 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:22:44,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3789560.0, ans=0.07 2023-11-29 03:22:46,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3789560.0, ans=0.2 2023-11-29 03:22:48,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.473e+01 9.011e+01 9.553e+01 1.025e+02 1.344e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-29 03:22:53,482 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568450 2023-11-29 03:22:58,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3789626.6666666665, ans=0.2 2023-11-29 03:23:13,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3789760.0, ans=0.125 2023-11-29 03:23:14,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3789760.0, ans=0.2 2023-11-29 03:23:25,024 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3350, loss[loss=0.0759, simple_loss=0.1085, pruned_loss=0.01409, audio_tagging_loss=0.007563, over 15147.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08941, pruned_loss=0.01204, audio_tagging_loss=0.00876, over 3053831.16 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:23:53,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3789960.0, ans=0.2 2023-11-29 03:23:55,269 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568500 2023-11-29 03:23:55,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3789960.0, ans=0.1 2023-11-29 03:24:26,068 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0 2023-11-29 03:24:26,776 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3400, loss[loss=0.06466, simple_loss=0.08683, pruned_loss=0.01239, audio_tagging_loss=0.008857, over 15766.00 frames. 
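The WithLoss records track auxiliary penalties attached directly to the attention-weight tensors; loss-sum=0.000e+00 means the watched values are currently inside their allowed range, so the penalty is dormant. A sketch of one such attachment, using a simple hinge on absolute values; the statistic the real scaling.py module penalizes may differ:

    import torch

    def attach_abs_penalty(x: torch.Tensor, limit: float, scale: float = 1e-4):
        # Hinge penalty on |x| above `limit`; zero (like the logged
        # loss-sum=0.000e+00) whenever all values are within range.
        penalty = scale * (x.abs() - limit).clamp(min=0.0).sum()
        # Added to the training loss, it shapes gradients only when active.
        return penalty

    w = torch.randn(4, 8) * 0.1
    print(attach_abs_penalty(w, limit=5.0))  # tensor(0.), i.e. dormant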
], tot_loss[loss=0.06511, simple_loss=0.08914, pruned_loss=0.0119, audio_tagging_loss=0.008642, over 3053054.84 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:24:28,668 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. limit=6.0 2023-11-29 03:24:50,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3790293.3333333335, ans=0.125 2023-11-29 03:24:51,483 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.195e+01 9.061e+01 9.646e+01 1.021e+02 1.209e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 03:24:56,232 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568550 2023-11-29 03:25:02,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3790293.3333333335, ans=0.125 2023-11-29 03:25:08,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3790360.0, ans=0.1 2023-11-29 03:25:20,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3790426.6666666665, ans=0.125 2023-11-29 03:25:22,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3790426.6666666665, ans=0.1 2023-11-29 03:25:28,269 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3450, loss[loss=0.05812, simple_loss=0.08326, pruned_loss=0.008548, audio_tagging_loss=0.00794, over 15350.00 frames. ], tot_loss[loss=0.0656, simple_loss=0.09012, pruned_loss=0.01204, audio_tagging_loss=0.008495, over 3047224.62 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:25:29,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3790493.3333333335, ans=0.0 2023-11-29 03:25:58,432 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568600 2023-11-29 03:26:01,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3790626.6666666665, ans=0.05 2023-11-29 03:26:11,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3790693.3333333335, ans=0.125 2023-11-29 03:26:24,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3790760.0, ans=0.1 2023-11-29 03:26:30,407 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3500, loss[loss=0.06559, simple_loss=0.08832, pruned_loss=0.01518, audio_tagging_loss=0.006256, over 14896.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.0898, pruned_loss=0.01208, audio_tagging_loss=0.008425, over 3049918.18 frames. 
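tot_loss[...] is not the loss of the current batch but a frame-weighted running aggregate, which is why it is reported "over" roughly three million frames while each loss[...] covers a single batch of ~15k frames. A sketch of one way to maintain such an aggregate; the decay constant is hypothetical and stands in for the reset-interval bookkeeping of the actual metrics tracker:

    class RunningLoss:
        # Frame-weighted running aggregate; `decay` is hypothetical.
        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frame_sum = 0.0

        def update(self, batch_loss: float, batch_frames: float):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frame_sum = self.decay * self.frame_sum + batch_frames

        @property
        def value(self) -> float:
            # reported as tot_loss[...] "over <frame_sum> frames"
            return self.loss_sum / max(self.frame_sum, 1.0)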
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:26:54,790 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 8.797e+01 9.462e+01 1.015e+02 1.238e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 03:27:00,068 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568650 2023-11-29 03:27:00,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3790960.0, ans=0.125 2023-11-29 03:27:04,669 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:27:32,437 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3550, loss[loss=0.05516, simple_loss=0.071, pruned_loss=0.009538, audio_tagging_loss=0.01013, over 14743.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08963, pruned_loss=0.01197, audio_tagging_loss=0.008401, over 3041614.35 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:27:34,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3791160.0, ans=0.125 2023-11-29 03:27:44,856 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=12.0 2023-11-29 03:28:01,853 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568700 2023-11-29 03:28:31,571 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.85 vs. limit=10.0 2023-11-29 03:28:31,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=12.0 2023-11-29 03:28:34,003 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3600, loss[loss=0.06827, simple_loss=0.0878, pruned_loss=0.01427, audio_tagging_loss=0.01009, over 14832.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08908, pruned_loss=0.01185, audio_tagging_loss=0.008472, over 3044299.74 frames. 
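The bypass.scale_min and out_combiner.scale_min records (ans=0.2) belong to residual-bypass modules whose per-channel blend scale is clamped from below, letting a layer be partially skipped without ever being removed outright. A sketch assuming the Zipformer-style blend out = x + scale * (y - x) with the scale clamped to [scale_min, 1.0]:

    import torch
    import torch.nn as nn

    class Bypass(nn.Module):
        # scale_min is the quantity reported as ans=0.2 in the records above.
        def __init__(self, channels: int, scale_min: float = 0.2):
            super().__init__()
            self.scale = nn.Parameter(torch.full((channels,), 0.5))
            self.scale_min = scale_min

        def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
            # x: layer input, y: layer output; blend with clamped scale
            s = self.scale.clamp(min=self.scale_min, max=1.0)
            return x + s * (y - x)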
], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:28:37,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3791493.3333333335, ans=0.125 2023-11-29 03:28:59,346 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.220e+01 8.871e+01 9.571e+01 1.037e+02 1.255e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 03:29:04,023 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568750 2023-11-29 03:29:12,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3791693.3333333335, ans=0.0 2023-11-29 03:29:27,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3791760.0, ans=0.1 2023-11-29 03:29:31,801 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:29:35,709 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3650, loss[loss=0.07114, simple_loss=0.09533, pruned_loss=0.01138, audio_tagging_loss=0.01209, over 15585.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08911, pruned_loss=0.012, audio_tagging_loss=0.008452, over 3045245.75 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:29:41,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3791826.6666666665, ans=0.2 2023-11-29 03:30:04,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3791960.0, ans=0.125 2023-11-29 03:30:05,062 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.23 vs. limit=10.0 2023-11-29 03:30:05,453 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568800 2023-11-29 03:30:08,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3791960.0, ans=0.125 2023-11-29 03:30:10,580 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.84 vs. limit=15.0 2023-11-29 03:30:33,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3792093.3333333335, ans=0.2 2023-11-29 03:30:37,613 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3700, loss[loss=0.04408, simple_loss=0.05412, pruned_loss=0.008141, audio_tagging_loss=0.008876, over 15011.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08995, pruned_loss=0.01219, audio_tagging_loss=0.008406, over 3047926.80 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:30:59,157 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.71 vs. 
limit=15.0 2023-11-29 03:31:03,475 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.315e+01 9.169e+01 9.957e+01 1.078e+02 1.355e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-29 03:31:07,242 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568850 2023-11-29 03:31:13,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=3792293.3333333335, ans=15.0 2023-11-29 03:31:16,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3792360.0, ans=0.0 2023-11-29 03:31:16,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3792360.0, ans=0.125 2023-11-29 03:31:38,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3792426.6666666665, ans=0.0 2023-11-29 03:31:40,467 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3750, loss[loss=0.04207, simple_loss=0.0486, pruned_loss=0.00765, audio_tagging_loss=0.01012, over 14556.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09045, pruned_loss=0.0122, audio_tagging_loss=0.008393, over 3055224.01 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:31:50,153 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0 2023-11-29 03:31:50,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3792493.3333333335, ans=0.1 2023-11-29 03:31:50,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3792493.3333333335, ans=0.2 2023-11-29 03:32:00,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3792560.0, ans=0.07 2023-11-29 03:32:02,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3792560.0, ans=0.0 2023-11-29 03:32:11,222 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568900 2023-11-29 03:32:18,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3792693.3333333335, ans=0.0 2023-11-29 03:32:19,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3792693.3333333335, ans=0.125 2023-11-29 03:32:26,192 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
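The various *_skip_rate records (attention_skip_rate, conv_skip_rate, ff2/ff3_skip_rate, bypass.skip_rate) are scheduled probabilities of stochastically dropping a sub-module's contribution for a whole batch, a layer-drop-style regularizer; by this stage most have decayed to ans=0.0 while bypass.skip_rate sits at 0.07. A sketch of applying one such rate, with names assumed rather than taken from zipformer.py:

    import torch

    def maybe_skip(residual: torch.Tensor, delta: torch.Tensor,
                   skip_rate: float, training: bool) -> torch.Tensor:
        # With probability skip_rate (a scheduled value like the logged
        # ans=0.0 / ans=0.07), drop this sub-module's contribution.
        if training and skip_rate > 0.0 and torch.rand(()) < skip_rate:
            return residual
        return residual + delta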
Number of tokens: 24 2023-11-29 03:32:31,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3792760.0, ans=0.0 2023-11-29 03:32:37,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3792760.0, ans=0.0 2023-11-29 03:32:37,934 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.22 vs. limit=10.0 2023-11-29 03:32:42,227 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3800, loss[loss=0.05062, simple_loss=0.07571, pruned_loss=0.005148, audio_tagging_loss=0.007619, over 14836.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09056, pruned_loss=0.01207, audio_tagging_loss=0.008486, over 3053256.55 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:32:42,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3792826.6666666665, ans=0.05 2023-11-29 03:32:42,788 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.20 vs. limit=22.5 2023-11-29 03:33:00,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3792893.3333333335, ans=0.2 2023-11-29 03:33:08,143 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.049e+01 9.763e+01 1.085e+02 1.488e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-29 03:33:11,957 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 568950 2023-11-29 03:33:15,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3792960.0, ans=0.125 2023-11-29 03:33:15,986 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=15.0 2023-11-29 03:33:29,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3793026.6666666665, ans=0.0 2023-11-29 03:33:42,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3793093.3333333335, ans=0.2 2023-11-29 03:33:44,623 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3850, loss[loss=0.05932, simple_loss=0.07204, pruned_loss=0.01345, audio_tagging_loss=0.009851, over 14032.00 frames. ], tot_loss[loss=0.06595, simple_loss=0.0906, pruned_loss=0.01207, audio_tagging_loss=0.008584, over 3048088.37 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:33:50,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3793160.0, ans=0.0 2023-11-29 03:33:53,493 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.40 vs. limit=15.0 2023-11-29 03:33:54,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3793160.0, ans=0.1 2023-11-29 03:34:03,678 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.64 vs. 
limit=22.5 2023-11-29 03:34:08,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3793293.3333333335, ans=0.0 2023-11-29 03:34:13,334 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569000 2023-11-29 03:34:29,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3793360.0, ans=0.125 2023-11-29 03:34:45,171 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3900, loss[loss=0.06546, simple_loss=0.08572, pruned_loss=0.01304, audio_tagging_loss=0.00956, over 13441.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08995, pruned_loss=0.01195, audio_tagging_loss=0.008618, over 3043503.26 frames. ], batch size: 50, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:35:10,783 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.633e+01 9.044e+01 9.626e+01 1.053e+02 1.477e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-29 03:35:15,661 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569050 2023-11-29 03:35:27,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3793693.3333333335, ans=0.0 2023-11-29 03:35:36,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3793760.0, ans=0.125 2023-11-29 03:35:42,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.49 vs. limit=15.0 2023-11-29 03:35:43,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3793760.0, ans=0.125 2023-11-29 03:35:46,717 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 3950, loss[loss=0.08059, simple_loss=0.1091, pruned_loss=0.01845, audio_tagging_loss=0.00761, over 15282.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09012, pruned_loss=0.01201, audio_tagging_loss=0.008643, over 3040110.81 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:35:49,502 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2023-11-29 03:36:11,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3793960.0, ans=0.0 2023-11-29 03:36:16,303 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569100 2023-11-29 03:36:32,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3794026.6666666665, ans=0.2 2023-11-29 03:36:40,858 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=22.5 2023-11-29 03:36:48,499 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4000, loss[loss=0.06992, simple_loss=0.09634, pruned_loss=0.01183, audio_tagging_loss=0.00992, over 14950.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.08989, pruned_loss=0.0119, audio_tagging_loss=0.008774, over 3034850.27 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:37:13,017 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.93 vs. 
limit=12.0 2023-11-29 03:37:14,524 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.940e+01 9.124e+01 9.854e+01 1.064e+02 1.398e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-29 03:37:18,307 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569150 2023-11-29 03:37:28,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3794360.0, ans=0.2 2023-11-29 03:37:46,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3794426.6666666665, ans=0.0 2023-11-29 03:37:49,704 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4050, loss[loss=0.07079, simple_loss=0.09603, pruned_loss=0.01315, audio_tagging_loss=0.009628, over 14309.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08925, pruned_loss=0.01191, audio_tagging_loss=0.008817, over 3033106.39 frames. ], batch size: 53, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:37:55,470 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:38:05,051 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.01 vs. limit=10.0 2023-11-29 03:38:12,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3794626.6666666665, ans=0.125 2023-11-29 03:38:13,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3794626.6666666665, ans=0.0 2023-11-29 03:38:19,651 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569200 2023-11-29 03:38:24,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3794626.6666666665, ans=0.2 2023-11-29 03:38:24,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3794626.6666666665, ans=0.125 2023-11-29 03:38:29,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3794693.3333333335, ans=10.0 2023-11-29 03:38:51,641 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4100, loss[loss=0.05638, simple_loss=0.08124, pruned_loss=0.007157, audio_tagging_loss=0.008602, over 14014.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.0895, pruned_loss=0.01185, audio_tagging_loss=0.008735, over 3035831.99 frames. 
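Across these records grad_scale moves between 32.0, 16.0 and 8.0: dynamic fp16 loss scaling halves the scale when a scaled gradient overflows and grows it back after a run of clean steps, consistent with torch.cuda.amp.GradScaler-style behaviour. A sketch of those dynamics; growth_interval is shortened for illustration, and the trainer's actual scaler wrapper may differ:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                       growth_factor=2.0,
                                       backoff_factor=0.5,
                                       growth_interval=100)
    # On an overflow step, scaler.step() skips the optimizer update and
    # scaler.update() halves the scale (32 -> 16 -> 8); after
    # `growth_interval` clean steps it doubles again (8 -> 16 -> 32),
    # matching the grad_scale values in the records above.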
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:39:19,472 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.511e+01 9.014e+01 9.699e+01 1.029e+02 1.254e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 03:39:20,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3794960.0, ans=0.125 2023-11-29 03:39:21,903 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569250 2023-11-29 03:39:45,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3795093.3333333335, ans=0.0 2023-11-29 03:39:51,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3795093.3333333335, ans=0.05 2023-11-29 03:39:53,511 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4150, loss[loss=0.06058, simple_loss=0.08599, pruned_loss=0.00966, audio_tagging_loss=0.007928, over 14849.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08953, pruned_loss=0.01193, audio_tagging_loss=0.00855, over 3028730.15 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:40:08,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3795226.6666666665, ans=0.125 2023-11-29 03:40:15,731 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.78 vs. limit=12.0 2023-11-29 03:40:17,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3795293.3333333335, ans=0.125 2023-11-29 03:40:22,862 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569300 2023-11-29 03:40:41,474 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:40:54,927 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4200, loss[loss=0.06677, simple_loss=0.08915, pruned_loss=0.01221, audio_tagging_loss=0.009977, over 15299.00 frames. ], tot_loss[loss=0.06498, simple_loss=0.08935, pruned_loss=0.01187, audio_tagging_loss=0.008434, over 3030031.05 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:40:55,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3795493.3333333335, ans=0.1 2023-11-29 03:40:56,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3795493.3333333335, ans=0.0 2023-11-29 03:41:02,579 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=22.5 2023-11-29 03:41:03,724 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.38 vs. 
limit=15.0 2023-11-29 03:41:17,631 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.12 vs. limit=12.0 2023-11-29 03:41:19,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3795626.6666666665, ans=0.125 2023-11-29 03:41:19,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3795626.6666666665, ans=0.125 2023-11-29 03:41:21,622 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.462e+01 9.132e+01 9.847e+01 1.051e+02 1.276e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-29 03:41:23,945 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569350 2023-11-29 03:41:26,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-11-29 03:41:29,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3795626.6666666665, ans=0.2 2023-11-29 03:41:29,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3795626.6666666665, ans=0.0 2023-11-29 03:41:32,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3795693.3333333335, ans=0.2 2023-11-29 03:41:36,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3795693.3333333335, ans=0.125 2023-11-29 03:41:40,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3795693.3333333335, ans=0.125 2023-11-29 03:41:40,793 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.02 vs. limit=15.0 2023-11-29 03:41:45,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3795760.0, ans=0.125 2023-11-29 03:41:50,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3795760.0, ans=0.125 2023-11-29 03:41:51,257 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.52 vs. limit=6.0 2023-11-29 03:41:56,092 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4250, loss[loss=0.06148, simple_loss=0.08717, pruned_loss=0.008543, audio_tagging_loss=0.009349, over 15250.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08847, pruned_loss=0.01169, audio_tagging_loss=0.008399, over 3026600.38 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:41:59,132 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.53 vs. 
limit=15.0 2023-11-29 03:42:24,981 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569400 2023-11-29 03:42:44,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3796093.3333333335, ans=0.0 2023-11-29 03:42:57,152 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4300, loss[loss=0.07348, simple_loss=0.1093, pruned_loss=0.01211, audio_tagging_loss=0.006732, over 15012.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08948, pruned_loss=0.01173, audio_tagging_loss=0.008313, over 3036102.53 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:42:57,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3796160.0, ans=0.0 2023-11-29 03:43:11,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3796226.6666666665, ans=0.125 2023-11-29 03:43:13,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3796226.6666666665, ans=0.1 2023-11-29 03:43:21,402 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=22.5 2023-11-29 03:43:24,067 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 9.187e+01 9.912e+01 1.060e+02 1.366e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-29 03:43:27,210 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569450 2023-11-29 03:43:28,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3796293.3333333335, ans=0.125 2023-11-29 03:43:30,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3796293.3333333335, ans=0.125 2023-11-29 03:43:49,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3796426.6666666665, ans=0.0 2023-11-29 03:43:58,128 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4350, loss[loss=0.05455, simple_loss=0.06498, pruned_loss=0.009553, audio_tagging_loss=0.0125, over 16193.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.08871, pruned_loss=0.01151, audio_tagging_loss=0.008351, over 3040225.57 frames. ], batch size: 64, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:44:27,828 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569500 2023-11-29 03:44:31,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3796626.6666666665, ans=0.025 2023-11-29 03:44:33,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3796626.6666666665, ans=0.1 2023-11-29 03:44:36,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3796693.3333333335, ans=0.07 2023-11-29 03:44:49,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3796760.0, ans=0.125 2023-11-29 03:45:00,052 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4400, loss[loss=0.06889, simple_loss=0.09913, pruned_loss=0.01193, audio_tagging_loss=0.007397, over 15638.00 frames. 
], tot_loss[loss=0.06474, simple_loss=0.08928, pruned_loss=0.01171, audio_tagging_loss=0.008392, over 3040013.60 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:45:01,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3796826.6666666665, ans=0.05 2023-11-29 03:45:03,942 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:45:09,986 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.51 vs. limit=15.0 2023-11-29 03:45:26,457 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.800e+01 8.869e+01 9.573e+01 1.013e+02 1.408e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 03:45:28,853 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569550 2023-11-29 03:45:38,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3797026.6666666665, ans=0.1 2023-11-29 03:45:41,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3797026.6666666665, ans=0.1 2023-11-29 03:45:42,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3797026.6666666665, ans=0.2 2023-11-29 03:45:48,268 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.03 vs. limit=15.0 2023-11-29 03:46:00,727 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4450, loss[loss=0.101, simple_loss=0.1488, pruned_loss=0.0194, audio_tagging_loss=0.007198, over 17276.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.09036, pruned_loss=0.0119, audio_tagging_loss=0.008262, over 3045318.45 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:46:09,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3797160.0, ans=0.125 2023-11-29 03:46:30,074 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569600 2023-11-29 03:46:30,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3797293.3333333335, ans=0.125 2023-11-29 03:47:02,117 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4500, loss[loss=0.06445, simple_loss=0.09281, pruned_loss=0.008955, audio_tagging_loss=0.009088, over 14803.00 frames. ], tot_loss[loss=0.06562, simple_loss=0.09077, pruned_loss=0.01199, audio_tagging_loss=0.008242, over 3047464.96 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:47:14,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3797560.0, ans=0.0 2023-11-29 03:47:18,485 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2023-11-29 03:47:26,327 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.68 vs. 
limit=15.0 2023-11-29 03:47:29,049 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.852e+01 8.934e+01 9.508e+01 1.012e+02 1.257e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-29 03:47:31,539 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569650 2023-11-29 03:47:34,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3797626.6666666665, ans=0.1 2023-11-29 03:47:48,099 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2023-11-29 03:48:02,600 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4550, loss[loss=0.06531, simple_loss=0.09405, pruned_loss=0.009631, audio_tagging_loss=0.008657, over 13913.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.09108, pruned_loss=0.01196, audio_tagging_loss=0.008252, over 3043546.91 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:48:20,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3797893.3333333335, ans=0.125 2023-11-29 03:48:32,897 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569700 2023-11-29 03:48:39,405 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=12.0 2023-11-29 03:48:42,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3798026.6666666665, ans=0.125 2023-11-29 03:48:52,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3798093.3333333335, ans=0.0 2023-11-29 03:48:53,578 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 03:48:57,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3798093.3333333335, ans=0.1 2023-11-29 03:49:04,176 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4600, loss[loss=0.06341, simple_loss=0.07789, pruned_loss=0.01403, audio_tagging_loss=0.01043, over 14102.00 frames. ], tot_loss[loss=0.06597, simple_loss=0.09115, pruned_loss=0.01209, audio_tagging_loss=0.008304, over 3040731.39 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:49:04,854 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.61 vs. 
limit=15.0 2023-11-29 03:49:12,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3798160.0, ans=0.0 2023-11-29 03:49:16,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3798226.6666666665, ans=0.125 2023-11-29 03:49:27,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3798293.3333333335, ans=0.0 2023-11-29 03:49:30,968 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.879e+01 8.831e+01 9.354e+01 1.006e+02 1.240e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-29 03:49:33,403 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569750 2023-11-29 03:49:37,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3798293.3333333335, ans=0.125 2023-11-29 03:49:38,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3798293.3333333335, ans=0.1 2023-11-29 03:50:05,619 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4650, loss[loss=0.07311, simple_loss=0.105, pruned_loss=0.01485, audio_tagging_loss=0.005743, over 15057.00 frames. ], tot_loss[loss=0.06627, simple_loss=0.0917, pruned_loss=0.01207, audio_tagging_loss=0.008347, over 3050162.36 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:50:34,909 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569800 2023-11-29 03:50:36,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3798626.6666666665, ans=0.09899494936611666 2023-11-29 03:50:37,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3798626.6666666665, ans=0.125 2023-11-29 03:50:53,810 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.94 vs. limit=12.0 2023-11-29 03:50:57,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3798760.0, ans=0.1 2023-11-29 03:51:06,107 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4700, loss[loss=0.05611, simple_loss=0.07743, pruned_loss=0.009538, audio_tagging_loss=0.00786, over 14891.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.092, pruned_loss=0.01219, audio_tagging_loss=0.008515, over 3049947.07 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:51:06,860 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.70 vs. limit=15.0 2023-11-29 03:51:11,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3798826.6666666665, ans=0.0 2023-11-29 03:51:30,966 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.93 vs. 
limit=22.5 2023-11-29 03:51:31,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3798960.0, ans=0.125 2023-11-29 03:51:33,849 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.280e+01 9.105e+01 9.941e+01 1.052e+02 1.267e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-29 03:51:34,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.00 vs. limit=22.5 2023-11-29 03:51:36,180 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569850 2023-11-29 03:51:54,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3799093.3333333335, ans=0.0 2023-11-29 03:52:00,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3799093.3333333335, ans=0.125 2023-11-29 03:52:08,218 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4750, loss[loss=0.0708, simple_loss=0.09757, pruned_loss=0.0133, audio_tagging_loss=0.008709, over 15866.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09051, pruned_loss=0.01193, audio_tagging_loss=0.008652, over 3049177.56 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:52:27,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3799226.6666666665, ans=0.0 2023-11-29 03:52:29,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3799226.6666666665, ans=0.0 2023-11-29 03:52:36,652 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569900 2023-11-29 03:52:41,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3799293.3333333335, ans=0.125 2023-11-29 03:52:49,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3799360.0, ans=0.0 2023-11-29 03:53:07,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3799426.6666666665, ans=0.125 2023-11-29 03:53:08,321 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2023-11-29 03:53:09,909 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4800, loss[loss=0.05545, simple_loss=0.06665, pruned_loss=0.01257, audio_tagging_loss=0.009553, over 15083.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08981, pruned_loss=0.01186, audio_tagging_loss=0.00877, over 3055204.57 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 03:53:15,475 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0 2023-11-29 03:53:37,268 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 8.976e+01 9.656e+01 1.035e+02 1.213e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 03:53:38,618 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 569950 2023-11-29 03:53:42,271 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.62 vs. 
limit=15.0 2023-11-29 03:53:42,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3799626.6666666665, ans=0.0 2023-11-29 03:54:09,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3799760.0, ans=0.0 2023-11-29 03:54:11,294 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4850, loss[loss=0.04828, simple_loss=0.05833, pruned_loss=0.008264, audio_tagging_loss=0.01085, over 15508.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08938, pruned_loss=0.01179, audio_tagging_loss=0.008879, over 3057979.32 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:54:27,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3799893.3333333335, ans=0.1 2023-11-29 03:54:42,022 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570000 2023-11-29 03:55:13,230 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4900, loss[loss=0.06441, simple_loss=0.09022, pruned_loss=0.01244, audio_tagging_loss=0.006861, over 14872.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08867, pruned_loss=0.01165, audio_tagging_loss=0.008905, over 3047018.51 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:55:15,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3800160.0, ans=0.2 2023-11-29 03:55:43,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 8.986e+01 9.622e+01 1.028e+02 1.398e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 03:55:44,499 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570050 2023-11-29 03:55:55,916 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 03:55:57,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3800360.0, ans=15.0 2023-11-29 03:56:06,171 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-11-29 03:56:09,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3800426.6666666665, ans=0.2 2023-11-29 03:56:14,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3800426.6666666665, ans=0.125 2023-11-29 03:56:18,009 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 4950, loss[loss=0.07183, simple_loss=0.1017, pruned_loss=0.0129, audio_tagging_loss=0.008065, over 15837.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08905, pruned_loss=0.01174, audio_tagging_loss=0.008759, over 3046696.95 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:56:22,091 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.00 vs. 
limit=15.0 2023-11-29 03:56:33,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3800560.0, ans=0.05 2023-11-29 03:56:41,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3800626.6666666665, ans=0.0 2023-11-29 03:56:47,109 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570100 2023-11-29 03:56:47,534 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0 2023-11-29 03:57:03,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3800693.3333333335, ans=0.025 2023-11-29 03:57:19,422 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5000, loss[loss=0.05607, simple_loss=0.07664, pruned_loss=0.0113, audio_tagging_loss=0.006447, over 14675.00 frames. ], tot_loss[loss=0.06447, simple_loss=0.08833, pruned_loss=0.01166, audio_tagging_loss=0.008636, over 3046108.95 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:57:21,333 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0 2023-11-29 03:57:22,359 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2023-11-29 03:57:27,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3800826.6666666665, ans=0.125 2023-11-29 03:57:34,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.08 vs. limit=10.0 2023-11-29 03:57:47,634 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2023-11-29 03:57:48,752 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.151e+01 8.990e+01 9.495e+01 1.030e+02 1.330e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 03:57:50,010 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570150 2023-11-29 03:57:54,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3800960.0, ans=0.125 2023-11-29 03:57:56,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=15.0 2023-11-29 03:57:58,981 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2023-11-29 03:58:06,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3801026.6666666665, ans=0.2 2023-11-29 03:58:06,916 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.95 vs. limit=22.5 2023-11-29 03:58:21,204 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5050, loss[loss=0.04912, simple_loss=0.06966, pruned_loss=0.006439, audio_tagging_loss=0.007857, over 15348.00 frames. 
], tot_loss[loss=0.06496, simple_loss=0.08897, pruned_loss=0.01196, audio_tagging_loss=0.008524, over 3042222.56 frames. ], batch size: 61, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:58:26,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3801160.0, ans=0.125 2023-11-29 03:58:33,920 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2023-11-29 03:58:35,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3801226.6666666665, ans=0.125 2023-11-29 03:58:39,153 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.02 vs. limit=10.0 2023-11-29 03:58:46,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3801293.3333333335, ans=0.125 2023-11-29 03:58:50,468 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570200 2023-11-29 03:58:56,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3801360.0, ans=0.2 2023-11-29 03:58:56,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3801360.0, ans=0.2 2023-11-29 03:59:02,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3801360.0, ans=0.125 2023-11-29 03:59:09,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3801426.6666666665, ans=0.0 2023-11-29 03:59:16,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3801426.6666666665, ans=0.1 2023-11-29 03:59:22,891 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5100, loss[loss=0.06825, simple_loss=0.09314, pruned_loss=0.0111, audio_tagging_loss=0.01057, over 16826.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08864, pruned_loss=0.01171, audio_tagging_loss=0.008477, over 3040902.47 frames. ], batch size: 63, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 03:59:39,819 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.04 vs. limit=10.0 2023-11-29 03:59:40,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3801560.0, ans=0.125 2023-11-29 03:59:41,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3801560.0, ans=0.0 2023-11-29 03:59:50,404 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.137e+01 9.192e+01 9.667e+01 1.067e+02 2.138e+02, threshold=1.933e+02, percent-clipped=1.0 2023-11-29 03:59:51,716 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570250 2023-11-29 04:00:19,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3801760.0, ans=0.0 2023-11-29 04:00:24,135 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5150, loss[loss=0.04734, simple_loss=0.05806, pruned_loss=0.007823, audio_tagging_loss=0.01048, over 14859.00 frames. 
], tot_loss[loss=0.06457, simple_loss=0.08858, pruned_loss=0.01181, audio_tagging_loss=0.008459, over 3041682.27 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:00:26,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3801826.6666666665, ans=0.1 2023-11-29 04:00:29,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3801826.6666666665, ans=0.1 2023-11-29 04:00:33,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3801826.6666666665, ans=0.0 2023-11-29 04:00:36,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3801893.3333333335, ans=0.125 2023-11-29 04:00:42,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3801893.3333333335, ans=0.125 2023-11-29 04:00:53,269 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570300 2023-11-29 04:01:15,910 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=15.0 2023-11-29 04:01:25,318 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5200, loss[loss=0.0544, simple_loss=0.0715, pruned_loss=0.009163, audio_tagging_loss=0.00949, over 15337.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08921, pruned_loss=0.01185, audio_tagging_loss=0.008384, over 3042229.95 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:01:54,014 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.656e+01 9.016e+01 9.699e+01 1.038e+02 1.418e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 04:01:55,293 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570350 2023-11-29 04:02:26,884 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5250, loss[loss=0.04698, simple_loss=0.06348, pruned_loss=0.004612, audio_tagging_loss=0.01063, over 14756.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08946, pruned_loss=0.01189, audio_tagging_loss=0.008291, over 3041003.18 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:02:34,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3802493.3333333335, ans=0.1 2023-11-29 04:02:38,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3802560.0, ans=0.125 2023-11-29 04:02:39,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3802560.0, ans=0.2 2023-11-29 04:02:43,907 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.72 vs. limit=15.0 2023-11-29 04:02:53,099 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5 2023-11-29 04:02:56,003 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570400 2023-11-29 04:02:56,765 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.73 vs. 
limit=22.5 2023-11-29 04:02:59,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0 2023-11-29 04:03:21,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3802760.0, ans=0.125 2023-11-29 04:03:28,960 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5300, loss[loss=0.06744, simple_loss=0.09814, pruned_loss=0.01222, audio_tagging_loss=0.006157, over 15638.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08979, pruned_loss=0.01192, audio_tagging_loss=0.008164, over 3041231.80 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:03:30,855 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.22 vs. limit=15.0 2023-11-29 04:03:37,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3802826.6666666665, ans=0.0 2023-11-29 04:03:45,340 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0 2023-11-29 04:03:52,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3802960.0, ans=0.1 2023-11-29 04:03:57,731 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.209e+01 9.143e+01 9.635e+01 1.038e+02 1.334e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-29 04:03:57,829 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570450 2023-11-29 04:03:59,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3802960.0, ans=0.2 2023-11-29 04:04:21,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3803093.3333333335, ans=0.2 2023-11-29 04:04:21,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3803093.3333333335, ans=0.125 2023-11-29 04:04:29,652 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5350, loss[loss=0.05861, simple_loss=0.07342, pruned_loss=0.01064, audio_tagging_loss=0.01126, over 15505.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08925, pruned_loss=0.01189, audio_tagging_loss=0.008273, over 3040684.57 frames. ], batch size: 58, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:04:32,075 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=12.0 2023-11-29 04:05:00,559 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570500 2023-11-29 04:05:05,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3803293.3333333335, ans=0.2 2023-11-29 04:05:17,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3803360.0, ans=0.0 2023-11-29 04:05:31,597 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5400, loss[loss=0.0672, simple_loss=0.09296, pruned_loss=0.011, audio_tagging_loss=0.009724, over 14860.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.09032, pruned_loss=0.01213, audio_tagging_loss=0.0083, over 3039093.02 frames. 
], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:05:34,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3803493.3333333335, ans=0.05 2023-11-29 04:05:52,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3803560.0, ans=0.0 2023-11-29 04:06:00,715 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=12.0 2023-11-29 04:06:01,162 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 9.108e+01 9.705e+01 1.034e+02 1.334e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-29 04:06:01,297 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570550 2023-11-29 04:06:17,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3803693.3333333335, ans=0.125 2023-11-29 04:06:33,409 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5450, loss[loss=0.05806, simple_loss=0.07794, pruned_loss=0.01321, audio_tagging_loss=0.005883, over 13668.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09066, pruned_loss=0.01222, audio_tagging_loss=0.008327, over 3041024.90 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:07:03,392 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570600 2023-11-29 04:07:09,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.10 vs. limit=22.5 2023-11-29 04:07:11,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3804026.6666666665, ans=0.0 2023-11-29 04:07:35,612 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5500, loss[loss=0.06704, simple_loss=0.08579, pruned_loss=0.01266, audio_tagging_loss=0.01149, over 15801.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09028, pruned_loss=0.0122, audio_tagging_loss=0.008416, over 3042283.92 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:07:40,481 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.98 vs. limit=15.0 2023-11-29 04:08:05,637 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570650 2023-11-29 04:08:06,696 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.860e+01 9.091e+01 9.676e+01 1.052e+02 2.081e+02, threshold=1.935e+02, percent-clipped=1.0 2023-11-29 04:08:09,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3804293.3333333335, ans=0.125 2023-11-29 04:08:37,436 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5550, loss[loss=0.06907, simple_loss=0.08812, pruned_loss=0.01618, audio_tagging_loss=0.008831, over 16095.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.09, pruned_loss=0.01206, audio_tagging_loss=0.008505, over 3040637.99 frames. 
], batch size: 60, lr: 1.41e-03, grad_scale: 8.0 2023-11-29 04:08:38,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3804493.3333333335, ans=0.0 2023-11-29 04:09:06,998 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570700 2023-11-29 04:09:15,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3804693.3333333335, ans=0.125 2023-11-29 04:09:17,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3804693.3333333335, ans=0.2 2023-11-29 04:09:24,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3804693.3333333335, ans=0.125 2023-11-29 04:09:39,315 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5600, loss[loss=0.08061, simple_loss=0.1079, pruned_loss=0.01923, audio_tagging_loss=0.007404, over 14995.00 frames. ], tot_loss[loss=0.06578, simple_loss=0.09035, pruned_loss=0.012, audio_tagging_loss=0.008609, over 3042106.03 frames. ], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:09:46,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0 2023-11-29 04:09:48,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3804826.6666666665, ans=0.125 2023-11-29 04:09:53,494 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.59 vs. limit=10.0 2023-11-29 04:10:08,859 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570750 2023-11-29 04:10:09,943 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.028e+01 9.748e+01 1.040e+02 1.265e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 04:10:26,121 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 04:10:27,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3805093.3333333335, ans=0.2 2023-11-29 04:10:31,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3805093.3333333335, ans=0.0 2023-11-29 04:10:40,927 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5650, loss[loss=0.08035, simple_loss=0.1116, pruned_loss=0.01856, audio_tagging_loss=0.005982, over 14667.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08973, pruned_loss=0.01187, audio_tagging_loss=0.008773, over 3047784.01 frames. 
], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:10:42,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3805160.0, ans=0.0 2023-11-29 04:10:42,609 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.92 vs. limit=15.0 2023-11-29 04:10:47,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3805160.0, ans=0.125 2023-11-29 04:10:48,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3805160.0, ans=0.2 2023-11-29 04:10:50,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3805160.0, ans=0.0 2023-11-29 04:10:51,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3805160.0, ans=0.125 2023-11-29 04:11:03,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3805226.6666666665, ans=0.125 2023-11-29 04:11:10,748 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570800 2023-11-29 04:11:19,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3805360.0, ans=0.125 2023-11-29 04:11:28,442 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0 2023-11-29 04:11:42,453 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5700, loss[loss=0.07038, simple_loss=0.1024, pruned_loss=0.01123, audio_tagging_loss=0.007952, over 16158.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.08989, pruned_loss=0.01193, audio_tagging_loss=0.008793, over 3045799.65 frames. ], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:11:50,955 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0 2023-11-29 04:12:11,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2023-11-29 04:12:11,931 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570850 2023-11-29 04:12:13,074 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 9.102e+01 9.721e+01 1.096e+02 1.374e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-29 04:12:14,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3805626.6666666665, ans=0.0 2023-11-29 04:12:36,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.18 vs. limit=6.0 2023-11-29 04:12:44,431 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5750, loss[loss=0.0737, simple_loss=0.1004, pruned_loss=0.01514, audio_tagging_loss=0.008345, over 15238.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08959, pruned_loss=0.01204, audio_tagging_loss=0.008663, over 3054208.09 frames. 
], batch size: 55, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:12:49,458 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:12:52,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3805826.6666666665, ans=0.0 2023-11-29 04:12:59,296 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.61 vs. limit=5.0 2023-11-29 04:13:03,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3805893.3333333335, ans=0.125 2023-11-29 04:13:06,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3805893.3333333335, ans=0.05 2023-11-29 04:13:13,067 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570900 2023-11-29 04:13:44,343 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5800, loss[loss=0.06388, simple_loss=0.09008, pruned_loss=0.01239, audio_tagging_loss=0.006454, over 14386.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08946, pruned_loss=0.01198, audio_tagging_loss=0.008591, over 3046093.75 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:13:58,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806226.6666666665, ans=0.1 2023-11-29 04:14:01,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3806226.6666666665, ans=0.1 2023-11-29 04:14:02,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0 2023-11-29 04:14:03,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3806226.6666666665, ans=0.0 2023-11-29 04:14:12,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3806293.3333333335, ans=0.0 2023-11-29 04:14:14,914 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 570950 2023-11-29 04:14:15,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 8.950e+01 9.520e+01 1.017e+02 1.550e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-29 04:14:28,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3806360.0, ans=0.2 2023-11-29 04:14:35,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806426.6666666665, ans=0.1 2023-11-29 04:14:36,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3806426.6666666665, ans=0.0 2023-11-29 04:14:46,541 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5850, loss[loss=0.07649, simple_loss=0.09246, pruned_loss=0.02024, audio_tagging_loss=0.01002, over 14722.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08872, pruned_loss=0.01194, audio_tagging_loss=0.008495, over 3044706.65 frames. ], batch size: 57, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:14:48,387 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.88 vs. 
limit=15.0 2023-11-29 04:14:49,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3806493.3333333335, ans=0.0 2023-11-29 04:14:49,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3806493.3333333335, ans=0.125 2023-11-29 04:14:56,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3806493.3333333335, ans=0.1 2023-11-29 04:15:10,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806626.6666666665, ans=0.1 2023-11-29 04:15:15,843 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571000 2023-11-29 04:15:18,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3806626.6666666665, ans=0.2 2023-11-29 04:15:36,719 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2023-11-29 04:15:49,139 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5900, loss[loss=0.06971, simple_loss=0.1023, pruned_loss=0.01172, audio_tagging_loss=0.006848, over 14453.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08855, pruned_loss=0.01192, audio_tagging_loss=0.008439, over 3041540.10 frames. ], batch size: 54, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:16:10,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3806893.3333333335, ans=0.0 2023-11-29 04:16:17,703 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571050 2023-11-29 04:16:18,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.746e+01 9.359e+01 9.876e+01 1.067e+02 1.252e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-29 04:16:29,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3807026.6666666665, ans=0.0 2023-11-29 04:16:39,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3807093.3333333335, ans=0.1 2023-11-29 04:16:40,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3807093.3333333335, ans=0.2 2023-11-29 04:16:50,126 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 5950, loss[loss=0.0455, simple_loss=0.05851, pruned_loss=0.009229, audio_tagging_loss=0.007014, over 15504.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08946, pruned_loss=0.01199, audio_tagging_loss=0.008362, over 3048224.76 frames. ], batch size: 62, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:16:53,009 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.49 vs. limit=22.5 2023-11-29 04:17:19,928 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571100 2023-11-29 04:17:33,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3807360.0, ans=0.2 2023-11-29 04:17:51,337 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6000, loss[loss=0.07864, simple_loss=0.1167, pruned_loss=0.01449, audio_tagging_loss=0.005808, over 16018.00 frames. 
], tot_loss[loss=0.0649, simple_loss=0.08942, pruned_loss=0.01187, audio_tagging_loss=0.008324, over 3046970.26 frames. ], batch size: 59, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:17:51,338 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 04:18:14,710 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3369, 4.3450, 4.5037, 4.4461], device='cuda:2') 2023-11-29 04:18:25,368 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8074, 5.8506, 5.8984, 5.8578], device='cuda:2') 2023-11-29 04:18:31,362 INFO [train_asr.py:1267] (2/4) Epoch 48, validation: loss=0.05827, simple_loss=0.05042, pruned_loss=0.005313, audio_tagging_loss=0.02774, over 4681554.00 frames. 2023-11-29 04:18:31,363 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 04:18:36,689 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2023-11-29 04:18:45,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3807560.0, ans=0.125 2023-11-29 04:18:55,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3807626.6666666665, ans=0.125 2023-11-29 04:19:00,342 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571150 2023-11-29 04:19:01,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 8.999e+01 9.693e+01 1.031e+02 2.165e+02, threshold=1.939e+02, percent-clipped=1.0 2023-11-29 04:19:05,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3807626.6666666665, ans=0.125 2023-11-29 04:19:10,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3807693.3333333335, ans=0.125 2023-11-29 04:19:19,603 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 04:19:32,552 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6050, loss[loss=0.06614, simple_loss=0.09341, pruned_loss=0.008133, audio_tagging_loss=0.0113, over 14794.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08854, pruned_loss=0.01185, audio_tagging_loss=0.008469, over 3048171.76 frames. 
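During the validation pass above, the zipformer.py:1877 lines dump the entropy of selected self-attention weight distributions (e.g. tensor([4.3369, 4.3450, 4.5037, 4.4461]), one value per head); higher entropy means attention mass spread over more frames. A hedged sketch of how such a diagnostic can be computed from a post-softmax attention tensor; the function name and shape convention are assumptions, not the actual zipformer code:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    """Mean entropy (nats) of each head's attention distribution.

    attn: (num_heads, batch, query_len, key_len), rows summing to 1
    after softmax. Returns one value per head, comparable to the
    tensors printed in the validation log above.
    """
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (heads, batch, query)
    return ent.mean(dim=(1, 2))                      # (heads,)

# Uniform attention over 100 frames gives entropy log(100) ~ 4.6, the
# same order as the logged values.
w = torch.full((4, 2, 50, 100), 1 / 100)
print(attn_weights_entropy(w))
```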
], batch size: 57, lr: 1.41e-03, grad_scale: 32.0 2023-11-29 04:19:32,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3807826.6666666665, ans=0.1 2023-11-29 04:19:38,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3807826.6666666665, ans=0.035 2023-11-29 04:19:41,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3807826.6666666665, ans=0.2 2023-11-29 04:19:56,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3807960.0, ans=0.125 2023-11-29 04:20:00,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3807960.0, ans=0.125 2023-11-29 04:20:02,428 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571200 2023-11-29 04:20:11,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3808026.6666666665, ans=0.125 2023-11-29 04:20:21,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3808093.3333333335, ans=0.125 2023-11-29 04:20:34,994 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6100, loss[loss=0.06687, simple_loss=0.09935, pruned_loss=0.0113, audio_tagging_loss=0.005889, over 15022.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08851, pruned_loss=0.01178, audio_tagging_loss=0.00843, over 3059902.88 frames. ], batch size: 56, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:20:47,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3808226.6666666665, ans=0.125 2023-11-29 04:20:48,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3808226.6666666665, ans=0.125 2023-11-29 04:20:49,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3808226.6666666665, ans=0.125 2023-11-29 04:21:05,499 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571250 2023-11-29 04:21:07,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.756e+01 8.969e+01 9.609e+01 1.049e+02 1.338e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-29 04:21:25,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3808426.6666666665, ans=0.125 2023-11-29 04:21:37,891 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6150, loss[loss=0.06403, simple_loss=0.08827, pruned_loss=0.01163, audio_tagging_loss=0.008268, over 16042.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08817, pruned_loss=0.0117, audio_tagging_loss=0.008526, over 3056587.00 frames. 
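The bracketed per-batch statistics are consistent with the printed total being a fixed linear combination of its parts: for batch 6150 above, 0.5 x 0.08827 + 0.01163 + 0.008268 = 0.064033, i.e. loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss. A quick check, with the 0.5 and 1.0 weights inferred by fitting the logged numbers (treat them as an assumption about this run's configuration, not ground truth):

```python
def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
               simple_scale=0.5, tagging_scale=1.0):
    # Weights inferred from the logged statistics, not read from the
    # training code.
    return (simple_scale * simple_loss + pruned_loss
            + tagging_scale * audio_tagging_loss)

# Batch 6150: loss=0.06403, simple=0.08827, pruned=0.01163, tagging=0.008268
print(total_loss(0.08827, 0.01163, 0.008268))  # -> 0.064033
# Batch 6800: loss=0.08755, simple=0.1254, pruned=0.01815, tagging=0.00669
print(total_loss(0.1254, 0.01815, 0.00669))    # -> 0.08754
```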
], batch size: 60, lr: 1.41e-03, grad_scale: 16.0 2023-11-29 04:22:07,177 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571300 2023-11-29 04:22:16,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3808693.3333333335, ans=0.125 2023-11-29 04:22:18,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3808693.3333333335, ans=0.125 2023-11-29 04:22:31,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3808760.0, ans=0.0 2023-11-29 04:22:38,911 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6200, loss[loss=0.06086, simple_loss=0.08541, pruned_loss=0.009136, audio_tagging_loss=0.009022, over 15884.00 frames. ], tot_loss[loss=0.06367, simple_loss=0.08719, pruned_loss=0.0115, audio_tagging_loss=0.008575, over 3049988.68 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:22:40,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3808826.6666666665, ans=0.5 2023-11-29 04:23:02,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3808960.0, ans=0.0 2023-11-29 04:23:08,408 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571350 2023-11-29 04:23:10,641 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 8.947e+01 9.565e+01 1.046e+02 1.413e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 04:23:30,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3809093.3333333335, ans=0.125 2023-11-29 04:23:40,238 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6250, loss[loss=0.05453, simple_loss=0.0819, pruned_loss=0.00591, audio_tagging_loss=0.007676, over 15397.00 frames. ], tot_loss[loss=0.06351, simple_loss=0.08691, pruned_loss=0.01136, audio_tagging_loss=0.008691, over 3053475.15 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:24:02,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3809226.6666666665, ans=0.125 2023-11-29 04:24:03,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3809226.6666666665, ans=0.125 2023-11-29 04:24:10,215 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571400 2023-11-29 04:24:41,931 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6300, loss[loss=0.05784, simple_loss=0.08097, pruned_loss=0.008699, audio_tagging_loss=0.008656, over 15528.00 frames. ], tot_loss[loss=0.06396, simple_loss=0.08736, pruned_loss=0.01153, audio_tagging_loss=0.008754, over 3048257.12 frames. 
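In the optim.py:476 lines, the logged threshold is consistently the clipping scale times the median of the recent grad-norm quartiles: just above, 2.0 x 9.565e+01 = 1.913e+02. A sketch of median-based gradient clipping with this style of reporting; this is a plausible reconstruction from the log format, not the actual optimizer internals, and it prints a per-step boolean where the real log aggregates a percent-clipped figure:

```python
import collections
import torch

class MedianGradClipper:
    """Clip gradients at clipping_scale x median of recent grad norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=window)

    def __call__(self, parameters) -> None:
        params = [p for p in parameters if p.grad is not None]
        # Global grad norm: sqrt of the sum of squared per-tensor norms.
        norm = torch.linalg.vector_norm(
            torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        hist = torch.tensor(list(self.norms))
        q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2.0 x median
        clipped = norm > threshold
        if clipped:
            for p in params:
                p.grad.mul_(threshold / norm)
        # Mirrors the "grad-norm quartiles ... threshold=..." log format.
        print(f"quartiles {q.tolist()} threshold={threshold:.3e} "
              f"clipped={clipped}")
```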
], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:24:53,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3809560.0, ans=0.0 2023-11-29 04:25:11,500 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571450 2023-11-29 04:25:13,815 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 9.159e+01 9.734e+01 1.043e+02 1.366e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-29 04:25:23,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3809693.3333333335, ans=0.125 2023-11-29 04:25:42,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3809826.6666666665, ans=0.125 2023-11-29 04:25:43,846 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6350, loss[loss=0.07086, simple_loss=0.09541, pruned_loss=0.01303, audio_tagging_loss=0.01012, over 16088.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08833, pruned_loss=0.01171, audio_tagging_loss=0.008807, over 3049120.07 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:25:45,544 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2023-11-29 04:25:49,288 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=12.0 2023-11-29 04:25:52,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2023-11-29 04:26:12,667 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571500 2023-11-29 04:26:29,124 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.81 vs. limit=15.0 2023-11-29 04:26:38,547 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-29 04:26:42,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3810093.3333333335, ans=0.125 2023-11-29 04:26:44,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3810160.0, ans=0.2 2023-11-29 04:26:45,492 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6400, loss[loss=0.07089, simple_loss=0.09253, pruned_loss=0.01604, audio_tagging_loss=0.008587, over 14281.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.0877, pruned_loss=0.01159, audio_tagging_loss=0.008991, over 3043551.42 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:26:45,867 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:26:47,260 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.50 vs. 
limit=15.0 2023-11-29 04:26:48,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3810160.0, ans=0.1 2023-11-29 04:26:49,731 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.36 vs. limit=15.0 2023-11-29 04:26:50,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3810160.0, ans=0.1 2023-11-29 04:27:15,293 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571550 2023-11-29 04:27:17,527 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 8.796e+01 9.535e+01 1.038e+02 1.501e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 04:27:29,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3810360.0, ans=0.0 2023-11-29 04:27:37,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3810426.6666666665, ans=0.0 2023-11-29 04:27:40,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=12.0 2023-11-29 04:27:43,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.81 vs. limit=12.0 2023-11-29 04:27:46,653 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6450, loss[loss=0.07644, simple_loss=0.1052, pruned_loss=0.01528, audio_tagging_loss=0.008553, over 15061.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08819, pruned_loss=0.01181, audio_tagging_loss=0.008981, over 3038490.40 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:27:52,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3810493.3333333335, ans=0.0 2023-11-29 04:27:55,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3810493.3333333335, ans=0.04949747468305833 2023-11-29 04:28:00,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3810560.0, ans=0.5 2023-11-29 04:28:04,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3810560.0, ans=0.1 2023-11-29 04:28:16,038 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=15.0 2023-11-29 04:28:16,530 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571600 2023-11-29 04:28:23,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3810693.3333333335, ans=0.125 2023-11-29 04:28:42,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0 2023-11-29 04:28:44,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2023-11-29 04:28:49,417 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6500, loss[loss=0.06351, simple_loss=0.08956, pruned_loss=0.01176, audio_tagging_loss=0.006975, over 16140.00 frames. 
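The "Whitening: name=..., metric=X vs. limit=Y" lines compare a measure of how non-white a layer's activations are against a (scheduled) limit: the metric is 1.0 when the channel covariance has equal eigenvalues and grows as a few directions dominate. A sketch of one such metric, E[lambda^2] / E[lambda]^2 of the covariance eigenvalues computed via traces; close in spirit to, but not copied from, scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations.

    Returns E[lambda^2] / E[lambda]^2 over the eigenvalues of the
    (uncentered) channel covariance: exactly 1.0 iff all eigenvalues
    are equal, larger when the covariance is ill-conditioned.
    """
    num_frames, num_channels = x.shape
    cov = x.t() @ x / num_frames                    # (C, C), symmetric
    mean_eig = torch.diagonal(cov).mean()           # E[lambda] = trace/C
    mean_eig_sq = (cov * cov).sum() / num_channels  # E[lambda^2] = tr(C^2)/C
    return (mean_eig_sq / (mean_eig ** 2 + 1e-20)).item()

print(whitening_metric(torch.randn(10000, 128)))  # ~1.0 for white noise
```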
], tot_loss[loss=0.06447, simple_loss=0.08771, pruned_loss=0.01168, audio_tagging_loss=0.00893, over 3039310.69 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:29:08,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3810893.3333333335, ans=0.125 2023-11-29 04:29:12,841 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=22.5 2023-11-29 04:29:14,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3810960.0, ans=0.125 2023-11-29 04:29:16,583 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=22.5 2023-11-29 04:29:18,237 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571650 2023-11-29 04:29:19,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3810960.0, ans=0.1 2023-11-29 04:29:20,560 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 9.207e+01 9.940e+01 1.055e+02 1.312e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-29 04:29:50,415 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6550, loss[loss=0.0699, simple_loss=0.09207, pruned_loss=0.01418, audio_tagging_loss=0.009686, over 15081.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08766, pruned_loss=0.01169, audio_tagging_loss=0.00887, over 3038918.79 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:30:07,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3811226.6666666665, ans=0.2 2023-11-29 04:30:20,572 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571700 2023-11-29 04:30:25,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3811293.3333333335, ans=0.0 2023-11-29 04:30:30,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3811360.0, ans=0.125 2023-11-29 04:30:35,533 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.11 vs. limit=22.5 2023-11-29 04:30:38,963 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2023-11-29 04:30:51,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3811493.3333333335, ans=0.1 2023-11-29 04:30:51,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3811493.3333333335, ans=0.125 2023-11-29 04:30:52,139 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6600, loss[loss=0.0698, simple_loss=0.1036, pruned_loss=0.01317, audio_tagging_loss=0.004809, over 16128.00 frames. ], tot_loss[loss=0.06395, simple_loss=0.08703, pruned_loss=0.01165, audio_tagging_loss=0.008785, over 3038463.90 frames. 
], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:30:52,708 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.67 vs. limit=22.5 2023-11-29 04:30:53,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3811493.3333333335, ans=0.0 2023-11-29 04:30:58,981 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:31:18,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3811626.6666666665, ans=0.2 2023-11-29 04:31:22,096 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571750 2023-11-29 04:31:24,392 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.875e+01 9.047e+01 9.716e+01 1.044e+02 1.337e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-29 04:31:28,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3811693.3333333335, ans=0.125 2023-11-29 04:31:31,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3811693.3333333335, ans=0.125 2023-11-29 04:31:54,304 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6650, loss[loss=0.07661, simple_loss=0.1071, pruned_loss=0.01493, audio_tagging_loss=0.008138, over 14990.00 frames. ], tot_loss[loss=0.06422, simple_loss=0.08783, pruned_loss=0.01168, audio_tagging_loss=0.008626, over 3036089.92 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:32:00,331 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:32:09,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3811893.3333333335, ans=0.1 2023-11-29 04:32:18,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3811960.0, ans=0.0 2023-11-29 04:32:23,972 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571800 2023-11-29 04:32:39,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3812026.6666666665, ans=0.1 2023-11-29 04:32:46,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3812093.3333333335, ans=0.0 2023-11-29 04:32:47,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3812093.3333333335, ans=0.125 2023-11-29 04:32:53,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3812093.3333333335, ans=0.125 2023-11-29 04:32:55,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3812160.0, ans=0.2 2023-11-29 04:32:56,006 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6700, loss[loss=0.06381, simple_loss=0.09961, pruned_loss=0.00729, audio_tagging_loss=0.006721, over 15121.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08807, pruned_loss=0.01161, audio_tagging_loss=0.008543, over 3037406.94 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:32:56,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3812160.0, ans=0.04949747468305833 2023-11-29 04:33:01,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3812160.0, ans=0.125 2023-11-29 04:33:18,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.37 vs. limit=15.0 2023-11-29 04:33:25,702 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571850 2023-11-29 04:33:25,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3812293.3333333335, ans=0.2 2023-11-29 04:33:29,127 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.986e+01 9.065e+01 9.575e+01 1.004e+02 1.192e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 04:33:29,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3812293.3333333335, ans=0.2 2023-11-29 04:33:30,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3812293.3333333335, ans=0.125 2023-11-29 04:33:41,971 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:33:48,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3812426.6666666665, ans=0.2 2023-11-29 04:33:54,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3812426.6666666665, ans=0.1 2023-11-29 04:33:57,351 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6750, loss[loss=0.06303, simple_loss=0.08162, pruned_loss=0.01261, audio_tagging_loss=0.009616, over 15206.00 frames. ], tot_loss[loss=0.06435, simple_loss=0.08821, pruned_loss=0.01173, audio_tagging_loss=0.008508, over 3036672.76 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:34:06,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3812493.3333333335, ans=0.125 2023-11-29 04:34:16,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3812560.0, ans=0.07 2023-11-29 04:34:17,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3812560.0, ans=0.5 2023-11-29 04:34:26,716 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571900 2023-11-29 04:34:26,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3812626.6666666665, ans=0.0 2023-11-29 04:34:42,746 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0 2023-11-29 04:34:58,726 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. 
limit=6.0 2023-11-29 04:34:59,693 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6800, loss[loss=0.08755, simple_loss=0.1254, pruned_loss=0.01815, audio_tagging_loss=0.00669, over 14512.00 frames. ], tot_loss[loss=0.06404, simple_loss=0.08765, pruned_loss=0.01162, audio_tagging_loss=0.0086, over 3034585.96 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:35:26,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3812960.0, ans=0.0 2023-11-29 04:35:29,205 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 571950 2023-11-29 04:35:31,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3812960.0, ans=0.0 2023-11-29 04:35:32,504 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.971e+01 9.540e+01 1.002e+02 2.888e+02, threshold=1.908e+02, percent-clipped=1.0 2023-11-29 04:35:41,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3813026.6666666665, ans=0.1 2023-11-29 04:35:46,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3813026.6666666665, ans=0.125 2023-11-29 04:35:58,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3813093.3333333335, ans=15.0 2023-11-29 04:36:00,797 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6850, loss[loss=0.04332, simple_loss=0.0594, pruned_loss=0.006359, audio_tagging_loss=0.007261, over 13968.00 frames. ], tot_loss[loss=0.06398, simple_loss=0.08767, pruned_loss=0.01158, audio_tagging_loss=0.008572, over 3033380.41 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:36:30,969 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572000 2023-11-29 04:36:53,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3813426.6666666665, ans=0.09899494936611666 2023-11-29 04:37:00,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3813426.6666666665, ans=0.125 2023-11-29 04:37:05,104 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6900, loss[loss=0.05623, simple_loss=0.08373, pruned_loss=0.006982, audio_tagging_loss=0.00738, over 16107.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08872, pruned_loss=0.01169, audio_tagging_loss=0.00849, over 3038546.30 frames. 
], batch size: 60, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:37:20,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3813560.0, ans=0.2 2023-11-29 04:37:34,608 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572050 2023-11-29 04:37:37,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3813626.6666666665, ans=0.1 2023-11-29 04:37:38,057 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 9.005e+01 9.691e+01 1.035e+02 1.354e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 04:37:43,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3813693.3333333335, ans=0.125 2023-11-29 04:37:43,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3813693.3333333335, ans=0.125 2023-11-29 04:37:50,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3813693.3333333335, ans=0.125 2023-11-29 04:37:52,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3813693.3333333335, ans=0.0 2023-11-29 04:37:55,719 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 04:37:55,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3813760.0, ans=0.125 2023-11-29 04:38:06,651 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 6950, loss[loss=0.08208, simple_loss=0.1067, pruned_loss=0.02038, audio_tagging_loss=0.008371, over 14860.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08889, pruned_loss=0.01168, audio_tagging_loss=0.008452, over 3035576.74 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:38:08,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3813826.6666666665, ans=0.125 2023-11-29 04:38:36,757 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572100 2023-11-29 04:38:52,934 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-11-29 04:38:55,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3814093.3333333335, ans=0.0 2023-11-29 04:39:01,567 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.09 vs. limit=10.0 2023-11-29 04:39:07,951 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7000, loss[loss=0.08131, simple_loss=0.1141, pruned_loss=0.01693, audio_tagging_loss=0.007317, over 14818.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08907, pruned_loss=0.01169, audio_tagging_loss=0.008498, over 3041688.09 frames. 
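The WARNING above shows why degenerate AudioSet cuts are filtered out: a 1-second placeholder clip yields only 23 frames after 4x subsampling, but its dummy transcript tokenizes to 24 BPE tokens, and a transducer-style loss needs at least as many encoder frames as output symbols. A sketch of the rule; the exact frontend arithmetic and any extra conditions in train_asr.py are assumptions:

```python
def frames_after_subsampling(t: int) -> int:
    # One common 4x convolutional frontend: ((t - 7) // 2) // 2.
    # For the logged cut: t=100 -> 46 -> 23, matching the warning.
    return ((t - 7) // 2) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Keep a cut only if the encoder output is at least as long as
    the token sequence (23 < 24 above, hence the exclusion)."""
    return frames_after_subsampling(num_frames) >= num_tokens

print(keep_cut(100, 24))  # -> False: the placeholder cut is dropped
```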
], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:39:12,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3814160.0, ans=0.1 2023-11-29 04:39:14,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3814160.0, ans=0.125 2023-11-29 04:39:15,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3814160.0, ans=0.125 2023-11-29 04:39:15,762 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=15.0 2023-11-29 04:39:26,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3814226.6666666665, ans=0.0 2023-11-29 04:39:38,305 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572150 2023-11-29 04:39:43,421 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 8.947e+01 9.387e+01 1.017e+02 2.856e+02, threshold=1.877e+02, percent-clipped=1.0 2023-11-29 04:39:47,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3814360.0, ans=0.0 2023-11-29 04:39:58,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3814426.6666666665, ans=0.1 2023-11-29 04:40:02,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.46 vs. limit=15.0 2023-11-29 04:40:10,539 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7050, loss[loss=0.05668, simple_loss=0.07907, pruned_loss=0.006609, audio_tagging_loss=0.01053, over 15182.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08799, pruned_loss=0.01161, audio_tagging_loss=0.008581, over 3041909.51 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:40:13,294 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:40:22,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3814560.0, ans=0.04949747468305833 2023-11-29 04:40:26,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3814560.0, ans=0.2 2023-11-29 04:40:39,614 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572200 2023-11-29 04:41:04,940 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=12.0 2023-11-29 04:41:12,063 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7100, loss[loss=0.06624, simple_loss=0.08227, pruned_loss=0.01542, audio_tagging_loss=0.009679, over 15529.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08859, pruned_loss=0.01178, audio_tagging_loss=0.008569, over 3051757.36 frames. 
], batch size: 61, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 04:41:14,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3814826.6666666665, ans=0.0 2023-11-29 04:41:30,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3814893.3333333335, ans=0.0 2023-11-29 04:41:35,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3814960.0, ans=0.0 2023-11-29 04:41:40,532 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572250 2023-11-29 04:41:47,407 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 8.957e+01 9.566e+01 1.017e+02 1.804e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 04:41:50,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3815026.6666666665, ans=0.125 2023-11-29 04:41:52,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3815026.6666666665, ans=0.07 2023-11-29 04:42:13,123 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7150, loss[loss=0.06613, simple_loss=0.08832, pruned_loss=0.01398, audio_tagging_loss=0.007992, over 15526.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.0884, pruned_loss=0.01182, audio_tagging_loss=0.008652, over 3044640.36 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 04:42:14,865 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.70 vs. limit=6.0 2023-11-29 04:42:30,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3815226.6666666665, ans=0.5 2023-11-29 04:42:42,917 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572300 2023-11-29 04:42:44,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3815293.3333333335, ans=0.125 2023-11-29 04:42:56,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3815360.0, ans=0.0 2023-11-29 04:42:57,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3815360.0, ans=0.0 2023-11-29 04:43:00,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3815360.0, ans=0.2 2023-11-29 04:43:13,865 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7200, loss[loss=0.06543, simple_loss=0.08928, pruned_loss=0.01207, audio_tagging_loss=0.008712, over 16690.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.0888, pruned_loss=0.0119, audio_tagging_loss=0.00872, over 3049047.57 frames. 
], batch size: 62, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:43:35,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3815560.0, ans=0.125 2023-11-29 04:43:44,242 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572350 2023-11-29 04:43:50,069 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.918e+01 9.002e+01 9.674e+01 1.041e+02 1.826e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-29 04:43:50,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3815693.3333333335, ans=0.0 2023-11-29 04:44:10,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3815760.0, ans=0.125 2023-11-29 04:44:15,612 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7250, loss[loss=0.04925, simple_loss=0.06317, pruned_loss=0.007733, audio_tagging_loss=0.009935, over 14913.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08771, pruned_loss=0.01173, audio_tagging_loss=0.008901, over 3047963.50 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:44:39,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3815960.0, ans=0.0 2023-11-29 04:44:43,144 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:44:44,352 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572400 2023-11-29 04:44:45,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3815960.0, ans=0.125 2023-11-29 04:44:55,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3816026.6666666665, ans=0.125 2023-11-29 04:44:59,892 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.45 vs. limit=22.5 2023-11-29 04:45:05,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3816093.3333333335, ans=0.125 2023-11-29 04:45:10,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3816093.3333333335, ans=0.1 2023-11-29 04:45:11,002 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.10 vs. limit=22.5 2023-11-29 04:45:18,357 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7300, loss[loss=0.06343, simple_loss=0.09099, pruned_loss=0.01023, audio_tagging_loss=0.007703, over 14615.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08884, pruned_loss=0.01194, audio_tagging_loss=0.008772, over 3045812.28 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:45:21,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3816160.0, ans=0.2 2023-11-29 04:45:35,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.32 vs. 
limit=15.0 2023-11-29 04:45:36,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3816226.6666666665, ans=0.125 2023-11-29 04:45:42,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3816293.3333333335, ans=0.125 2023-11-29 04:45:48,174 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572450 2023-11-29 04:45:52,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3816293.3333333335, ans=0.0 2023-11-29 04:45:54,547 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.609e+01 8.998e+01 9.655e+01 1.011e+02 1.283e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 04:46:14,357 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=15.0 2023-11-29 04:46:19,724 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7350, loss[loss=0.05632, simple_loss=0.07408, pruned_loss=0.01186, audio_tagging_loss=0.007416, over 13555.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08918, pruned_loss=0.01205, audio_tagging_loss=0.008575, over 3045899.85 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:46:19,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3816493.3333333335, ans=0.0 2023-11-29 04:46:21,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3816493.3333333335, ans=0.125 2023-11-29 04:46:31,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3816560.0, ans=0.125 2023-11-29 04:46:49,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3816626.6666666665, ans=0.0 2023-11-29 04:46:50,090 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572500 2023-11-29 04:46:50,720 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.43 vs. limit=22.5 2023-11-29 04:47:21,161 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7400, loss[loss=0.06817, simple_loss=0.09026, pruned_loss=0.01493, audio_tagging_loss=0.008103, over 14697.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08956, pruned_loss=0.01213, audio_tagging_loss=0.008474, over 3037312.89 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:47:51,298 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572550 2023-11-29 04:47:56,928 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.857e+01 9.571e+01 1.032e+02 1.214e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 04:48:07,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.68 vs. limit=15.0 2023-11-29 04:48:17,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3817093.3333333335, ans=0.125 2023-11-29 04:48:23,991 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7450, loss[loss=0.05734, simple_loss=0.08274, pruned_loss=0.008438, audio_tagging_loss=0.007526, over 14464.00 frames. 
], tot_loss[loss=0.06566, simple_loss=0.09033, pruned_loss=0.01214, audio_tagging_loss=0.008353, over 3040100.92 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:48:35,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3817226.6666666665, ans=0.125 2023-11-29 04:48:45,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3817226.6666666665, ans=0.0 2023-11-29 04:48:52,833 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572600 2023-11-29 04:48:58,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3817293.3333333335, ans=0.2 2023-11-29 04:49:04,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3817360.0, ans=0.2 2023-11-29 04:49:11,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3817360.0, ans=0.1 2023-11-29 04:49:20,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3817426.6666666665, ans=0.1 2023-11-29 04:49:25,553 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7500, loss[loss=0.06433, simple_loss=0.09488, pruned_loss=0.009914, audio_tagging_loss=0.006975, over 15006.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09098, pruned_loss=0.01224, audio_tagging_loss=0.008197, over 3044463.36 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:49:30,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3817493.3333333335, ans=0.0 2023-11-29 04:49:43,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3817560.0, ans=0.0 2023-11-29 04:49:44,372 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:49:47,430 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.15 vs. limit=22.5 2023-11-29 04:49:56,467 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572650 2023-11-29 04:50:02,202 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.892e+01 9.110e+01 9.749e+01 1.048e+02 1.256e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 04:50:06,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3817693.3333333335, ans=0.1 2023-11-29 04:50:09,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3817693.3333333335, ans=0.125 2023-11-29 04:50:24,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3817760.0, ans=0.2 2023-11-29 04:50:27,263 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7550, loss[loss=0.06002, simple_loss=0.07864, pruned_loss=0.01303, audio_tagging_loss=0.007667, over 15670.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.09001, pruned_loss=0.01211, audio_tagging_loss=0.008226, over 3042036.48 frames. 
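Since every training entry has the same shape ("Epoch E, batch B, loss[...], tot_loss[...]"), the smoothed loss curves can be recovered from this log with a simple regex. A sketch, assuming the log has been unwrapped to one entry per line; the file name and field ordering follow the lines above:

```python
import re

PATTERN = re.compile(
    r"Epoch (\d+), batch (\d+), .*?"
    r"tot_loss\[loss=([\d.]+), simple_loss=([\d.]+), "
    r"pruned_loss=([\d.]+), audio_tagging_loss=([\d.]+)"
)

def parse_tot_loss(log_path: str):
    """Yield (epoch, batch, tot_loss) tuples from a training log."""
    with open(log_path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                yield int(m.group(1)), int(m.group(2)), float(m.group(3))

# e.g. plot [(b, l) for _, b, l in parse_tot_loss("train-log.txt")]
```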
], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:50:57,266 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572700 2023-11-29 04:51:09,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3818026.6666666665, ans=0.125 2023-11-29 04:51:13,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3818026.6666666665, ans=0.0 2023-11-29 04:51:20,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.94 vs. limit=15.0 2023-11-29 04:51:25,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3818093.3333333335, ans=0.025 2023-11-29 04:51:29,823 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7600, loss[loss=0.06978, simple_loss=0.09851, pruned_loss=0.01247, audio_tagging_loss=0.008044, over 15571.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08902, pruned_loss=0.01192, audio_tagging_loss=0.008327, over 3045338.50 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:51:45,869 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.01 vs. limit=15.0 2023-11-29 04:51:48,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3818226.6666666665, ans=0.05 2023-11-29 04:51:52,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3818293.3333333335, ans=0.125 2023-11-29 04:51:58,834 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572750 2023-11-29 04:52:01,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-29 04:52:04,699 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 8.852e+01 9.526e+01 1.029e+02 1.380e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-29 04:52:23,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3818426.6666666665, ans=0.125 2023-11-29 04:52:30,895 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7650, loss[loss=0.06242, simple_loss=0.09406, pruned_loss=0.00929, audio_tagging_loss=0.006102, over 15245.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08882, pruned_loss=0.01193, audio_tagging_loss=0.008356, over 3043741.38 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:52:44,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3818560.0, ans=0.05 2023-11-29 04:52:57,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3818626.6666666665, ans=0.09899494936611666 2023-11-29 04:53:00,520 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572800 2023-11-29 04:53:00,976 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. 
limit=15.0 2023-11-29 04:53:32,465 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7700, loss[loss=0.05812, simple_loss=0.08542, pruned_loss=0.007185, audio_tagging_loss=0.008223, over 14838.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08906, pruned_loss=0.01187, audio_tagging_loss=0.008332, over 3042309.37 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 04:53:39,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3818826.6666666665, ans=0.1 2023-11-29 04:53:42,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.70 vs. limit=15.0 2023-11-29 04:53:56,378 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0 2023-11-29 04:53:59,825 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2023-11-29 04:54:02,621 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572850 2023-11-29 04:54:04,171 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2023-11-29 04:54:09,487 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.101e+01 9.082e+01 9.588e+01 1.045e+02 1.280e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-29 04:54:34,876 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7750, loss[loss=0.07574, simple_loss=0.1082, pruned_loss=0.01197, audio_tagging_loss=0.009665, over 15433.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08871, pruned_loss=0.01186, audio_tagging_loss=0.008427, over 3037672.80 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:54:35,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3819160.0, ans=0.5 2023-11-29 04:54:58,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3819293.3333333335, ans=0.0 2023-11-29 04:55:04,154 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572900 2023-11-29 04:55:08,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3819293.3333333335, ans=0.125 2023-11-29 04:55:17,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3819360.0, ans=0.125 2023-11-29 04:55:28,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3819426.6666666665, ans=0.125 2023-11-29 04:55:36,098 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7800, loss[loss=0.06713, simple_loss=0.09168, pruned_loss=0.01242, audio_tagging_loss=0.008874, over 15386.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.089, pruned_loss=0.01192, audio_tagging_loss=0.008448, over 3044594.52 frames. 
], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:55:41,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3819493.3333333335, ans=0.04949747468305833 2023-11-29 04:55:51,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3819560.0, ans=0.125 2023-11-29 04:56:05,436 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 572950 2023-11-29 04:56:08,969 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2023-11-29 04:56:10,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3819626.6666666665, ans=0.0 2023-11-29 04:56:12,919 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 9.184e+01 1.003e+02 1.060e+02 1.343e+02, threshold=2.007e+02, percent-clipped=0.0 2023-11-29 04:56:33,709 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 04:56:37,818 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7850, loss[loss=0.05048, simple_loss=0.0585, pruned_loss=0.009251, audio_tagging_loss=0.01198, over 14493.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08881, pruned_loss=0.01199, audio_tagging_loss=0.00855, over 3049139.78 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:57:07,128 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573000 2023-11-29 04:57:07,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3819960.0, ans=0.0 2023-11-29 04:57:39,586 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7900, loss[loss=0.06747, simple_loss=0.09197, pruned_loss=0.01482, audio_tagging_loss=0.006662, over 14840.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08919, pruned_loss=0.01199, audio_tagging_loss=0.008623, over 3049369.52 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:58:09,648 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573050 2023-11-29 04:58:16,467 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 9.085e+01 9.812e+01 1.049e+02 1.531e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 04:58:19,450 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.73 vs. limit=6.0 2023-11-29 04:58:41,058 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 7950, loss[loss=0.06174, simple_loss=0.08613, pruned_loss=0.01035, audio_tagging_loss=0.008324, over 14535.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08932, pruned_loss=0.01185, audio_tagging_loss=0.008679, over 3058761.50 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 04:58:42,935 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.96 vs. 
limit=15.0 2023-11-29 04:58:43,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3820493.3333333335, ans=0.125 2023-11-29 04:58:46,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3820493.3333333335, ans=0.2 2023-11-29 04:58:53,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3820560.0, ans=0.1 2023-11-29 04:58:56,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3820560.0, ans=0.1 2023-11-29 04:59:00,108 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 04:59:11,297 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573100 2023-11-29 04:59:23,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3820693.3333333335, ans=0.125 2023-11-29 04:59:23,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3820693.3333333335, ans=0.0 2023-11-29 04:59:26,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3820693.3333333335, ans=0.0 2023-11-29 04:59:31,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3820760.0, ans=0.025 2023-11-29 04:59:37,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3820760.0, ans=0.125 2023-11-29 04:59:39,755 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2023-11-29 04:59:43,483 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8000, loss[loss=0.06577, simple_loss=0.09449, pruned_loss=0.01137, audio_tagging_loss=0.007155, over 16788.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08911, pruned_loss=0.01191, audio_tagging_loss=0.008853, over 3058662.38 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:00:12,732 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573150 2023-11-29 05:00:20,797 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.456e+01 9.160e+01 9.620e+01 1.029e+02 4.171e+02, threshold=1.924e+02, percent-clipped=1.0 2023-11-29 05:00:26,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3821026.6666666665, ans=0.0 2023-11-29 05:00:45,125 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8050, loss[loss=0.05329, simple_loss=0.08118, pruned_loss=0.002449, audio_tagging_loss=0.01025, over 14122.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08929, pruned_loss=0.01185, audio_tagging_loss=0.008904, over 3048789.41 frames. 
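
The WARNING above drops a 1-second AudioSet placeholder cut: its 100 input frames shrink to 23 encoder frames after 4x subsampling, which is fewer than its 24 BPE tokens, and a transducer cannot emit more tokens than it has frames. A sketch of that kind of filter, assuming the standard transducer length constraint; the exact predicate and subsampling arithmetic in train_asr.py are not shown in the log, and keep_cut with its (T - 8) // 4 approximation is ours:

```python
def keep_cut(num_frames: int, num_tokens: int,
             subsampling_factor: int = 4) -> bool:
    """Drop cuts whose encoder output is shorter than the token sequence."""
    # Approximation of the encoder-embed arithmetic: roughly T / 4, minus a
    # few boundary frames (it reproduces the 100 -> 23 seen in the log).
    frames_after = (num_frames - 8) // subsampling_factor
    return frames_after >= num_tokens

# The excluded cut: 100 frames -> 23 encoder frames, but 24 tokens.
assert keep_cut(100, 24) is False
```
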
], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:01:05,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3821226.6666666665, ans=0.1 2023-11-29 05:01:08,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3821293.3333333335, ans=0.0 2023-11-29 05:01:13,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3821293.3333333335, ans=0.1 2023-11-29 05:01:14,591 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573200 2023-11-29 05:01:46,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3821493.3333333335, ans=0.125 2023-11-29 05:01:47,027 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8100, loss[loss=0.0684, simple_loss=0.08441, pruned_loss=0.01531, audio_tagging_loss=0.01089, over 16788.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08984, pruned_loss=0.01194, audio_tagging_loss=0.008783, over 3052752.19 frames. ], batch size: 65, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:02:09,554 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:02:10,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.29 vs. limit=15.0 2023-11-29 05:02:16,352 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573250 2023-11-29 05:02:25,668 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.207e+01 9.031e+01 9.567e+01 1.056e+02 1.290e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 05:02:37,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3821760.0, ans=0.125 2023-11-29 05:02:47,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3821826.6666666665, ans=0.0 2023-11-29 05:02:48,030 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8150, loss[loss=0.0666, simple_loss=0.08743, pruned_loss=0.01447, audio_tagging_loss=0.008411, over 16405.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08924, pruned_loss=0.01187, audio_tagging_loss=0.008685, over 3045791.39 frames. ], batch size: 64, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:02:50,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3821826.6666666665, ans=0.125 2023-11-29 05:03:18,680 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573300 2023-11-29 05:03:20,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3821960.0, ans=0.125 2023-11-29 05:03:20,250 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0 2023-11-29 05:03:44,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3822093.3333333335, ans=0.1 2023-11-29 05:03:50,149 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8200, loss[loss=0.04875, simple_loss=0.06811, pruned_loss=0.007224, audio_tagging_loss=0.007474, over 14907.00 frames. 
], tot_loss[loss=0.06501, simple_loss=0.08947, pruned_loss=0.01177, audio_tagging_loss=0.00851, over 3047836.84 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:03:54,222 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:03:57,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3822160.0, ans=0.125 2023-11-29 05:04:15,968 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:04:16,179 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.37 vs. limit=15.0 2023-11-29 05:04:19,289 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573350 2023-11-29 05:04:19,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3822293.3333333335, ans=0.125 2023-11-29 05:04:27,918 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.713e+01 9.111e+01 9.648e+01 1.058e+02 1.357e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 05:04:36,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3822360.0, ans=0.125 2023-11-29 05:04:45,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3822426.6666666665, ans=0.0 2023-11-29 05:04:49,549 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0 2023-11-29 05:04:51,499 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8250, loss[loss=0.05667, simple_loss=0.07741, pruned_loss=0.008417, audio_tagging_loss=0.00955, over 15408.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.09009, pruned_loss=0.01197, audio_tagging_loss=0.008517, over 3048258.50 frames. 
], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:05:06,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3822560.0, ans=0.125 2023-11-29 05:05:16,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3822626.6666666665, ans=0.1 2023-11-29 05:05:20,992 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573400 2023-11-29 05:05:26,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=3822626.6666666665, ans=12.0 2023-11-29 05:05:35,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3822693.3333333335, ans=0.1 2023-11-29 05:05:44,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3822760.0, ans=0.1 2023-11-29 05:05:52,755 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8300, loss[loss=0.06121, simple_loss=0.08842, pruned_loss=0.01052, audio_tagging_loss=0.006481, over 15072.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08938, pruned_loss=0.01201, audio_tagging_loss=0.008508, over 3053019.57 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:05:56,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3822826.6666666665, ans=0.0 2023-11-29 05:06:23,367 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573450 2023-11-29 05:06:31,402 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.078e+01 8.946e+01 9.758e+01 1.060e+02 1.383e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-29 05:06:40,081 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:06:44,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3823093.3333333335, ans=0.0 2023-11-29 05:06:47,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3823093.3333333335, ans=0.05 2023-11-29 05:06:54,997 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8350, loss[loss=0.06384, simple_loss=0.08429, pruned_loss=0.01401, audio_tagging_loss=0.007684, over 15954.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08934, pruned_loss=0.01205, audio_tagging_loss=0.008451, over 3057045.40 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:07:24,361 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573500 2023-11-29 05:07:48,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3823426.6666666665, ans=0.125 2023-11-29 05:07:50,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3823426.6666666665, ans=0.125 2023-11-29 05:07:57,402 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8400, loss[loss=0.05986, simple_loss=0.08236, pruned_loss=0.01041, audio_tagging_loss=0.008269, over 14828.00 frames. ], tot_loss[loss=0.06512, simple_loss=0.08907, pruned_loss=0.01205, audio_tagging_loss=0.008536, over 3056421.23 frames. 
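
The scaling.py:213 lines report hyperparameters (dropout_p, skip rates, balancer probabilities, even the whitening_limit with ans=12.0 just above) that are ScheduledFloat values: deterministic functions of batch_count rather than constants, interpolated linearly between breakpoints. A minimal sketch of that behavior; the breakpoints below are illustrative, not the schedules used in this run, and ScheduledFloatSketch is our stand-in for icefall's ScheduledFloat:

```python
from bisect import bisect_right

class ScheduledFloatSketch:
    """Piecewise-linear function of batch_count, clamped at the endpoints."""

    def __init__(self, *points):  # points: (batch_count, value) pairs
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        xs = [x for x, _ in self.points]
        i = bisect_right(xs, batch_count)
        if i == 0:
            return self.points[0][1]
        if i == len(self.points):
            return self.points[-1][1]
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A dropout_p decaying from 0.3 to a 0.1 floor would print ans=0.1 at the
# batch_counts above (~3.82e6), long past its last breakpoint:
drop = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
assert drop.value(3818826.6666666665) == 0.1
```
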
], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:08:01,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3823493.3333333335, ans=0.125 2023-11-29 05:08:08,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3823560.0, ans=0.0 2023-11-29 05:08:10,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3823560.0, ans=0.1 2023-11-29 05:08:12,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3823560.0, ans=0.0 2023-11-29 05:08:25,930 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573550 2023-11-29 05:08:36,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.903e+01 9.025e+01 9.772e+01 1.057e+02 1.487e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 05:08:41,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.04 vs. limit=22.5 2023-11-29 05:08:42,078 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0 2023-11-29 05:08:56,892 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8450, loss[loss=0.05295, simple_loss=0.07098, pruned_loss=0.007437, audio_tagging_loss=0.01002, over 14952.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08853, pruned_loss=0.01202, audio_tagging_loss=0.008569, over 3050552.81 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:09:03,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3823826.6666666665, ans=0.125 2023-11-29 05:09:17,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3823893.3333333335, ans=0.025 2023-11-29 05:09:28,163 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573600 2023-11-29 05:09:34,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3824026.6666666665, ans=0.125 2023-11-29 05:09:36,838 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:09:59,993 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8500, loss[loss=0.08897, simple_loss=0.128, pruned_loss=0.01767, audio_tagging_loss=0.007324, over 15691.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08979, pruned_loss=0.01208, audio_tagging_loss=0.008531, over 3048803.92 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:10:29,794 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573650 2023-11-29 05:10:33,929 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.96 vs. 
limit=22.5 2023-11-29 05:10:39,020 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.356e+01 9.190e+01 9.692e+01 1.077e+02 1.317e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 05:10:42,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3824360.0, ans=0.125 2023-11-29 05:10:43,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3824360.0, ans=0.125 2023-11-29 05:10:46,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3824360.0, ans=0.125 2023-11-29 05:10:55,937 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.46 vs. limit=15.0 2023-11-29 05:11:02,944 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8550, loss[loss=0.06183, simple_loss=0.09039, pruned_loss=0.008378, audio_tagging_loss=0.008263, over 16971.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08988, pruned_loss=0.0122, audio_tagging_loss=0.008525, over 3055883.04 frames. ], batch size: 63, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:11:08,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3824493.3333333335, ans=0.125 2023-11-29 05:11:19,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3824560.0, ans=0.125 2023-11-29 05:11:27,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3824626.6666666665, ans=0.0 2023-11-29 05:11:29,690 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.76 vs. limit=15.0 2023-11-29 05:11:30,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3824626.6666666665, ans=0.04949747468305833 2023-11-29 05:11:31,526 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573700 2023-11-29 05:11:40,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3824693.3333333335, ans=0.125 2023-11-29 05:11:56,241 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2023-11-29 05:12:01,529 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:12:02,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3824826.6666666665, ans=0.125 2023-11-29 05:12:03,553 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8600, loss[loss=0.07428, simple_loss=0.1011, pruned_loss=0.01326, audio_tagging_loss=0.01048, over 14743.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.08975, pruned_loss=0.01208, audio_tagging_loss=0.008589, over 3050235.22 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:12:28,977 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.78 vs. 
limit=22.5 2023-11-29 05:12:33,526 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573750 2023-11-29 05:12:34,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3824960.0, ans=0.2 2023-11-29 05:12:44,143 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 8.900e+01 9.530e+01 1.037e+02 1.292e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-29 05:12:47,257 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.37 vs. limit=22.5 2023-11-29 05:12:49,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3825026.6666666665, ans=0.125 2023-11-29 05:12:53,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3825093.3333333335, ans=0.1 2023-11-29 05:12:58,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3825093.3333333335, ans=0.0 2023-11-29 05:13:03,141 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0 2023-11-29 05:13:04,702 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8650, loss[loss=0.06626, simple_loss=0.09583, pruned_loss=0.01173, audio_tagging_loss=0.006621, over 15004.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09033, pruned_loss=0.0121, audio_tagging_loss=0.00862, over 3050581.24 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:13:07,608 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0 2023-11-29 05:13:09,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3825160.0, ans=0.0 2023-11-29 05:13:27,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3825226.6666666665, ans=0.1 2023-11-29 05:13:30,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3825293.3333333335, ans=0.0 2023-11-29 05:13:31,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3825293.3333333335, ans=0.1 2023-11-29 05:13:34,623 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573800 2023-11-29 05:13:44,903 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.20 vs. limit=15.0 2023-11-29 05:14:04,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3825426.6666666665, ans=0.125 2023-11-29 05:14:06,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3825493.3333333335, ans=0.125 2023-11-29 05:14:06,964 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8700, loss[loss=0.07453, simple_loss=0.106, pruned_loss=0.01549, audio_tagging_loss=0.006022, over 15943.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.08983, pruned_loss=0.01212, audio_tagging_loss=0.00865, over 3043566.74 frames. 
], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:14:19,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-11-29 05:14:28,021 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.70 vs. limit=22.5 2023-11-29 05:14:36,364 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573850 2023-11-29 05:14:47,680 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 9.152e+01 9.894e+01 1.070e+02 1.338e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 05:14:51,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3825693.3333333335, ans=0.125 2023-11-29 05:14:55,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3825760.0, ans=0.125 2023-11-29 05:15:00,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3825760.0, ans=0.1 2023-11-29 05:15:01,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3825760.0, ans=0.0 2023-11-29 05:15:08,733 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8750, loss[loss=0.05527, simple_loss=0.07794, pruned_loss=0.004526, audio_tagging_loss=0.01177, over 15833.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08995, pruned_loss=0.01202, audio_tagging_loss=0.008739, over 3044326.22 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:15:11,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3825826.6666666665, ans=0.5 2023-11-29 05:15:20,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3825893.3333333335, ans=0.2 2023-11-29 05:15:37,801 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573900 2023-11-29 05:15:39,819 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.27 vs. limit=15.0 2023-11-29 05:16:02,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3826093.3333333335, ans=0.0 2023-11-29 05:16:10,218 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8800, loss[loss=0.0611, simple_loss=0.07944, pruned_loss=0.01185, audio_tagging_loss=0.009529, over 14737.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09023, pruned_loss=0.01209, audio_tagging_loss=0.008811, over 3046530.49 frames. 
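
The optim.py:476 lines appear to print five order statistics (min, 25%, median, 75%, max) of recent per-batch gradient norms, plus a clipping threshold and the share of recent batches that were clipped. In every record here the threshold is twice the median, matching Clipping_scale=2.0 (e.g. 2 x 9.894e+01 = 1.979e+02 just above). A sketch of that bookkeeping under the threshold = clipping_scale * median reading; this is not the actual ScaledAdam code:

```python
import torch

def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Quartiles of recent grad norms and the implied clipping threshold."""
    q = torch.quantile(grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]          # 2 x median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped

# Consistency check against the batch-8700 record above:
assert abs(2.0 * 9.894e+01 - 1.979e+02) < 0.05
```
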
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:16:12,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3826160.0, ans=0.125 2023-11-29 05:16:17,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3826160.0, ans=0.125 2023-11-29 05:16:20,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3826160.0, ans=0.5 2023-11-29 05:16:21,809 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=22.5 2023-11-29 05:16:39,821 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 573950 2023-11-29 05:16:50,296 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.805e+01 9.120e+01 9.746e+01 1.050e+02 1.300e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-29 05:16:50,538 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:16:55,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3826360.0, ans=0.0 2023-11-29 05:16:56,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3826360.0, ans=0.0 2023-11-29 05:16:59,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3826426.6666666665, ans=0.125 2023-11-29 05:17:11,272 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8850, loss[loss=0.07555, simple_loss=0.112, pruned_loss=0.01202, audio_tagging_loss=0.007512, over 15615.00 frames. ], tot_loss[loss=0.06699, simple_loss=0.09176, pruned_loss=0.01242, audio_tagging_loss=0.008686, over 3051165.94 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:17:26,533 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:17:37,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3826626.6666666665, ans=0.125 2023-11-29 05:17:40,677 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574000 2023-11-29 05:17:45,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3826626.6666666665, ans=0.125 2023-11-29 05:17:48,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3826693.3333333335, ans=0.07 2023-11-29 05:17:52,198 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:18:14,006 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8900, loss[loss=0.06157, simple_loss=0.08886, pruned_loss=0.009929, audio_tagging_loss=0.007214, over 14207.00 frames. ], tot_loss[loss=0.06715, simple_loss=0.09246, pruned_loss=0.01241, audio_tagging_loss=0.008511, over 3053996.52 frames. 
], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:18:25,234 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2023-11-29 05:18:39,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3826960.0, ans=0.0 2023-11-29 05:18:39,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3826960.0, ans=0.2 2023-11-29 05:18:43,694 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574050 2023-11-29 05:18:52,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3827026.6666666665, ans=0.125 2023-11-29 05:18:54,681 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 9.202e+01 9.774e+01 1.025e+02 3.343e+02, threshold=1.955e+02, percent-clipped=1.0 2023-11-29 05:19:04,085 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-29 05:19:15,222 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 8950, loss[loss=0.04964, simple_loss=0.06672, pruned_loss=0.005791, audio_tagging_loss=0.01049, over 15476.00 frames. ], tot_loss[loss=0.06628, simple_loss=0.09146, pruned_loss=0.01211, audio_tagging_loss=0.008444, over 3050941.82 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:19:39,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3827293.3333333335, ans=0.0 2023-11-29 05:19:41,286 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0 2023-11-29 05:19:42,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3827293.3333333335, ans=0.09899494936611666 2023-11-29 05:19:45,574 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574100 2023-11-29 05:19:45,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3827293.3333333335, ans=0.125 2023-11-29 05:19:52,732 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.94 vs. limit=15.0 2023-11-29 05:19:55,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3827360.0, ans=0.025 2023-11-29 05:20:05,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3827426.6666666665, ans=0.1 2023-11-29 05:20:15,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3827426.6666666665, ans=0.1 2023-11-29 05:20:17,589 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9000, loss[loss=0.05249, simple_loss=0.07016, pruned_loss=0.009274, audio_tagging_loss=0.008137, over 15888.00 frames. ], tot_loss[loss=0.06636, simple_loss=0.09153, pruned_loss=0.01222, audio_tagging_loss=0.008368, over 3057829.02 frames. 
], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:20:17,590 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 05:20:56,942 INFO [train_asr.py:1267] (2/4) Epoch 48, validation: loss=0.05922, simple_loss=0.05036, pruned_loss=0.00529, audio_tagging_loss=0.02875, over 4681554.00 frames. 2023-11-29 05:20:56,943 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 05:21:03,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3827493.3333333335, ans=0.125 2023-11-29 05:21:04,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3827493.3333333335, ans=0.0 2023-11-29 05:21:04,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3827493.3333333335, ans=0.1 2023-11-29 05:21:09,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.78 vs. limit=15.0 2023-11-29 05:21:10,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3827560.0, ans=0.125 2023-11-29 05:21:17,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3827560.0, ans=0.125 2023-11-29 05:21:26,347 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574150 2023-11-29 05:21:27,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3827626.6666666665, ans=0.125 2023-11-29 05:21:33,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3827693.3333333335, ans=0.1 2023-11-29 05:21:37,482 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 9.287e+01 9.829e+01 1.058e+02 1.335e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-29 05:21:37,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3827693.3333333335, ans=0.2 2023-11-29 05:21:42,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3827693.3333333335, ans=0.125 2023-11-29 05:21:42,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3827693.3333333335, ans=0.0 2023-11-29 05:21:48,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3827760.0, ans=0.0 2023-11-29 05:21:52,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3827760.0, ans=0.0 2023-11-29 05:21:58,577 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9050, loss[loss=0.07304, simple_loss=0.1026, pruned_loss=0.014, audio_tagging_loss=0.007724, over 14997.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.09039, pruned_loss=0.01209, audio_tagging_loss=0.008385, over 3055526.72 frames. 
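
Two things worth noting in the validation block above. First, the validation loss obeys the same combination as the training records: 0.5 * 0.05036 + 0.00529 + 0.02875 = 0.05922. Second, tot_loss is averaged "over" roughly 3.0e6 frames at every batch rather than a count that grows without bound; that plateau is consistent with a decaying running sum, tot = tot * (1 - 1/reset_interval) + batch, with reset_interval = 200 from the configuration, since ~15k frames per batch times an effective window of 200 batches is ~3.0e6. A sketch under that assumption (update_running is our name):

```python
def update_running(total: float, batch_value: float,
                   reset_interval: int = 200) -> float:
    """Exponentially decaying running sum; effective window of
    reset_interval batches."""
    return total * (1.0 - 1.0 / reset_interval) + batch_value

frames = 0.0
for _ in range(2000):
    frames = update_running(frames, 15_000.0)
print(round(frames))  # ~3,000,000: the plateau the tot_loss records show
```
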
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:22:27,883 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574200 2023-11-29 05:22:43,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3828026.6666666665, ans=0.125 2023-11-29 05:22:50,733 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:23:00,517 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9100, loss[loss=0.07935, simple_loss=0.111, pruned_loss=0.01892, audio_tagging_loss=0.004943, over 15248.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08919, pruned_loss=0.01196, audio_tagging_loss=0.008418, over 3051729.69 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:23:29,686 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574250 2023-11-29 05:23:29,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3828293.3333333335, ans=0.125 2023-11-29 05:23:40,968 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.452e+01 9.007e+01 9.515e+01 1.034e+02 1.309e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-29 05:23:51,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3828426.6666666665, ans=0.125 2023-11-29 05:23:56,874 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=15.0 2023-11-29 05:24:01,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3828493.3333333335, ans=0.1 2023-11-29 05:24:02,089 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9150, loss[loss=0.05871, simple_loss=0.07905, pruned_loss=0.009081, audio_tagging_loss=0.0101, over 14898.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.0883, pruned_loss=0.0118, audio_tagging_loss=0.008456, over 3048178.00 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:24:18,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3828560.0, ans=0.0 2023-11-29 05:24:32,086 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574300 2023-11-29 05:24:32,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.95 vs. limit=10.0 2023-11-29 05:24:56,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3828760.0, ans=0.125 2023-11-29 05:25:04,011 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9200, loss[loss=0.07511, simple_loss=0.1079, pruned_loss=0.01503, audio_tagging_loss=0.006146, over 15228.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08885, pruned_loss=0.01181, audio_tagging_loss=0.008402, over 3043825.94 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:25:09,723 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.72 vs. 
limit=15.0 2023-11-29 05:25:10,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3828826.6666666665, ans=0.125 2023-11-29 05:25:27,434 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.40 vs. limit=15.0 2023-11-29 05:25:28,190 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:25:33,849 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574350 2023-11-29 05:25:36,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3828960.0, ans=0.125 2023-11-29 05:25:42,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3829026.6666666665, ans=0.2 2023-11-29 05:25:44,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.055e+01 8.927e+01 9.501e+01 1.029e+02 1.392e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 05:25:48,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3829026.6666666665, ans=0.2 2023-11-29 05:25:51,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3829026.6666666665, ans=0.0 2023-11-29 05:25:54,131 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:26:02,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3829093.3333333335, ans=0.0 2023-11-29 05:26:02,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3829093.3333333335, ans=0.2 2023-11-29 05:26:06,027 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9250, loss[loss=0.07585, simple_loss=0.1163, pruned_loss=0.01101, audio_tagging_loss=0.006659, over 16309.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08927, pruned_loss=0.01182, audio_tagging_loss=0.008348, over 3048496.45 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:26:25,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3829226.6666666665, ans=0.125 2023-11-29 05:26:35,591 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574400 2023-11-29 05:26:46,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3829360.0, ans=0.125 2023-11-29 05:26:48,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3829360.0, ans=0.1 2023-11-29 05:26:52,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=3829360.0, ans=15.0 2023-11-29 05:27:06,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3829426.6666666665, ans=0.125 2023-11-29 05:27:08,211 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9300, loss[loss=0.0575, simple_loss=0.0773, pruned_loss=0.01017, audio_tagging_loss=0.008681, over 14143.00 frames. 
], tot_loss[loss=0.06467, simple_loss=0.08896, pruned_loss=0.01174, audio_tagging_loss=0.008444, over 3043257.61 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:27:13,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3829493.3333333335, ans=0.125 2023-11-29 05:27:31,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3829626.6666666665, ans=0.0 2023-11-29 05:27:36,639 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-29 05:27:37,431 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574450 2023-11-29 05:27:40,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3829626.6666666665, ans=0.125 2023-11-29 05:27:47,420 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.18 vs. limit=10.0 2023-11-29 05:27:50,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3829693.3333333335, ans=0.125 2023-11-29 05:27:51,500 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.398e+01 8.967e+01 9.597e+01 1.017e+02 1.229e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-29 05:27:54,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3829693.3333333335, ans=0.1 2023-11-29 05:28:08,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3829826.6666666665, ans=0.125 2023-11-29 05:28:09,015 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9350, loss[loss=0.0645, simple_loss=0.09302, pruned_loss=0.01065, audio_tagging_loss=0.007343, over 14015.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08929, pruned_loss=0.01183, audio_tagging_loss=0.008397, over 3034048.24 frames. 
], batch size: 53, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:28:12,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3829826.6666666665, ans=0.0 2023-11-29 05:28:15,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3829826.6666666665, ans=0.07 2023-11-29 05:28:26,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3829893.3333333335, ans=0.0 2023-11-29 05:28:26,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3829893.3333333335, ans=0.125 2023-11-29 05:28:26,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3829893.3333333335, ans=0.125 2023-11-29 05:28:37,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3829960.0, ans=0.1 2023-11-29 05:28:39,337 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574500 2023-11-29 05:28:52,237 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:28:53,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.64 vs. limit=10.0 2023-11-29 05:28:54,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3830026.6666666665, ans=0.1 2023-11-29 05:28:55,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3830026.6666666665, ans=0.125 2023-11-29 05:29:10,051 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9400, loss[loss=0.06439, simple_loss=0.09624, pruned_loss=0.009287, audio_tagging_loss=0.006982, over 14879.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08904, pruned_loss=0.01181, audio_tagging_loss=0.008581, over 3034244.10 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:29:10,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3830160.0, ans=0.125 2023-11-29 05:29:10,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.05 vs. 
limit=10.0 2023-11-29 05:29:23,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3830226.6666666665, ans=0.0 2023-11-29 05:29:30,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3830226.6666666665, ans=0.09899494936611666 2023-11-29 05:29:36,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3830293.3333333335, ans=0.0 2023-11-29 05:29:38,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3830293.3333333335, ans=0.125 2023-11-29 05:29:39,415 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574550 2023-11-29 05:29:44,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3830293.3333333335, ans=0.0 2023-11-29 05:29:46,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3830360.0, ans=0.125 2023-11-29 05:29:53,497 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.610e+01 9.033e+01 9.709e+01 1.034e+02 1.178e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-29 05:30:05,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3830426.6666666665, ans=0.125 2023-11-29 05:30:12,141 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9450, loss[loss=0.06012, simple_loss=0.08129, pruned_loss=0.009348, audio_tagging_loss=0.01013, over 14808.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08933, pruned_loss=0.01187, audio_tagging_loss=0.008622, over 3040610.40 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:30:13,327 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:30:18,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3830493.3333333335, ans=0.125 2023-11-29 05:30:20,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3830493.3333333335, ans=0.125 2023-11-29 05:30:24,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3830560.0, ans=0.125 2023-11-29 05:30:35,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. 
limit=15.0 2023-11-29 05:30:41,518 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574600 2023-11-29 05:30:54,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3830693.3333333335, ans=0.0 2023-11-29 05:30:57,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3830693.3333333335, ans=0.0 2023-11-29 05:31:02,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3830760.0, ans=0.5 2023-11-29 05:31:13,403 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9500, loss[loss=0.07867, simple_loss=0.1056, pruned_loss=0.01861, audio_tagging_loss=0.007272, over 16041.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.0884, pruned_loss=0.0117, audio_tagging_loss=0.008754, over 3037689.08 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:31:44,257 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574650 2023-11-29 05:31:51,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3831026.6666666665, ans=0.1 2023-11-29 05:31:53,909 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2023-11-29 05:31:56,867 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 8.911e+01 9.563e+01 1.027e+02 1.260e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 05:31:57,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3831026.6666666665, ans=0.0 2023-11-29 05:32:03,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3831093.3333333335, ans=0.125 2023-11-29 05:32:15,816 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9550, loss[loss=0.05303, simple_loss=0.07223, pruned_loss=0.007021, audio_tagging_loss=0.009897, over 13808.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08828, pruned_loss=0.01162, audio_tagging_loss=0.008836, over 3032662.00 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 05:32:24,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3831160.0, ans=0.95 2023-11-29 05:32:38,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3831226.6666666665, ans=0.125 2023-11-29 05:32:39,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0 2023-11-29 05:32:40,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3831293.3333333335, ans=0.2 2023-11-29 05:32:44,976 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574700 2023-11-29 05:32:45,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3831293.3333333335, ans=0.125 2023-11-29 05:32:45,334 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.59 vs. 
limit=15.0 2023-11-29 05:32:57,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3831360.0, ans=0.0 2023-11-29 05:33:01,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3831360.0, ans=0.125 2023-11-29 05:33:17,848 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9600, loss[loss=0.061, simple_loss=0.08425, pruned_loss=0.008288, audio_tagging_loss=0.01058, over 15712.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08849, pruned_loss=0.01166, audio_tagging_loss=0.008891, over 3040858.17 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:33:18,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.06 vs. limit=10.0 2023-11-29 05:33:24,014 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.59 vs. limit=10.0 2023-11-29 05:33:31,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3831560.0, ans=0.2 2023-11-29 05:33:32,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3831560.0, ans=0.125 2023-11-29 05:33:36,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3831560.0, ans=0.0 2023-11-29 05:33:45,406 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.11 vs. limit=22.5 2023-11-29 05:33:46,164 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574750 2023-11-29 05:33:49,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.31 vs. limit=10.0 2023-11-29 05:34:01,056 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 9.154e+01 9.787e+01 1.038e+02 1.402e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-29 05:34:03,093 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2023-11-29 05:34:17,748 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:34:18,741 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9650, loss[loss=0.06123, simple_loss=0.09168, pruned_loss=0.009271, audio_tagging_loss=0.006121, over 15008.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08836, pruned_loss=0.01162, audio_tagging_loss=0.00885, over 3042631.14 frames. 
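
Across this stretch grad_scale bounces between 8.0, 16.0 and 32.0 (32.0 at batch 9200, down to 8.0 by batch 9350, back to 16.0 by batch 9600). With use_fp16=True that is the signature of dynamic loss scaling: halve the scale when a step overflows to inf/nan, double it again after a run of clean steps. A sketch using torch's stock scaler to illustrate the dynamics; the constructor arguments below are illustrative, not the values used in this run:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # this stretch of the log starts at 32.0
    backoff_factor=0.5,    # 32 -> 16 -> 8 after two overflowing steps
    growth_factor=2.0,     # 8 -> 16 once enough clean steps accumulate
    growth_interval=2000,  # clean steps required before growing the scale
)
```
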
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:34:21,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3831826.6666666665, ans=0.125 2023-11-29 05:34:35,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3831893.3333333335, ans=0.125 2023-11-29 05:34:41,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3831893.3333333335, ans=0.2 2023-11-29 05:34:47,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3831960.0, ans=0.125 2023-11-29 05:34:50,467 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574800 2023-11-29 05:34:57,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3832026.6666666665, ans=0.0 2023-11-29 05:35:08,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3832093.3333333335, ans=0.0 2023-11-29 05:35:20,958 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9700, loss[loss=0.07855, simple_loss=0.1131, pruned_loss=0.01666, audio_tagging_loss=0.005316, over 14906.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.08976, pruned_loss=0.0119, audio_tagging_loss=0.008663, over 3049348.69 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:35:23,852 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.35 vs. limit=15.0 2023-11-29 05:35:27,622 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:35:50,788 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574850 2023-11-29 05:36:02,204 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.26 vs. limit=10.0 2023-11-29 05:36:03,816 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.548e+01 9.065e+01 9.811e+01 1.054e+02 1.349e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 05:36:06,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3832360.0, ans=0.0 2023-11-29 05:36:17,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3832426.6666666665, ans=0.125 2023-11-29 05:36:23,054 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9750, loss[loss=0.08064, simple_loss=0.1131, pruned_loss=0.0179, audio_tagging_loss=0.006166, over 16023.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.09007, pruned_loss=0.01183, audio_tagging_loss=0.008541, over 3051329.38 frames. 
], batch size: 58, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:36:23,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3832493.3333333335, ans=0.0
2023-11-29 05:36:33,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3832560.0, ans=0.125
2023-11-29 05:36:45,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3832626.6666666665, ans=0.125
2023-11-29 05:36:48,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3832626.6666666665, ans=0.0
2023-11-29 05:36:51,672 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574900
2023-11-29 05:36:53,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3832626.6666666665, ans=0.125
2023-11-29 05:37:03,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3832693.3333333335, ans=0.1
2023-11-29 05:37:23,658 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9800, loss[loss=0.06213, simple_loss=0.08486, pruned_loss=0.01151, audio_tagging_loss=0.00819, over 15006.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.08939, pruned_loss=0.01172, audio_tagging_loss=0.008478, over 3056287.71 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:37:23,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3832826.6666666665, ans=0.125
2023-11-29 05:37:51,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3832960.0, ans=0.0
2023-11-29 05:37:51,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3832960.0, ans=0.125
2023-11-29 05:37:52,469 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 574950
2023-11-29 05:37:53,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3832960.0, ans=0.1
2023-11-29 05:38:03,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3833026.6666666665, ans=0.07
2023-11-29 05:38:05,607 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.528e+01 9.329e+01 9.816e+01 1.069e+02 1.352e+02, threshold=1.963e+02, percent-clipped=0.0
2023-11-29 05:38:07,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3833026.6666666665, ans=0.0
2023-11-29 05:38:09,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3833026.6666666665, ans=0.125
2023-11-29 05:38:20,047 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
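
The WARNING above shows the arithmetic behind these exclusions: a one-second AudioSet placeholder cut yields 100 feature frames, only 23 frames after the roughly 4x subsampling front-end, which is fewer than the 24 BPE tokens of its dummy transcript, so the transducer loss has no valid alignment and the cut is dropped. Below is a minimal sketch of such a filter; the exact length formula is an assumption chosen to reproduce the logged 100 -> 23, and the real check lives in train_asr.py.

```python
# Sketch of the cut filter implied by the WARNING above: exclude a cut when
# the encoder output is shorter than its token sequence, since a transducer
# cannot align more tokens than frames. The length formula is an assumption.
def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    # rough output length of a convolutional subsampling front-end (assumed)
    num_frames_after = (num_frames - 7) // subsampling_factor
    return num_frames_after >= num_tokens

# The excluded cut above: 100 frames -> 23 after subsampling, but 24 tokens.
print(keep_cut(100, 24))  # -> False, so the cut is excluded
```
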
2023-11-29 05:38:23,361 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9850, loss[loss=0.05919, simple_loss=0.08742, pruned_loss=0.008177, audio_tagging_loss=0.007305, over 13844.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.09046, pruned_loss=0.01185, audio_tagging_loss=0.008344, over 3049114.68 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:38:24,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3833160.0, ans=0.125
2023-11-29 05:38:26,239 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0
2023-11-29 05:38:38,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3833226.6666666665, ans=0.1
2023-11-29 05:38:53,099 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575000
2023-11-29 05:39:24,252 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9900, loss[loss=0.0774, simple_loss=0.1106, pruned_loss=0.01491, audio_tagging_loss=0.007216, over 15661.00 frames. ], tot_loss[loss=0.06555, simple_loss=0.09013, pruned_loss=0.012, audio_tagging_loss=0.008489, over 3045178.40 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:39:26,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3833493.3333333335, ans=0.09899494936611666
2023-11-29 05:39:28,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3833493.3333333335, ans=0.125
2023-11-29 05:39:30,912 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=8.0
2023-11-29 05:39:41,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3833560.0, ans=0.0
2023-11-29 05:39:44,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3833560.0, ans=0.1
2023-11-29 05:39:50,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3833626.6666666665, ans=0.0
2023-11-29 05:39:53,585 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575050
2023-11-29 05:40:00,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3833693.3333333335, ans=0.0
2023-11-29 05:40:06,127 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.137e+01 9.333e+01 9.894e+01 1.049e+02 1.495e+02, threshold=1.979e+02, percent-clipped=0.0
2023-11-29 05:40:06,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3833693.3333333335, ans=0.0
2023-11-29 05:40:06,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3833693.3333333335, ans=0.2
2023-11-29 05:40:13,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3833760.0, ans=0.125
2023-11-29 05:40:25,400 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 9950, loss[loss=0.05278, simple_loss=0.07508, pruned_loss=0.007007, audio_tagging_loss=0.008233, over 14758.00 frames.
], tot_loss[loss=0.065, simple_loss=0.08932, pruned_loss=0.01186, audio_tagging_loss=0.008482, over 3050733.73 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:40:26,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3833826.6666666665, ans=0.0
2023-11-29 05:40:41,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3833893.3333333335, ans=0.125
2023-11-29 05:40:50,066 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.79 vs. limit=15.0
2023-11-29 05:40:53,864 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575100
2023-11-29 05:41:00,033 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.24 vs. limit=22.5
2023-11-29 05:41:25,604 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10000, loss[loss=0.06246, simple_loss=0.08083, pruned_loss=0.01427, audio_tagging_loss=0.007777, over 15296.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08996, pruned_loss=0.0121, audio_tagging_loss=0.008407, over 3054687.89 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:41:25,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3834160.0, ans=0.125
2023-11-29 05:41:35,223 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.80 vs. limit=10.0
2023-11-29 05:41:42,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3834226.6666666665, ans=0.0
2023-11-29 05:41:52,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3834293.3333333335, ans=0.1
2023-11-29 05:41:55,762 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575150
2023-11-29 05:42:08,203 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.189e+01 8.882e+01 9.475e+01 1.008e+02 1.351e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-29 05:42:14,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3834426.6666666665, ans=0.2
2023-11-29 05:42:26,209 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10050, loss[loss=0.07507, simple_loss=0.1079, pruned_loss=0.01319, audio_tagging_loss=0.007948, over 13803.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08995, pruned_loss=0.01193, audio_tagging_loss=0.008437, over 3047395.52 frames. ], batch size: 52, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:42:44,611 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.27 vs. limit=15.0
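
In every optim.py:476 record in this section, the reported threshold equals Clipping_scale times the median of the grad-norm quartiles; for the record above, 2.0 * 9.475e+01 = 1.895e+02, and percent-clipped counts how many recent norms exceeded that threshold. The snippet below reproduces just that bookkeeping; how the real optimizer collects and decays its norm statistics is not visible in the log and is not modeled here.

```python
import numpy as np

# Reproduce the quantities printed by the optim.py lines above: grad-norm
# quartiles, a threshold of clipping_scale * median, and percent-clipped.
# Only the reporting arithmetic is grounded in the log; the sample norms
# and the surrounding optimizer logic are placeholders.
def grad_norm_report(grad_norms, clipping_scale=2.0):
    quartiles = np.percentile(grad_norms, [0, 25, 50, 75, 100])
    threshold = clipping_scale * quartiles[2]            # scale * median
    pct_clipped = 100.0 * np.mean(grad_norms > threshold)
    return quartiles, threshold, pct_clipped

norms = np.random.default_rng(0).normal(95.0, 10.0, 128).clip(min=1.0)
quartiles, threshold, pct = grad_norm_report(norms)
print(quartiles, threshold, pct)  # threshold ~= 2 * median, as in the log
```
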
2023-11-29 05:42:55,665 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575200
2023-11-29 05:43:09,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3834693.3333333335, ans=0.125
2023-11-29 05:43:12,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3834693.3333333335, ans=0.0
2023-11-29 05:43:22,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3834760.0, ans=0.0
2023-11-29 05:43:28,462 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10100, loss[loss=0.06552, simple_loss=0.08068, pruned_loss=0.01535, audio_tagging_loss=0.009822, over 15177.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08977, pruned_loss=0.01194, audio_tagging_loss=0.008479, over 3049154.39 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:43:43,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3834893.3333333335, ans=10.0
2023-11-29 05:43:44,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3834893.3333333335, ans=0.125
2023-11-29 05:43:55,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3834960.0, ans=0.125
2023-11-29 05:43:56,909 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575250
2023-11-29 05:44:01,596 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.09 vs. limit=15.0
2023-11-29 05:44:08,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3835026.6666666665, ans=0.125
2023-11-29 05:44:10,709 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.549e+01 9.243e+01 9.943e+01 1.062e+02 1.322e+02, threshold=1.989e+02, percent-clipped=0.0
2023-11-29 05:44:15,412 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0
2023-11-29 05:44:16,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3835093.3333333335, ans=0.125
2023-11-29 05:44:20,555 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible'].
Number of tokens: 24 2023-11-29 05:44:25,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3835093.3333333335, ans=0.125 2023-11-29 05:44:26,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3835093.3333333335, ans=0.125 2023-11-29 05:44:28,562 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10150, loss[loss=0.05616, simple_loss=0.07637, pruned_loss=0.00744, audio_tagging_loss=0.01054, over 15126.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08969, pruned_loss=0.01207, audio_tagging_loss=0.008522, over 3046851.27 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:44:57,783 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575300 2023-11-29 05:45:00,023 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:45:12,607 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.79 vs. limit=15.0 2023-11-29 05:45:24,704 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.89 vs. limit=15.0 2023-11-29 05:45:28,656 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10200, loss[loss=0.06715, simple_loss=0.08559, pruned_loss=0.01626, audio_tagging_loss=0.008096, over 13963.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08928, pruned_loss=0.01198, audio_tagging_loss=0.008534, over 3040577.78 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 05:45:28,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3835493.3333333335, ans=0.0 2023-11-29 05:45:34,166 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 05:45:45,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3835560.0, ans=0.0 2023-11-29 05:45:46,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3835560.0, ans=0.125 2023-11-29 05:45:48,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.15 vs. limit=15.0 2023-11-29 05:45:50,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3835560.0, ans=0.1 2023-11-29 05:45:53,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3835626.6666666665, ans=0.125 2023-11-29 05:45:54,123 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.01 vs. limit=22.5 2023-11-29 05:45:54,733 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:45:57,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3835626.6666666665, ans=0.0 2023-11-29 05:45:58,175 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575350 2023-11-29 05:46:01,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3835626.6666666665, ans=0.1 2023-11-29 05:46:12,470 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.929e+01 8.881e+01 9.698e+01 1.024e+02 1.374e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 05:46:29,757 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10250, loss[loss=0.07045, simple_loss=0.09229, pruned_loss=0.01499, audio_tagging_loss=0.009315, over 14172.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.0891, pruned_loss=0.01211, audio_tagging_loss=0.008583, over 3046967.35 frames. ], batch size: 53, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:46:37,980 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2023-11-29 05:46:41,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3835893.3333333335, ans=0.0 2023-11-29 05:46:53,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3835960.0, ans=0.1 2023-11-29 05:46:53,790 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.74 vs. limit=10.0 2023-11-29 05:46:58,950 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575400 2023-11-29 05:47:18,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3836093.3333333335, ans=0.0 2023-11-29 05:47:30,872 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10300, loss[loss=0.06742, simple_loss=0.08909, pruned_loss=0.0137, audio_tagging_loss=0.009176, over 15543.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08889, pruned_loss=0.01202, audio_tagging_loss=0.008692, over 3051714.77 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:47:48,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3836226.6666666665, ans=0.0 2023-11-29 05:47:50,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3836226.6666666665, ans=0.0 2023-11-29 05:48:00,367 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575450 2023-11-29 05:48:14,972 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.198e+01 9.831e+01 1.050e+02 1.376e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-29 05:48:31,798 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10350, loss[loss=0.07461, simple_loss=0.1086, pruned_loss=0.01247, audio_tagging_loss=0.007821, over 14887.00 frames. 
], tot_loss[loss=0.06507, simple_loss=0.08861, pruned_loss=0.01196, audio_tagging_loss=0.008805, over 3048398.40 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:48:54,408 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.37 vs. limit=15.0
2023-11-29 05:49:01,325 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575500
2023-11-29 05:49:08,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3836693.3333333335, ans=0.0
2023-11-29 05:49:13,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3836693.3333333335, ans=0.0
2023-11-29 05:49:31,945 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10400, loss[loss=0.07621, simple_loss=0.1029, pruned_loss=0.01519, audio_tagging_loss=0.009564, over 16617.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.0882, pruned_loss=0.01179, audio_tagging_loss=0.008871, over 3039814.12 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:50:00,919 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575550
2023-11-29 05:50:15,792 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.305e+01 9.181e+01 9.898e+01 1.077e+02 1.252e+02, threshold=1.980e+02, percent-clipped=0.0
2023-11-29 05:50:25,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3837093.3333333335, ans=0.1
2023-11-29 05:50:32,038 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10450, loss[loss=0.0702, simple_loss=0.09711, pruned_loss=0.01226, audio_tagging_loss=0.009381, over 16244.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08795, pruned_loss=0.01169, audio_tagging_loss=0.008922, over 3048207.31 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:50:47,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3837226.6666666665, ans=0.125
2023-11-29 05:51:02,144 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575600
2023-11-29 05:51:19,060 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=12.0
2023-11-29 05:51:33,390 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10500, loss[loss=0.06203, simple_loss=0.08617, pruned_loss=0.01214, audio_tagging_loss=0.006801, over 15162.00 frames. ], tot_loss[loss=0.0644, simple_loss=0.08788, pruned_loss=0.01163, audio_tagging_loss=0.008833, over 3049523.50 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0
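
Each per-batch record reports the total loss alongside its three components, and across this section the totals are consistent with a weighted sum of roughly 0.5 * simple_loss + pruned_loss + audio_tagging_loss; for batch 10500 directly above, 0.5 * 0.08788 + 0.01163 + 0.008833 gives approximately 0.0644. The helper below restates that inferred combination; the 0.5 and 1.0 weights are read off the logged numbers, not taken from the training code.

```python
# Inferred decomposition of the logged tot_loss. The weights are deduced
# from the numbers in this section (e.g. batch 10500 above), not from code.
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5,
                  audio_tagging_scale: float = 1.0) -> float:
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

print(round(combined_loss(0.08788, 0.01163, 0.008833), 4))  # -> 0.0644
```
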
2023-11-29 05:51:39,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3837493.3333333335, ans=0.125
2023-11-29 05:51:39,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3837493.3333333335, ans=0.2
2023-11-29 05:51:55,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3837560.0, ans=0.1
2023-11-29 05:52:01,908 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575650
2023-11-29 05:52:04,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3837626.6666666665, ans=0.0
2023-11-29 05:52:08,131 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.19 vs. limit=22.5
2023-11-29 05:52:16,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 9.059e+01 9.610e+01 1.014e+02 2.042e+02, threshold=1.922e+02, percent-clipped=1.0
2023-11-29 05:52:17,109 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 05:52:25,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3837760.0, ans=0.1
2023-11-29 05:52:33,992 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10550, loss[loss=0.0672, simple_loss=0.09366, pruned_loss=0.01352, audio_tagging_loss=0.006849, over 14957.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08834, pruned_loss=0.0118, audio_tagging_loss=0.008774, over 3048000.93 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 32.0
2023-11-29 05:52:38,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3837826.6666666665, ans=0.125
2023-11-29 05:52:43,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3837826.6666666665, ans=0.0
2023-11-29 05:52:56,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3837960.0, ans=0.125
2023-11-29 05:53:03,046 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575700
2023-11-29 05:53:05,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3837960.0, ans=0.125
2023-11-29 05:53:16,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3838026.6666666665, ans=0.125
2023-11-29 05:53:23,917 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-29 05:53:34,003 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10600, loss[loss=0.06927, simple_loss=0.09227, pruned_loss=0.01223, audio_tagging_loss=0.0109, over 15762.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08873, pruned_loss=0.01172, audio_tagging_loss=0.008708, over 3051492.97 frames.
], batch size: 59, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:53:47,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3838226.6666666665, ans=0.125
2023-11-29 05:54:04,233 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575750
2023-11-29 05:54:11,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3838360.0, ans=0.0
2023-11-29 05:54:18,872 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 9.196e+01 9.659e+01 1.047e+02 1.317e+02, threshold=1.932e+02, percent-clipped=0.0
2023-11-29 05:54:28,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3838426.6666666665, ans=0.1
2023-11-29 05:54:34,924 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10650, loss[loss=0.07493, simple_loss=0.1121, pruned_loss=0.01303, audio_tagging_loss=0.005835, over 15625.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08875, pruned_loss=0.01188, audio_tagging_loss=0.008581, over 3046714.67 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:54:38,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3838493.3333333335, ans=0.125
2023-11-29 05:54:38,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3838493.3333333335, ans=0.125
2023-11-29 05:55:03,687 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575800
2023-11-29 05:55:07,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3838626.6666666665, ans=0.1
2023-11-29 05:55:19,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3838693.3333333335, ans=0.0
2023-11-29 05:55:21,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3838693.3333333335, ans=0.2
2023-11-29 05:55:36,118 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10700, loss[loss=0.06345, simple_loss=0.08697, pruned_loss=0.01141, audio_tagging_loss=0.008554, over 14497.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08912, pruned_loss=0.01191, audio_tagging_loss=0.008534, over 3047419.68 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:55:48,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3838893.3333333335, ans=0.125
2023-11-29 05:56:02,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3838960.0, ans=0.0
2023-11-29 05:56:04,136 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575850
2023-11-29 05:56:21,885 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.915e+01 9.906e+01 1.068e+02 1.666e+02, threshold=1.981e+02, percent-clipped=0.0
2023-11-29 05:56:26,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3839093.3333333335, ans=0.0
2023-11-29 05:56:35,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0
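
The scaling.py:1022 Whitening records compare a per-module statistic against a limit, as in the metric=3.69 vs. limit=15.0 line above: the metric is near 1.0 when the channel covariance of the activations is isotropic ("white") and grows as variance concentrates in a few directions, and the module only pushes back once the metric exceeds the limit. The eigenvalue-based definition below is one plausible form, assumed for illustration; the exact formula in the training code may differ.

```python
import torch

# One plausible whitening metric (assumed): the ratio of the mean squared
# eigenvalue to the squared mean eigenvalue of the channel covariance.
# It is 1.0 for perfectly white features and grows as the covariance
# becomes anisotropic; a limit like 15.0 would leave the metric=3.69
# module above untouched.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one whitening group
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

x = torch.randn(1000, 256)            # near-white: metric close to 1.0
print(round(whitening_metric(x), 2))
x[:, 0] *= 20.0                       # concentrate variance in one channel
print(round(whitening_metric(x), 2))  # far above 1.0: would trip the limit
```
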
2023-11-29 05:56:35,674 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10750, loss[loss=0.06447, simple_loss=0.09194, pruned_loss=0.01024, audio_tagging_loss=0.008257, over 14720.00 frames. ], tot_loss[loss=0.06397, simple_loss=0.08756, pruned_loss=0.01163, audio_tagging_loss=0.008552, over 3045253.88 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0
2023-11-29 05:56:57,059 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5
2023-11-29 05:56:58,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3839226.6666666665, ans=10.0
2023-11-29 05:57:05,245 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575900
2023-11-29 05:57:18,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3839360.0, ans=0.2
2023-11-29 05:57:32,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3839426.6666666665, ans=0.125
2023-11-29 05:57:36,387 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10800, loss[loss=0.0546, simple_loss=0.0764, pruned_loss=0.007989, audio_tagging_loss=0.008411, over 15174.00 frames. ], tot_loss[loss=0.06416, simple_loss=0.08802, pruned_loss=0.01159, audio_tagging_loss=0.008558, over 3049033.08 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 16.0
2023-11-29 05:58:03,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3839626.6666666665, ans=0.0
2023-11-29 05:58:04,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3839626.6666666665, ans=0.125
2023-11-29 05:58:04,880 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 575950
2023-11-29 05:58:12,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3839693.3333333335, ans=0.0
2023-11-29 05:58:14,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3839693.3333333335, ans=0.125
2023-11-29 05:58:21,346 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.991e+01 9.894e+01 1.056e+02 1.415e+02, threshold=1.979e+02, percent-clipped=0.0
2023-11-29 05:58:29,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3839760.0, ans=0.125
2023-11-29 05:58:31,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3839760.0, ans=0.125
2023-11-29 05:58:36,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3839826.6666666665, ans=0.125
2023-11-29 05:58:37,121 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10850, loss[loss=0.05505, simple_loss=0.0788, pruned_loss=0.007953, audio_tagging_loss=0.007693, over 14286.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08982, pruned_loss=0.0118, audio_tagging_loss=0.008461, over 3049163.28 frames.
], batch size: 54, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:59:05,511 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576000 2023-11-29 05:59:38,599 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.45 vs. limit=15.0 2023-11-29 05:59:39,109 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 05:59:40,181 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10900, loss[loss=0.06422, simple_loss=0.1008, pruned_loss=0.008987, audio_tagging_loss=0.004853, over 15448.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08962, pruned_loss=0.01185, audio_tagging_loss=0.00842, over 3050069.82 frames. ], batch size: 57, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 05:59:57,969 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0 2023-11-29 06:00:09,747 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576050 2023-11-29 06:00:14,297 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0 2023-11-29 06:00:26,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3840360.0, ans=0.125 2023-11-29 06:00:27,251 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.719e+01 8.980e+01 9.608e+01 1.052e+02 1.228e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-29 06:00:36,857 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:00:41,328 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 10950, loss[loss=0.05482, simple_loss=0.07496, pruned_loss=0.01, audio_tagging_loss=0.007338, over 15274.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08916, pruned_loss=0.01185, audio_tagging_loss=0.008537, over 3045870.51 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:01:12,286 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576100 2023-11-29 06:01:12,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3840626.6666666665, ans=0.0 2023-11-29 06:01:14,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3840626.6666666665, ans=0.125 2023-11-29 06:01:22,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3840693.3333333335, ans=0.0 2023-11-29 06:01:27,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3840693.3333333335, ans=0.2 2023-11-29 06:01:30,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3840760.0, ans=0.1 2023-11-29 06:01:34,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3840760.0, ans=0.125 2023-11-29 06:01:42,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3840826.6666666665, ans=0.2 2023-11-29 06:01:43,996 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11000, loss[loss=0.05882, simple_loss=0.07873, pruned_loss=0.00901, audio_tagging_loss=0.01044, over 14866.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.0882, pruned_loss=0.0118, audio_tagging_loss=0.008663, over 3045685.57 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:01:56,958 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 06:02:01,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3840893.3333333335, ans=0.1 2023-11-29 06:02:04,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3840893.3333333335, ans=0.125 2023-11-29 06:02:06,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3840893.3333333335, ans=0.125 2023-11-29 06:02:13,257 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576150 2023-11-29 06:02:13,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3840960.0, ans=0.125 2023-11-29 06:02:30,131 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.421e+01 8.865e+01 9.537e+01 1.028e+02 1.366e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 06:02:32,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3841093.3333333335, ans=22.5 2023-11-29 06:02:41,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3841093.3333333335, ans=0.125 2023-11-29 06:02:41,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3841093.3333333335, ans=0.125 2023-11-29 06:02:45,429 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11050, loss[loss=0.0673, simple_loss=0.09793, pruned_loss=0.01254, audio_tagging_loss=0.005788, over 15020.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08928, pruned_loss=0.01194, audio_tagging_loss=0.008665, over 3048563.66 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:02:56,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3841226.6666666665, ans=0.1 2023-11-29 06:03:07,435 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:03:09,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3841293.3333333335, ans=0.1 2023-11-29 06:03:10,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3841293.3333333335, ans=0.2 2023-11-29 06:03:10,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3841293.3333333335, ans=0.2 2023-11-29 06:03:12,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3841293.3333333335, ans=0.125 2023-11-29 06:03:14,273 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576200 2023-11-29 06:03:33,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3841360.0, ans=0.125 2023-11-29 06:03:47,122 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11100, loss[loss=0.06164, simple_loss=0.08632, pruned_loss=0.009307, audio_tagging_loss=0.009173, over 15775.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08907, pruned_loss=0.01201, audio_tagging_loss=0.008767, over 3060342.65 frames. 
], batch size: 59, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:03:55,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3841493.3333333335, ans=0.09899494936611666 2023-11-29 06:04:08,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3841560.0, ans=0.125 2023-11-29 06:04:09,071 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.25 vs. limit=22.5 2023-11-29 06:04:17,525 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576250 2023-11-29 06:04:33,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 9.155e+01 9.764e+01 1.030e+02 1.216e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-29 06:04:38,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3841760.0, ans=0.1 2023-11-29 06:04:47,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3841826.6666666665, ans=0.04949747468305833 2023-11-29 06:04:48,691 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11150, loss[loss=0.0676, simple_loss=0.09189, pruned_loss=0.01364, audio_tagging_loss=0.008011, over 16750.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08892, pruned_loss=0.01194, audio_tagging_loss=0.008765, over 3050565.18 frames. ], batch size: 61, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:04:57,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.96 vs. limit=22.5 2023-11-29 06:05:09,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3841893.3333333335, ans=0.125 2023-11-29 06:05:16,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3841960.0, ans=0.0 2023-11-29 06:05:18,588 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576300 2023-11-29 06:05:27,448 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:05:36,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3842093.3333333335, ans=0.0 2023-11-29 06:05:36,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3842093.3333333335, ans=0.1 2023-11-29 06:05:51,031 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11200, loss[loss=0.07376, simple_loss=0.104, pruned_loss=0.01422, audio_tagging_loss=0.007529, over 14990.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08868, pruned_loss=0.01179, audio_tagging_loss=0.008836, over 3047675.66 frames. 
], batch size: 57, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 06:05:52,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3842160.0, ans=0.0 2023-11-29 06:05:57,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3842160.0, ans=0.125 2023-11-29 06:06:11,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3842226.6666666665, ans=0.125 2023-11-29 06:06:19,524 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576350 2023-11-29 06:06:22,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3842293.3333333335, ans=0.125 2023-11-29 06:06:22,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.83 vs. limit=22.5 2023-11-29 06:06:37,280 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 8.953e+01 9.671e+01 1.033e+02 1.680e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 06:06:51,536 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11250, loss[loss=0.05777, simple_loss=0.08236, pruned_loss=0.009712, audio_tagging_loss=0.00688, over 14573.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08901, pruned_loss=0.01184, audio_tagging_loss=0.008787, over 3051668.10 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 32.0 2023-11-29 06:06:56,983 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.43 vs. limit=22.5 2023-11-29 06:07:07,723 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:07:11,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.56 vs. limit=22.5 2023-11-29 06:07:16,186 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2023-11-29 06:07:21,417 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576400 2023-11-29 06:07:21,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3842626.6666666665, ans=0.125 2023-11-29 06:07:51,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3842826.6666666665, ans=0.0 2023-11-29 06:07:53,064 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11300, loss[loss=0.05707, simple_loss=0.07808, pruned_loss=0.01161, audio_tagging_loss=0.006417, over 15068.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08853, pruned_loss=0.01184, audio_tagging_loss=0.008648, over 3051056.37 frames. ], batch size: 60, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:08:13,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3842893.3333333335, ans=0.125 2023-11-29 06:08:17,832 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.12 vs. 
limit=10.0 2023-11-29 06:08:20,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=22.5 2023-11-29 06:08:23,074 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576450 2023-11-29 06:08:27,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3842960.0, ans=0.0 2023-11-29 06:08:33,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3843026.6666666665, ans=0.0 2023-11-29 06:08:34,647 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.15 vs. limit=15.0 2023-11-29 06:08:40,637 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2023-11-29 06:08:42,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.057e+01 9.108e+01 9.647e+01 1.054e+02 1.325e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 06:08:55,274 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11350, loss[loss=0.06248, simple_loss=0.08844, pruned_loss=0.0107, audio_tagging_loss=0.007562, over 16064.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08907, pruned_loss=0.01189, audio_tagging_loss=0.008568, over 3056633.06 frames. ], batch size: 59, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:08:59,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3843160.0, ans=0.1 2023-11-29 06:09:02,405 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.00 vs. limit=22.5 2023-11-29 06:09:04,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.41 vs. limit=15.0 2023-11-29 06:09:16,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3843226.6666666665, ans=0.09899494936611666 2023-11-29 06:09:20,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3843293.3333333335, ans=0.125 2023-11-29 06:09:24,788 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576500 2023-11-29 06:09:28,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3843293.3333333335, ans=0.1 2023-11-29 06:09:56,509 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11400, loss[loss=0.0574, simple_loss=0.07562, pruned_loss=0.01018, audio_tagging_loss=0.009408, over 16170.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08913, pruned_loss=0.01195, audio_tagging_loss=0.008548, over 3052016.32 frames. 
], batch size: 62, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:10:20,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3843626.6666666665, ans=0.125 2023-11-29 06:10:26,246 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576550 2023-11-29 06:10:36,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3843693.3333333335, ans=0.2 2023-11-29 06:10:39,253 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.92 vs. limit=15.0 2023-11-29 06:10:45,566 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.262e+01 9.236e+01 1.000e+02 1.071e+02 1.321e+02, threshold=2.000e+02, percent-clipped=0.0 2023-11-29 06:10:57,919 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11450, loss[loss=0.04929, simple_loss=0.06668, pruned_loss=0.00791, audio_tagging_loss=0.008037, over 14443.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08917, pruned_loss=0.01193, audio_tagging_loss=0.00848, over 3049317.78 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:11:07,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3843826.6666666665, ans=0.0 2023-11-29 06:11:14,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3843893.3333333335, ans=0.5 2023-11-29 06:11:18,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3843893.3333333335, ans=0.0 2023-11-29 06:11:20,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3843893.3333333335, ans=0.0 2023-11-29 06:11:28,123 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576600 2023-11-29 06:11:28,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3843960.0, ans=0.2 2023-11-29 06:11:39,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3844026.6666666665, ans=0.125 2023-11-29 06:11:58,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3844093.3333333335, ans=0.125 2023-11-29 06:11:59,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3844160.0, ans=0.1 2023-11-29 06:12:00,435 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11500, loss[loss=0.06624, simple_loss=0.09192, pruned_loss=0.01126, audio_tagging_loss=0.009015, over 14330.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.0887, pruned_loss=0.01182, audio_tagging_loss=0.008473, over 3041926.20 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:12:06,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3844160.0, ans=0.1 2023-11-29 06:12:16,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3844226.6666666665, ans=0.125 2023-11-29 06:12:30,141 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576650 2023-11-29 06:12:31,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3844293.3333333335, ans=0.0 2023-11-29 06:12:42,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3844360.0, ans=0.1 2023-11-29 06:12:50,853 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.910e+01 9.568e+01 1.052e+02 1.642e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 06:12:53,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3844426.6666666665, ans=0.125 2023-11-29 06:13:02,151 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11550, loss[loss=0.0832, simple_loss=0.1152, pruned_loss=0.01557, audio_tagging_loss=0.01004, over 15778.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08945, pruned_loss=0.01195, audio_tagging_loss=0.008459, over 3049806.64 frames. ], batch size: 58, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:13:11,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3844493.3333333335, ans=0.125 2023-11-29 06:13:18,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3844560.0, ans=0.0 2023-11-29 06:13:20,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3844560.0, ans=0.0 2023-11-29 06:13:32,419 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576700 2023-11-29 06:13:42,670 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:14:03,742 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11600, loss[loss=0.06564, simple_loss=0.09171, pruned_loss=0.01249, audio_tagging_loss=0.007298, over 15052.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.09015, pruned_loss=0.01204, audio_tagging_loss=0.008414, over 3052915.82 frames. 
], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:14:15,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3844893.3333333335, ans=0.125 2023-11-29 06:14:24,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3844893.3333333335, ans=0.125 2023-11-29 06:14:24,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3844893.3333333335, ans=0.125 2023-11-29 06:14:32,672 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576750 2023-11-29 06:14:38,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3844960.0, ans=0.125 2023-11-29 06:14:54,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=3845093.3333333335, ans=0.5 2023-11-29 06:14:55,095 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.877e+01 9.031e+01 9.516e+01 1.044e+02 1.307e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-29 06:15:02,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3845093.3333333335, ans=0.0 2023-11-29 06:15:05,667 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11650, loss[loss=0.07233, simple_loss=0.1018, pruned_loss=0.01311, audio_tagging_loss=0.008307, over 14502.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08985, pruned_loss=0.012, audio_tagging_loss=0.008436, over 3050148.61 frames. ], batch size: 55, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:15:24,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3845226.6666666665, ans=0.125 2023-11-29 06:15:31,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3845293.3333333335, ans=0.0 2023-11-29 06:15:35,345 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576800 2023-11-29 06:15:38,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3845293.3333333335, ans=0.125 2023-11-29 06:16:03,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3845426.6666666665, ans=0.09899494936611666 2023-11-29 06:16:07,148 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11700, loss[loss=0.06254, simple_loss=0.08925, pruned_loss=0.009851, audio_tagging_loss=0.008063, over 14736.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08872, pruned_loss=0.01181, audio_tagging_loss=0.008522, over 3045207.78 frames. 
], batch size: 54, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:16:09,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3845493.3333333335, ans=0.0 2023-11-29 06:16:16,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3845493.3333333335, ans=0.0 2023-11-29 06:16:37,048 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576850 2023-11-29 06:16:55,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3845760.0, ans=0.2 2023-11-29 06:16:56,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3845760.0, ans=0.125 2023-11-29 06:16:58,533 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 8.889e+01 9.558e+01 1.009e+02 1.379e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-29 06:17:00,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0 2023-11-29 06:17:09,139 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11750, loss[loss=0.07359, simple_loss=0.1063, pruned_loss=0.01401, audio_tagging_loss=0.006443, over 15379.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08998, pruned_loss=0.01195, audio_tagging_loss=0.008443, over 3047913.45 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:17:16,869 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=22.5 2023-11-29 06:17:38,164 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576900 2023-11-29 06:17:39,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3845960.0, ans=0.1 2023-11-29 06:18:10,113 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11800, loss[loss=0.09867, simple_loss=0.1478, pruned_loss=0.01751, audio_tagging_loss=0.007274, over 15771.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.09016, pruned_loss=0.01193, audio_tagging_loss=0.008454, over 3045004.98 frames. 
], batch size: 54, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:18:12,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3846160.0, ans=0.1 2023-11-29 06:18:21,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3846226.6666666665, ans=0.0 2023-11-29 06:18:21,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3846226.6666666665, ans=0.1 2023-11-29 06:18:27,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3846226.6666666665, ans=0.1 2023-11-29 06:18:38,845 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 576950 2023-11-29 06:18:45,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3846360.0, ans=0.125 2023-11-29 06:18:52,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3846360.0, ans=0.0 2023-11-29 06:18:53,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3846360.0, ans=0.125 2023-11-29 06:19:00,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3846426.6666666665, ans=0.125 2023-11-29 06:19:01,314 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 9.085e+01 9.909e+01 1.081e+02 1.450e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-29 06:19:08,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3846426.6666666665, ans=0.125 2023-11-29 06:19:10,580 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11850, loss[loss=0.05799, simple_loss=0.08281, pruned_loss=0.007272, audio_tagging_loss=0.009318, over 14987.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08875, pruned_loss=0.01168, audio_tagging_loss=0.008596, over 3044843.16 frames. ], batch size: 54, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:19:30,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2023-11-29 06:19:37,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3846626.6666666665, ans=0.125 2023-11-29 06:19:40,302 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577000 2023-11-29 06:19:44,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3846626.6666666665, ans=0.125 2023-11-29 06:19:52,167 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=22.5 2023-11-29 06:19:56,711 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=15.0 2023-11-29 06:20:11,087 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11900, loss[loss=0.07227, simple_loss=0.09704, pruned_loss=0.01541, audio_tagging_loss=0.008346, over 16313.00 frames. 
], tot_loss[loss=0.06499, simple_loss=0.08925, pruned_loss=0.01174, audio_tagging_loss=0.008619, over 3047608.41 frames. ], batch size: 62, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:20:11,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=22.5 2023-11-29 06:20:18,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3846826.6666666665, ans=0.125 2023-11-29 06:20:41,437 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577050 2023-11-29 06:20:46,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3846960.0, ans=0.125 2023-11-29 06:20:56,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3847026.6666666665, ans=10.0 2023-11-29 06:21:02,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3847093.3333333335, ans=0.1 2023-11-29 06:21:02,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3847093.3333333335, ans=0.025 2023-11-29 06:21:02,971 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.818e+01 9.006e+01 9.638e+01 1.018e+02 1.407e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-29 06:21:10,111 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.65 vs. limit=6.0 2023-11-29 06:21:13,260 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.37 vs. limit=10.0 2023-11-29 06:21:13,617 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 11950, loss[loss=0.06131, simple_loss=0.07747, pruned_loss=0.01046, audio_tagging_loss=0.01211, over 14730.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08883, pruned_loss=0.01178, audio_tagging_loss=0.008715, over 3050570.39 frames. ], batch size: 56, lr: 1.40e-03, grad_scale: 8.0 2023-11-29 06:21:19,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3847160.0, ans=0.2 2023-11-29 06:21:42,245 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577100 2023-11-29 06:21:55,585 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.61 vs. limit=15.0 2023-11-29 06:22:00,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3847426.6666666665, ans=0.07 2023-11-29 06:22:12,415 INFO [train_asr.py:1235] (2/4) Epoch 48, batch 12000, loss[loss=0.0637, simple_loss=0.08553, pruned_loss=0.01052, audio_tagging_loss=0.01041, over 15143.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.0888, pruned_loss=0.01169, audio_tagging_loss=0.008813, over 3048064.09 frames. 
], batch size: 55, lr: 1.40e-03, grad_scale: 16.0 2023-11-29 06:22:12,416 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 06:22:30,916 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.7704, 3.0585, 2.8377, 3.4263, 3.0458, 3.1138, 3.2286, 2.9754], device='cuda:2') 2023-11-29 06:22:52,519 INFO [train_asr.py:1267] (2/4) Epoch 48, validation: loss=0.05839, simple_loss=0.05056, pruned_loss=0.005496, audio_tagging_loss=0.02761, over 4681554.00 frames. 2023-11-29 06:22:52,520 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 06:22:55,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3847493.3333333335, ans=0.1 2023-11-29 06:23:05,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3847560.0, ans=0.0 2023-11-29 06:23:15,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3847626.6666666665, ans=0.2 2023-11-29 06:23:44,067 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 0, loss[loss=0.08845, simple_loss=0.112, pruned_loss=0.01754, audio_tagging_loss=0.01489, over 16162.00 frames. ], tot_loss[loss=0.08845, simple_loss=0.112, pruned_loss=0.01754, audio_tagging_loss=0.01489, over 16162.00 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:23:44,068 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 06:24:02,092 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7389, 5.7209, 5.8161, 5.7651], device='cuda:2') 2023-11-29 06:24:12,104 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3297, 4.3130, 4.4921, 4.4399], device='cuda:2') 2023-11-29 06:24:16,102 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8153, 4.9838, 5.0952, 4.9238], device='cuda:2') 2023-11-29 06:24:20,383 INFO [train_asr.py:1267] (2/4) Epoch 49, validation: loss=0.05827, simple_loss=0.05045, pruned_loss=0.005376, audio_tagging_loss=0.02767, over 4681554.00 frames. 2023-11-29 06:24:20,384 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 06:24:20,501 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577150 2023-11-29 06:24:20,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3847653.3333333335, ans=0.09899494936611666 2023-11-29 06:24:21,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3847653.3333333335, ans=0.2 2023-11-29 06:24:42,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.193e+01 9.994e+01 1.113e+02 1.489e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-29 06:24:44,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3847786.6666666665, ans=0.1 2023-11-29 06:24:44,657 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.64 vs. 
limit=22.5 2023-11-29 06:24:52,814 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.12 vs. limit=15.0 2023-11-29 06:24:59,293 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2023-11-29 06:25:07,080 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0 2023-11-29 06:25:22,989 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 50, loss[loss=0.06534, simple_loss=0.08094, pruned_loss=0.009173, audio_tagging_loss=0.01569, over 15391.00 frames. ], tot_loss[loss=0.07517, simple_loss=0.09265, pruned_loss=0.01288, audio_tagging_loss=0.01597, over 689478.70 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:25:23,066 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577200 2023-11-29 06:25:29,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3847986.6666666665, ans=0.125 2023-11-29 06:25:29,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3847986.6666666665, ans=0.0 2023-11-29 06:25:45,870 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=12.0 2023-11-29 06:25:57,365 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2023-11-29 06:26:12,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3848253.3333333335, ans=0.1 2023-11-29 06:26:12,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3848253.3333333335, ans=0.125 2023-11-29 06:26:17,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3848253.3333333335, ans=0.07 2023-11-29 06:26:25,054 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 100, loss[loss=0.07071, simple_loss=0.07978, pruned_loss=0.01524, audio_tagging_loss=0.01557, over 13426.00 frames. ], tot_loss[loss=0.07176, simple_loss=0.08822, pruned_loss=0.01207, audio_tagging_loss=0.01557, over 1203033.65 frames. 
], batch size: 52, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:26:25,132 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577250 2023-11-29 06:26:37,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3848386.6666666665, ans=0.1 2023-11-29 06:26:49,321 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 9.815e+01 1.050e+02 1.112e+02 1.329e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-29 06:26:56,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3848453.3333333335, ans=0.0 2023-11-29 06:27:06,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3848520.0, ans=0.125 2023-11-29 06:27:12,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0 2023-11-29 06:27:27,356 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 150, loss[loss=0.09013, simple_loss=0.13, pruned_loss=0.01614, audio_tagging_loss=0.008974, over 15079.00 frames. ], tot_loss[loss=0.0695, simple_loss=0.08775, pruned_loss=0.01168, audio_tagging_loss=0.01395, over 1611842.66 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:27:27,439 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577300 2023-11-29 06:28:09,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3848853.3333333335, ans=0.125 2023-11-29 06:28:12,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3848853.3333333335, ans=0.125 2023-11-29 06:28:16,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3848920.0, ans=0.125 2023-11-29 06:28:31,061 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 200, loss[loss=0.05201, simple_loss=0.07584, pruned_loss=0.005969, audio_tagging_loss=0.008125, over 15212.00 frames. ], tot_loss[loss=0.06853, simple_loss=0.08885, pruned_loss=0.01174, audio_tagging_loss=0.01237, over 1928065.35 frames. 
], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:28:31,142 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577350 2023-11-29 06:28:41,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3849053.3333333335, ans=0.125 2023-11-29 06:28:43,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3849053.3333333335, ans=0.2 2023-11-29 06:28:53,800 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.143e+01 9.385e+01 9.861e+01 1.084e+02 1.515e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-29 06:29:03,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3849120.0, ans=0.125 2023-11-29 06:29:10,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3849186.6666666665, ans=0.1 2023-11-29 06:29:25,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3849253.3333333335, ans=0.125 2023-11-29 06:29:31,520 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 250, loss[loss=0.0832, simple_loss=0.1195, pruned_loss=0.01629, audio_tagging_loss=0.007139, over 15683.00 frames. ], tot_loss[loss=0.06787, simple_loss=0.08935, pruned_loss=0.01194, audio_tagging_loss=0.01125, over 2175168.29 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:29:31,613 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577400 2023-11-29 06:29:34,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3849320.0, ans=0.125 2023-11-29 06:29:42,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3849320.0, ans=0.125 2023-11-29 06:30:05,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3849453.3333333335, ans=0.2 2023-11-29 06:30:06,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3849453.3333333335, ans=0.125 2023-11-29 06:30:32,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3849586.6666666665, ans=0.125 2023-11-29 06:30:34,224 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 300, loss[loss=0.07378, simple_loss=0.1041, pruned_loss=0.01435, audio_tagging_loss=0.007379, over 14584.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.08998, pruned_loss=0.0121, audio_tagging_loss=0.01033, over 2369744.95 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:30:34,310 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577450 2023-11-29 06:30:34,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.44 vs. 
limit=15.0 2023-11-29 06:30:58,196 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 9.309e+01 1.014e+02 1.083e+02 1.326e+02, threshold=2.029e+02, percent-clipped=0.0 2023-11-29 06:31:06,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3849786.6666666665, ans=0.1 2023-11-29 06:31:15,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3849853.3333333335, ans=0.125 2023-11-29 06:31:29,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3849920.0, ans=0.2 2023-11-29 06:31:37,014 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 350, loss[loss=0.07802, simple_loss=0.1148, pruned_loss=0.01249, audio_tagging_loss=0.008151, over 15497.00 frames. ], tot_loss[loss=0.06705, simple_loss=0.09033, pruned_loss=0.01214, audio_tagging_loss=0.009744, over 2520243.94 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:31:37,104 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577500 2023-11-29 06:31:55,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3850053.3333333335, ans=0.125 2023-11-29 06:32:08,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3850120.0, ans=0.2 2023-11-29 06:32:10,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3850120.0, ans=0.125 2023-11-29 06:32:11,980 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.91 vs. limit=15.0 2023-11-29 06:32:15,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3850186.6666666665, ans=0.0 2023-11-29 06:32:39,090 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 400, loss[loss=0.07123, simple_loss=0.1011, pruned_loss=0.01365, audio_tagging_loss=0.007056, over 15774.00 frames. ], tot_loss[loss=0.06656, simple_loss=0.0898, pruned_loss=0.01228, audio_tagging_loss=0.009385, over 2634752.67 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:32:39,166 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577550 2023-11-29 06:32:42,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3850320.0, ans=0.0 2023-11-29 06:32:59,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3850386.6666666665, ans=0.1 2023-11-29 06:33:00,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3850386.6666666665, ans=0.125 2023-11-29 06:33:02,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.275e+01 8.755e+01 9.458e+01 1.037e+02 1.447e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 06:33:15,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3850520.0, ans=0.0 2023-11-29 06:33:41,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3850653.3333333335, ans=0.0 2023-11-29 06:33:41,967 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 450, loss[loss=0.06353, simple_loss=0.08927, pruned_loss=0.01131, audio_tagging_loss=0.007576, over 14793.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.08902, pruned_loss=0.01207, audio_tagging_loss=0.009157, over 2722229.90 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:33:42,052 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577600 2023-11-29 06:34:06,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3850786.6666666665, ans=0.2 2023-11-29 06:34:22,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3850853.3333333335, ans=0.0 2023-11-29 06:34:45,306 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 500, loss[loss=0.07966, simple_loss=0.1154, pruned_loss=0.01523, audio_tagging_loss=0.006732, over 14595.00 frames. ], tot_loss[loss=0.06632, simple_loss=0.09013, pruned_loss=0.01229, audio_tagging_loss=0.008955, over 2796520.79 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:34:45,421 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577650 2023-11-29 06:34:45,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3850986.6666666665, ans=0.125 2023-11-29 06:34:45,730 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:34:58,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3851053.3333333335, ans=0.125 2023-11-29 06:35:09,254 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 8.909e+01 9.530e+01 1.043e+02 1.565e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-29 06:35:22,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3851186.6666666665, ans=10.0 2023-11-29 06:35:22,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3851186.6666666665, ans=0.125 2023-11-29 06:35:47,390 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 550, loss[loss=0.06273, simple_loss=0.07923, pruned_loss=0.01246, audio_tagging_loss=0.01066, over 16244.00 frames. ], tot_loss[loss=0.06589, simple_loss=0.08967, pruned_loss=0.01215, audio_tagging_loss=0.008909, over 2850881.69 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:35:47,475 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577700 2023-11-29 06:35:47,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3851320.0, ans=0.1 2023-11-29 06:35:52,733 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.56 vs. limit=22.5 2023-11-29 06:36:01,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3851386.6666666665, ans=0.125 2023-11-29 06:36:11,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.38 vs. limit=15.0 2023-11-29 06:36:46,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3851586.6666666665, ans=0.0 2023-11-29 06:36:49,872 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 600, loss[loss=0.08573, simple_loss=0.1232, pruned_loss=0.01772, audio_tagging_loss=0.006397, over 15094.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08901, pruned_loss=0.01197, audio_tagging_loss=0.008888, over 2889915.82 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:36:49,995 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577750 2023-11-29 06:36:59,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3851653.3333333335, ans=0.125 2023-11-29 06:37:03,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3851720.0, ans=0.125 2023-11-29 06:37:06,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3851720.0, ans=0.125 2023-11-29 06:37:14,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 8.849e+01 9.501e+01 1.048e+02 1.415e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-29 06:37:21,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.59 vs. limit=10.0 2023-11-29 06:37:22,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3851786.6666666665, ans=0.1 2023-11-29 06:37:25,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3851786.6666666665, ans=0.04949747468305833 2023-11-29 06:37:38,352 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0 2023-11-29 06:37:47,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3851920.0, ans=0.1 2023-11-29 06:37:52,653 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 650, loss[loss=0.08261, simple_loss=0.1191, pruned_loss=0.01582, audio_tagging_loss=0.00725, over 15694.00 frames. ], tot_loss[loss=0.06557, simple_loss=0.08928, pruned_loss=0.01211, audio_tagging_loss=0.008818, over 2923499.12 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:37:52,737 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577800 2023-11-29 06:37:52,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3851986.6666666665, ans=0.0 2023-11-29 06:37:59,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3851986.6666666665, ans=0.0 2023-11-29 06:38:00,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3851986.6666666665, ans=0.125 2023-11-29 06:38:04,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3852053.3333333335, ans=0.07 2023-11-29 06:38:20,630 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:38:53,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3852253.3333333335, ans=0.2 2023-11-29 06:38:55,859 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 700, loss[loss=0.0771, simple_loss=0.09647, pruned_loss=0.02227, audio_tagging_loss=0.006594, over 14274.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08893, pruned_loss=0.012, audio_tagging_loss=0.008754, over 2950721.82 frames. 
], batch size: 53, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:38:55,956 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577850 2023-11-29 06:38:57,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3852320.0, ans=0.125 2023-11-29 06:39:20,735 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 9.141e+01 9.968e+01 1.043e+02 1.174e+02, threshold=1.994e+02, percent-clipped=0.0 2023-11-29 06:39:22,284 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:39:43,310 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.87 vs. limit=15.0 2023-11-29 06:39:58,581 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 750, loss[loss=0.07836, simple_loss=0.1098, pruned_loss=0.01399, audio_tagging_loss=0.009492, over 14691.00 frames. ], tot_loss[loss=0.06537, simple_loss=0.08942, pruned_loss=0.01197, audio_tagging_loss=0.00869, over 2970624.70 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:39:58,661 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577900 2023-11-29 06:40:00,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3852653.3333333335, ans=0.0 2023-11-29 06:40:09,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3852653.3333333335, ans=0.1 2023-11-29 06:40:44,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3852853.3333333335, ans=0.0 2023-11-29 06:41:01,459 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 800, loss[loss=0.06417, simple_loss=0.08971, pruned_loss=0.01009, audio_tagging_loss=0.009229, over 15143.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08923, pruned_loss=0.0119, audio_tagging_loss=0.008734, over 2989975.05 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:41:01,531 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 577950 2023-11-29 06:41:04,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3852986.6666666665, ans=0.125 2023-11-29 06:41:26,072 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 9.191e+01 9.688e+01 1.032e+02 1.219e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 06:41:27,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3853120.0, ans=0.2 2023-11-29 06:42:04,091 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 850, loss[loss=0.06486, simple_loss=0.08471, pruned_loss=0.01202, audio_tagging_loss=0.01048, over 14335.00 frames. ], tot_loss[loss=0.06508, simple_loss=0.08892, pruned_loss=0.0118, audio_tagging_loss=0.008822, over 2995987.84 frames. 
], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:42:04,172 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578000 2023-11-29 06:42:16,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3853386.6666666665, ans=0.125 2023-11-29 06:42:24,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3853386.6666666665, ans=0.1 2023-11-29 06:42:59,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3853586.6666666665, ans=0.125 2023-11-29 06:43:05,879 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 900, loss[loss=0.05878, simple_loss=0.08168, pruned_loss=0.008346, audio_tagging_loss=0.009592, over 15093.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08902, pruned_loss=0.01181, audio_tagging_loss=0.008862, over 3002782.76 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:43:05,951 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578050 2023-11-29 06:43:12,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3853653.3333333335, ans=0.1 2023-11-29 06:43:17,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3853653.3333333335, ans=0.0 2023-11-29 06:43:30,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3853720.0, ans=0.125 2023-11-29 06:43:33,418 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.882e+01 9.400e+01 1.003e+02 1.065e+02 1.240e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-29 06:43:51,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3853853.3333333335, ans=10.0 2023-11-29 06:44:00,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3853920.0, ans=0.2 2023-11-29 06:44:00,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3853920.0, ans=0.0 2023-11-29 06:44:01,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3853920.0, ans=0.0 2023-11-29 06:44:09,219 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 950, loss[loss=0.05049, simple_loss=0.06553, pruned_loss=0.0108, audio_tagging_loss=0.006925, over 15717.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08858, pruned_loss=0.01173, audio_tagging_loss=0.008868, over 3009382.36 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:44:09,314 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578100 2023-11-29 06:44:11,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. 
limit=6.0 2023-11-29 06:44:31,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3854053.3333333335, ans=0.2 2023-11-29 06:44:32,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3854120.0, ans=0.125 2023-11-29 06:44:36,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3854120.0, ans=0.125 2023-11-29 06:44:44,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3854186.6666666665, ans=0.1 2023-11-29 06:44:50,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3854186.6666666665, ans=0.0 2023-11-29 06:44:58,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3854253.3333333335, ans=0.125 2023-11-29 06:45:11,230 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1000, loss[loss=0.06711, simple_loss=0.09791, pruned_loss=0.01166, audio_tagging_loss=0.006491, over 14165.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08841, pruned_loss=0.01172, audio_tagging_loss=0.008717, over 3016059.93 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:45:11,325 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578150 2023-11-29 06:45:18,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3854320.0, ans=0.0 2023-11-29 06:45:22,429 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.21 vs. limit=12.0 2023-11-29 06:45:24,388 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:45:37,175 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.501e+01 8.899e+01 9.614e+01 1.019e+02 1.244e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-29 06:45:39,603 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:45:43,344 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:46:12,488 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1050, loss[loss=0.06095, simple_loss=0.08584, pruned_loss=0.008267, audio_tagging_loss=0.009763, over 15845.00 frames. ], tot_loss[loss=0.06468, simple_loss=0.08865, pruned_loss=0.01176, audio_tagging_loss=0.008593, over 3027152.56 frames. 
], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:46:12,572 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578200 2023-11-29 06:46:23,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3854653.3333333335, ans=0.125 2023-11-29 06:46:41,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3854786.6666666665, ans=0.125 2023-11-29 06:46:58,179 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=12.0 2023-11-29 06:47:04,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3854920.0, ans=0.0 2023-11-29 06:47:09,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3854920.0, ans=0.2 2023-11-29 06:47:15,097 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1100, loss[loss=0.0696, simple_loss=0.1001, pruned_loss=0.01368, audio_tagging_loss=0.005869, over 14838.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08937, pruned_loss=0.01189, audio_tagging_loss=0.00851, over 3033834.79 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:47:15,162 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578250 2023-11-29 06:47:17,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3854986.6666666665, ans=0.5 2023-11-29 06:47:19,564 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:47:28,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3855053.3333333335, ans=0.125 2023-11-29 06:47:40,643 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 9.246e+01 9.671e+01 1.044e+02 1.404e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 06:47:49,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3855120.0, ans=0.125 2023-11-29 06:48:18,221 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1150, loss[loss=0.06701, simple_loss=0.09142, pruned_loss=0.009792, audio_tagging_loss=0.01151, over 16410.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08935, pruned_loss=0.01188, audio_tagging_loss=0.008392, over 3033174.92 frames. 
], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:48:18,351 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578300 2023-11-29 06:48:22,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3855320.0, ans=0.0 2023-11-29 06:48:48,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3855453.3333333335, ans=0.125 2023-11-29 06:48:55,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2023-11-29 06:49:03,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3855520.0, ans=0.125 2023-11-29 06:49:05,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3855520.0, ans=0.1 2023-11-29 06:49:06,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3855586.6666666665, ans=0.125 2023-11-29 06:49:09,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3855586.6666666665, ans=0.125 2023-11-29 06:49:09,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3855586.6666666665, ans=0.125 2023-11-29 06:49:19,158 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2023-11-29 06:49:19,691 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1200, loss[loss=0.0716, simple_loss=0.09529, pruned_loss=0.01332, audio_tagging_loss=0.01064, over 14214.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08933, pruned_loss=0.01189, audio_tagging_loss=0.00832, over 3036080.99 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 06:49:19,771 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578350 2023-11-29 06:49:33,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3855720.0, ans=0.125 2023-11-29 06:49:36,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3855720.0, ans=0.0 2023-11-29 06:49:36,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3855720.0, ans=0.125 2023-11-29 06:49:47,192 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.561e+01 8.935e+01 9.457e+01 1.024e+02 1.157e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-29 06:50:15,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3855920.0, ans=0.125 2023-11-29 06:50:17,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.87 vs. 
limit=15.0 2023-11-29 06:50:18,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3855920.0, ans=0.1 2023-11-29 06:50:21,569 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1250, loss[loss=0.07378, simple_loss=0.1059, pruned_loss=0.01289, audio_tagging_loss=0.007932, over 16701.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08917, pruned_loss=0.01176, audio_tagging_loss=0.008372, over 3040548.90 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:50:21,668 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578400 2023-11-29 06:50:24,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3855986.6666666665, ans=0.09899494936611666 2023-11-29 06:50:28,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3855986.6666666665, ans=0.0 2023-11-29 06:50:43,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3856053.3333333335, ans=0.05 2023-11-29 06:51:03,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3856186.6666666665, ans=0.125 2023-11-29 06:51:05,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3856186.6666666665, ans=0.0 2023-11-29 06:51:05,643 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2023-11-29 06:51:10,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3856253.3333333335, ans=0.0 2023-11-29 06:51:24,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3856320.0, ans=0.0 2023-11-29 06:51:24,784 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1300, loss[loss=0.06166, simple_loss=0.08258, pruned_loss=0.009892, audio_tagging_loss=0.01047, over 16234.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08881, pruned_loss=0.01173, audio_tagging_loss=0.008387, over 3041305.36 frames. 
], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:51:24,856 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578450 2023-11-29 06:51:27,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3856320.0, ans=0.1 2023-11-29 06:51:31,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3856320.0, ans=0.2 2023-11-29 06:51:33,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3856320.0, ans=0.125 2023-11-29 06:51:39,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3856386.6666666665, ans=0.0 2023-11-29 06:51:49,609 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 06:51:50,516 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.203e+01 8.934e+01 9.381e+01 1.015e+02 1.347e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-29 06:52:00,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2023-11-29 06:52:09,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3856520.0, ans=0.025 2023-11-29 06:52:23,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3856586.6666666665, ans=0.0 2023-11-29 06:52:25,835 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1350, loss[loss=0.06069, simple_loss=0.07955, pruned_loss=0.01237, audio_tagging_loss=0.008539, over 15987.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.09, pruned_loss=0.01183, audio_tagging_loss=0.008312, over 3047000.96 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:52:25,913 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578500 2023-11-29 06:52:39,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.77 vs. limit=22.5 2023-11-29 06:52:44,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3856720.0, ans=0.125 2023-11-29 06:52:44,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3856720.0, ans=0.2 2023-11-29 06:52:56,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3856786.6666666665, ans=0.0 2023-11-29 06:52:59,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3856786.6666666665, ans=0.0 2023-11-29 06:53:05,704 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=15.0 2023-11-29 06:53:06,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3856853.3333333335, ans=0.2 2023-11-29 06:53:10,891 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 06:53:14,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3856920.0, ans=0.125 2023-11-29 06:53:17,480 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.81 vs. limit=6.0 2023-11-29 06:53:20,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3856920.0, ans=0.2 2023-11-29 06:53:26,927 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1400, loss[loss=0.05334, simple_loss=0.0681, pruned_loss=0.007697, audio_tagging_loss=0.0116, over 15156.00 frames. ], tot_loss[loss=0.06544, simple_loss=0.09019, pruned_loss=0.01196, audio_tagging_loss=0.008391, over 3046183.27 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:53:27,013 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578550 2023-11-29 06:53:34,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3856986.6666666665, ans=0.0 2023-11-29 06:53:48,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3857053.3333333335, ans=0.2 2023-11-29 06:53:49,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3857053.3333333335, ans=0.125 2023-11-29 06:53:54,948 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.714e+01 9.091e+01 9.742e+01 1.050e+02 1.544e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 06:54:06,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3857186.6666666665, ans=0.0 2023-11-29 06:54:08,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3857186.6666666665, ans=0.125 2023-11-29 06:54:29,531 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1450, loss[loss=0.04667, simple_loss=0.05862, pruned_loss=0.008174, audio_tagging_loss=0.009192, over 15012.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08993, pruned_loss=0.01191, audio_tagging_loss=0.008523, over 3049442.74 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 06:54:29,616 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578600 2023-11-29 06:55:00,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3857453.3333333335, ans=0.125 2023-11-29 06:55:12,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3857520.0, ans=0.0 2023-11-29 06:55:12,976 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.97 vs. 
2023-11-29 06:55:19,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3857586.6666666665, ans=0.125
2023-11-29 06:55:31,273 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1500, loss[loss=0.05179, simple_loss=0.07782, pruned_loss=0.005235, audio_tagging_loss=0.007639, over 16687.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08963, pruned_loss=0.01197, audio_tagging_loss=0.008563, over 3048754.35 frames. ], batch size: 62, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 06:55:31,357 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578650
2023-11-29 06:55:37,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3857653.3333333335, ans=0.2
2023-11-29 06:55:47,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3857720.0, ans=0.05
2023-11-29 06:55:52,577 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.80 vs. limit=10.0
2023-11-29 06:55:57,797 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 9.098e+01 9.715e+01 1.024e+02 1.252e+02, threshold=1.943e+02, percent-clipped=0.0
2023-11-29 06:56:24,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3857920.0, ans=0.125
2023-11-29 06:56:26,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.23 vs. limit=15.0
2023-11-29 06:56:32,891 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1550, loss[loss=0.06584, simple_loss=0.09464, pruned_loss=0.01034, audio_tagging_loss=0.008181, over 15943.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08935, pruned_loss=0.01183, audio_tagging_loss=0.008659, over 3052656.78 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 06:56:32,984 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578700
2023-11-29 06:56:34,565 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=22.5
2023-11-29 06:56:37,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3857986.6666666665, ans=0.2
2023-11-29 06:57:00,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3858120.0, ans=0.0
2023-11-29 06:57:08,195 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.57 vs. limit=10.0
2023-11-29 06:57:34,254 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1600, loss[loss=0.05091, simple_loss=0.06376, pruned_loss=0.009763, audio_tagging_loss=0.009263, over 15525.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.09, pruned_loss=0.01206, audio_tagging_loss=0.008661, over 3049881.35 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 06:57:34,358 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578750
2023-11-29 06:57:35,003 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=12.0
2023-11-29 06:57:38,984 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.27 vs. limit=10.0
2023-11-29 06:57:47,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3858386.6666666665, ans=0.0
2023-11-29 06:57:51,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3858386.6666666665, ans=0.125
2023-11-29 06:57:59,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3858453.3333333335, ans=0.0
2023-11-29 06:58:00,864 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.518e+01 9.073e+01 9.678e+01 1.045e+02 1.590e+02, threshold=1.936e+02, percent-clipped=0.0
2023-11-29 06:58:17,370 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.64 vs. limit=10.0
2023-11-29 06:58:22,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3858586.6666666665, ans=0.125
2023-11-29 06:58:35,994 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1650, loss[loss=0.06329, simple_loss=0.09275, pruned_loss=0.009662, audio_tagging_loss=0.00725, over 15318.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08915, pruned_loss=0.01203, audio_tagging_loss=0.008745, over 3047140.54 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 06:58:36,094 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578800
2023-11-29 06:59:37,392 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1700, loss[loss=0.05916, simple_loss=0.08186, pruned_loss=0.009796, audio_tagging_loss=0.008432, over 15149.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08912, pruned_loss=0.01199, audio_tagging_loss=0.008747, over 3048204.98 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 06:59:37,477 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578850
2023-11-29 06:59:37,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3858986.6666666665, ans=0.07
2023-11-29 06:59:39,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3858986.6666666665, ans=0.125
2023-11-29 06:59:43,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3858986.6666666665, ans=0.125
2023-11-29 07:00:07,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.933e+01 9.056e+01 9.736e+01 1.037e+02 1.295e+02, threshold=1.947e+02, percent-clipped=0.0
2023-11-29 07:00:11,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3859120.0, ans=0.0
2023-11-29 07:00:14,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3859186.6666666665, ans=0.125
2023-11-29 07:00:18,542 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.28 vs. limit=6.0
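In the optim.py entries above, the five numbers after "grad-norm quartiles" read as the 0/25/50/75/100th percentiles of recently observed gradient norms, and the printed threshold is consistently Clipping_scale times the median (e.g. 2.0 * 9.742e+01 = 1.948e+02 in the entry at 06:53:54). A hedged reconstruction of that bookkeeping, not icefall's actual optimizer code; the window size and helper name are assumptions:

import torch

def clip_with_median_threshold(params, norm_history, clipping_scale=2.0,
                               window=200):
    # Compute the global gradient norm for this step.
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = torch.linalg.vector_norm(
        torch.stack([torch.linalg.vector_norm(g) for g in grads]))
    norm_history.append(total_norm.item())
    del norm_history[:-window]                 # keep a sliding window
    hist = torch.tensor(norm_history)
    quartiles = torch.quantile(hist, torch.linspace(0.0, 1.0, 5))
    threshold = clipping_scale * quartiles[2]  # scale * median, as in the log
    clipped = total_norm > threshold
    if clipped:
        for g in grads:
            g.mul_(threshold / total_norm)
    return quartiles, threshold, clipped       # material for the log line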
2023-11-29 07:00:39,557 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1750, loss[loss=0.06965, simple_loss=0.1042, pruned_loss=0.01141, audio_tagging_loss=0.00614, over 15567.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09001, pruned_loss=0.01213, audio_tagging_loss=0.00864, over 3051851.15 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:00:39,649 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578900
2023-11-29 07:01:10,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3859453.3333333335, ans=0.0
2023-11-29 07:01:28,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3859586.6666666665, ans=0.125
2023-11-29 07:01:41,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3859653.3333333335, ans=0.0
2023-11-29 07:01:42,530 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1800, loss[loss=0.05754, simple_loss=0.0806, pruned_loss=0.009522, audio_tagging_loss=0.007723, over 15698.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08969, pruned_loss=0.01214, audio_tagging_loss=0.00856, over 3053921.67 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:01:42,623 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 578950
2023-11-29 07:01:50,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3859653.3333333335, ans=0.125
2023-11-29 07:02:11,100 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 9.159e+01 9.748e+01 1.040e+02 1.409e+02, threshold=1.950e+02, percent-clipped=0.0
2023-11-29 07:02:17,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3859786.6666666665, ans=0.1
2023-11-29 07:02:35,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3859920.0, ans=0.07
2023-11-29 07:02:36,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3859920.0, ans=0.0
2023-11-29 07:02:44,380 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1850, loss[loss=0.07387, simple_loss=0.1077, pruned_loss=0.01355, audio_tagging_loss=0.006473, over 15416.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08991, pruned_loss=0.01209, audio_tagging_loss=0.008443, over 3056748.30 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:02:44,461 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579000
2023-11-29 07:02:44,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3859986.6666666665, ans=0.0
2023-11-29 07:03:05,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3860053.3333333335, ans=0.125
2023-11-29 07:03:24,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3860186.6666666665, ans=0.125
2023-11-29 07:03:35,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3860253.3333333335, ans=0.125
2023-11-29 07:03:46,141 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1900, loss[loss=0.07468, simple_loss=0.1085, pruned_loss=0.01453, audio_tagging_loss=0.005916, over 16074.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08994, pruned_loss=0.01201, audio_tagging_loss=0.008408, over 3062302.68 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:03:46,232 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579050
2023-11-29 07:03:46,678 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0
2023-11-29 07:03:51,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3860320.0, ans=0.125
2023-11-29 07:03:52,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3860320.0, ans=0.125
2023-11-29 07:03:55,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3860320.0, ans=0.125
2023-11-29 07:04:01,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3860386.6666666665, ans=0.125
2023-11-29 07:04:02,843 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.66 vs. limit=15.0
2023-11-29 07:04:09,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3860453.3333333335, ans=0.1
2023-11-29 07:04:14,617 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.649e+01 8.930e+01 9.376e+01 1.025e+02 1.828e+02, threshold=1.875e+02, percent-clipped=0.0
2023-11-29 07:04:14,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3860453.3333333335, ans=0.1
2023-11-29 07:04:16,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3860453.3333333335, ans=0.2
2023-11-29 07:04:35,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3860586.6666666665, ans=0.0
2023-11-29 07:04:47,572 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 1950, loss[loss=0.05168, simple_loss=0.07134, pruned_loss=0.007002, audio_tagging_loss=0.009009, over 15245.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08998, pruned_loss=0.0121, audio_tagging_loss=0.008359, over 3061084.33 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0
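The per-batch numbers printed by train_asr.py are internally consistent with a weighted sum loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; the 0.5 weight is inferred from the logged values themselves, not quoted from the code. A worked check against the batch 1900 entry above:

# Reproduce the printed loss for Epoch 49, batch 1900 from its parts.
simple_loss, pruned_loss, audio_tagging_loss = 0.1085, 0.01453, 0.005916
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(round(loss, 5))  # ~0.0747, vs. loss=0.07468 in the log (components are rounded)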
2023-11-29 07:04:47,660 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579100
2023-11-29 07:05:03,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3860720.0, ans=0.125
2023-11-29 07:05:03,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3860720.0, ans=0.0
2023-11-29 07:05:10,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3860720.0, ans=0.125
2023-11-29 07:05:14,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3860786.6666666665, ans=0.125
2023-11-29 07:05:18,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3860786.6666666665, ans=0.125
2023-11-29 07:05:27,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3860853.3333333335, ans=0.0
2023-11-29 07:05:45,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3860920.0, ans=0.0
2023-11-29 07:05:48,967 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2000, loss[loss=0.07143, simple_loss=0.09373, pruned_loss=0.01423, audio_tagging_loss=0.01034, over 16624.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08961, pruned_loss=0.01201, audio_tagging_loss=0.008439, over 3054109.14 frames. ], batch size: 64, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 07:05:49,043 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579150
2023-11-29 07:06:10,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3861053.3333333335, ans=0.125
2023-11-29 07:06:14,134 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.07 vs. limit=22.5
2023-11-29 07:06:16,978 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 9.275e+01 1.004e+02 1.066e+02 1.335e+02, threshold=2.008e+02, percent-clipped=0.0
2023-11-29 07:06:18,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0
2023-11-29 07:06:43,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3861253.3333333335, ans=0.125
2023-11-29 07:06:50,238 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2050, loss[loss=0.05318, simple_loss=0.07227, pruned_loss=0.009121, audio_tagging_loss=0.007925, over 15409.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.089, pruned_loss=0.01196, audio_tagging_loss=0.008428, over 3044711.11 frames. ], batch size: 63, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 07:06:50,330 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579200
2023-11-29 07:06:50,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3861320.0, ans=0.1
2023-11-29 07:07:13,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3861453.3333333335, ans=0.2
2023-11-29 07:07:44,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3861586.6666666665, ans=0.0
2023-11-29 07:07:51,820 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2100, loss[loss=0.07499, simple_loss=0.1059, pruned_loss=0.01498, audio_tagging_loss=0.007049, over 14842.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08865, pruned_loss=0.01194, audio_tagging_loss=0.008445, over 3042358.96 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 07:07:51,933 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579250
2023-11-29 07:08:04,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3861720.0, ans=0.125
2023-11-29 07:08:20,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 9.068e+01 9.532e+01 1.017e+02 1.251e+02, threshold=1.906e+02, percent-clipped=0.0
2023-11-29 07:08:20,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3861786.6666666665, ans=0.05
2023-11-29 07:08:46,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3861920.0, ans=0.0
2023-11-29 07:08:50,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3861920.0, ans=0.125
2023-11-29 07:08:52,574 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2150, loss[loss=0.06587, simple_loss=0.08607, pruned_loss=0.01506, audio_tagging_loss=0.007773, over 15858.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08781, pruned_loss=0.01178, audio_tagging_loss=0.008518, over 3046021.52 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 07:08:52,652 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579300
2023-11-29 07:08:53,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3861986.6666666665, ans=0.125
2023-11-29 07:09:09,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3862053.3333333335, ans=0.125
2023-11-29 07:09:29,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3862186.6666666665, ans=0.1
2023-11-29 07:09:31,188 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
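The scaling.py "ScheduledFloat" entries throughout this log print the current value (ans=...) of hyperparameters such as dropout probabilities and skip rates that are scheduled as functions of batch_count. A minimal sketch of that idea, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the real class in scaling.py has more machinery, and the breakpoints below are illustrative:

class PiecewiseLinearFloat:
    """Sketch of a batch-count-scheduled float (not icefall's ScheduledFloat)."""
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# e.g. a dropout probability that decays early in training and then stays flat:
dropout_p = PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(3856320.0))  # 0.1: far past the last breakpoint, like ans=0.1 above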
2023-11-29 07:09:37,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3862186.6666666665, ans=0.2
2023-11-29 07:09:48,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0
2023-11-29 07:09:55,035 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2200, loss[loss=0.05301, simple_loss=0.0684, pruned_loss=0.0101, audio_tagging_loss=0.008712, over 14523.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08924, pruned_loss=0.01204, audio_tagging_loss=0.008447, over 3050437.01 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 07:09:55,118 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579350
2023-11-29 07:09:55,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3862320.0, ans=0.1
2023-11-29 07:10:02,273 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 07:10:17,687 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0
2023-11-29 07:10:22,816 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 9.191e+01 9.631e+01 1.029e+02 1.249e+02, threshold=1.926e+02, percent-clipped=0.0
2023-11-29 07:10:23,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3862453.3333333335, ans=0.95
2023-11-29 07:10:28,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3862453.3333333335, ans=0.125
2023-11-29 07:10:36,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3862520.0, ans=0.125
2023-11-29 07:10:37,950 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.58 vs. limit=22.5
2023-11-29 07:10:45,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3862586.6666666665, ans=0.0
2023-11-29 07:10:53,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3862586.6666666665, ans=0.125
2023-11-29 07:10:55,454 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2250, loss[loss=0.05897, simple_loss=0.08863, pruned_loss=0.009163, audio_tagging_loss=0.005497, over 15639.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08832, pruned_loss=0.01187, audio_tagging_loss=0.008469, over 3049235.98 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:10:55,536 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579400
2023-11-29 07:10:56,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3862653.3333333335, ans=0.2
2023-11-29 07:11:03,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3862653.3333333335, ans=0.1
2023-11-29 07:11:12,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3862720.0, ans=0.0
2023-11-29 07:11:49,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.31 vs. limit=15.0
2023-11-29 07:11:54,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3862920.0, ans=0.125
2023-11-29 07:11:55,408 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0
2023-11-29 07:11:56,106 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2300, loss[loss=0.06058, simple_loss=0.07746, pruned_loss=0.01068, audio_tagging_loss=0.01117, over 14807.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08909, pruned_loss=0.01195, audio_tagging_loss=0.008517, over 3050273.48 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:11:56,173 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579450
2023-11-29 07:12:12,666 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.65 vs. limit=12.0
2023-11-29 07:12:18,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3863053.3333333335, ans=0.2
2023-11-29 07:12:26,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.966e+01 9.023e+01 9.872e+01 1.066e+02 2.413e+02, threshold=1.974e+02, percent-clipped=1.0
2023-11-29 07:12:27,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3863120.0, ans=0.09899494936611666
2023-11-29 07:12:40,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3863186.6666666665, ans=0.0
2023-11-29 07:12:51,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3863253.3333333335, ans=0.0
2023-11-29 07:12:52,600 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 07:12:59,111 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2350, loss[loss=0.06496, simple_loss=0.09014, pruned_loss=0.01294, audio_tagging_loss=0.006956, over 16120.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08939, pruned_loss=0.01199, audio_tagging_loss=0.00856, over 3054094.19 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0
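The scaling.py "Whitening" entries compare a per-module metric against a limit. One plausible whiteness measure that is 1.0 when the channel covariance is isotropic and grows as the covariance becomes lopsided is sketched below; whether scaling.py computes exactly this ratio is an assumption, but the "metric >= 1, flagged against a limit" shape matches the log:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels); returns ~1.0 iff the covariance is isotropic,
    # larger values the further the channels are from "white".
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    c = cov.shape[0]
    return (cov @ cov).diagonal().sum() * c / cov.diagonal().sum() ** 2

print(whitening_metric(torch.randn(10000, 256)))  # ~1.0 for white noise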
2023-11-29 07:12:59,201 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579500
2023-11-29 07:13:10,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3863386.6666666665, ans=0.0
2023-11-29 07:13:12,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3863386.6666666665, ans=0.0
2023-11-29 07:13:12,564 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0
2023-11-29 07:13:15,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3863386.6666666665, ans=0.125
2023-11-29 07:13:15,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3863386.6666666665, ans=0.125
2023-11-29 07:13:25,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3863453.3333333335, ans=0.2
2023-11-29 07:14:00,816 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2400, loss[loss=0.05522, simple_loss=0.0704, pruned_loss=0.01073, audio_tagging_loss=0.009294, over 14296.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08915, pruned_loss=0.01204, audio_tagging_loss=0.00857, over 3053153.62 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 07:14:00,893 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579550
2023-11-29 07:14:09,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3863653.3333333335, ans=0.125
2023-11-29 07:14:18,889 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.50 vs. limit=15.0
2023-11-29 07:14:20,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3863720.0, ans=0.125
2023-11-29 07:14:29,232 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.703e+01 9.216e+01 9.806e+01 1.047e+02 1.244e+02, threshold=1.961e+02, percent-clipped=0.0
2023-11-29 07:15:00,887 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2450, loss[loss=0.0651, simple_loss=0.08743, pruned_loss=0.01192, audio_tagging_loss=0.009468, over 16195.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08976, pruned_loss=0.01199, audio_tagging_loss=0.008557, over 3052843.49 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 07:15:00,969 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579600
2023-11-29 07:15:17,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3864053.3333333335, ans=0.0
2023-11-29 07:15:32,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=3864120.0, ans=0.02
2023-11-29 07:15:36,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.28 vs. limit=6.0
2023-11-29 07:16:02,329 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2500, loss[loss=0.06484, simple_loss=0.08577, pruned_loss=0.0132, audio_tagging_loss=0.008753, over 14782.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08878, pruned_loss=0.01185, audio_tagging_loss=0.008671, over 3048469.75 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 07:16:02,418 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579650
2023-11-29 07:16:09,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3864320.0, ans=0.125
2023-11-29 07:16:12,180 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.56 vs. limit=22.5
2023-11-29 07:16:31,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.736e+01 9.100e+01 9.554e+01 1.019e+02 1.302e+02, threshold=1.911e+02, percent-clipped=0.0
2023-11-29 07:16:35,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3864453.3333333335, ans=0.125
2023-11-29 07:16:41,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3864520.0, ans=0.125
2023-11-29 07:16:47,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3864520.0, ans=0.1
2023-11-29 07:16:49,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3864520.0, ans=0.2
2023-11-29 07:16:51,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3864586.6666666665, ans=0.125
2023-11-29 07:17:04,439 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2550, loss[loss=0.0615, simple_loss=0.08564, pruned_loss=0.008054, audio_tagging_loss=0.01062, over 14271.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08907, pruned_loss=0.01191, audio_tagging_loss=0.008611, over 3047720.16 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:17:04,528 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579700
2023-11-29 07:17:09,125 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.67 vs. limit=15.0
2023-11-29 07:17:42,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3864853.3333333335, ans=0.0
2023-11-29 07:17:46,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3864853.3333333335, ans=0.0
2023-11-29 07:17:49,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3864853.3333333335, ans=0.2
2023-11-29 07:18:02,727 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.88 vs. limit=10.0
2023-11-29 07:18:05,612 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2600, loss[loss=0.05823, simple_loss=0.07265, pruned_loss=0.01239, audio_tagging_loss=0.009524, over 15350.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08853, pruned_loss=0.01187, audio_tagging_loss=0.008489, over 3051669.90 frames. ], batch size: 62, lr: 1.38e-03, grad_scale: 16.0
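The grad_scale field in the train_asr.py entries flips between 16.0 and 32.0 (e.g. 32.0 at batch 2500 above, 16.0 at batch 2550): this is the usual dynamic loss-scaling behaviour of fp16 training, where the scale doubles after a run of overflow-free steps and halves when an overflow is detected. A minimal illustration with PyTorch's GradScaler, defaults written out explicitly; this is not the project's training loop:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000,
                                   enabled=torch.cuda.is_available())
# Typical use per step (loss/optimizer are hypothetical placeholders):
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()   # doubles or halves the scale, as seen in the log
print(scaler.get_scale())  # 16.0 with CUDA available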
2023-11-29 07:18:05,692 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579750
2023-11-29 07:18:30,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3865120.0, ans=0.125
2023-11-29 07:18:36,165 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 8.765e+01 9.414e+01 9.856e+01 2.856e+02, threshold=1.883e+02, percent-clipped=1.0
2023-11-29 07:18:38,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3865120.0, ans=0.125
2023-11-29 07:19:02,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0
2023-11-29 07:19:04,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3865320.0, ans=0.125
2023-11-29 07:19:05,831 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2650, loss[loss=0.0631, simple_loss=0.08451, pruned_loss=0.01297, audio_tagging_loss=0.007882, over 15749.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08929, pruned_loss=0.01193, audio_tagging_loss=0.008414, over 3053867.96 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:19:05,911 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579800
2023-11-29 07:19:06,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.86 vs. limit=15.0
2023-11-29 07:19:27,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3865386.6666666665, ans=0.125
2023-11-29 07:19:28,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3865386.6666666665, ans=0.0
2023-11-29 07:19:32,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3865453.3333333335, ans=0.125
2023-11-29 07:19:39,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3865453.3333333335, ans=0.125
2023-11-29 07:19:49,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3865520.0, ans=0.125
2023-11-29 07:20:02,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3865586.6666666665, ans=0.125
2023-11-29 07:20:05,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3865653.3333333335, ans=0.125
2023-11-29 07:20:06,914 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2700, loss[loss=0.05066, simple_loss=0.0661, pruned_loss=0.009695, audio_tagging_loss=0.007919, over 14407.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08925, pruned_loss=0.01204, audio_tagging_loss=0.008338, over 3047237.29 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:20:07,012 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579850
2023-11-29 07:20:16,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.84 vs. limit=15.0
2023-11-29 07:20:31,249 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 07:20:36,843 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.953e+01 9.168e+01 9.728e+01 1.035e+02 1.379e+02, threshold=1.946e+02, percent-clipped=0.0
2023-11-29 07:20:58,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3865920.0, ans=0.1
2023-11-29 07:21:04,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3865920.0, ans=0.125
2023-11-29 07:21:04,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3865920.0, ans=0.0
2023-11-29 07:21:05,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3865920.0, ans=0.125
2023-11-29 07:21:07,824 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2750, loss[loss=0.06059, simple_loss=0.0728, pruned_loss=0.01363, audio_tagging_loss=0.01056, over 16644.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.088, pruned_loss=0.01182, audio_tagging_loss=0.008304, over 3047859.26 frames. ], batch size: 64, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:21:07,899 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579900
2023-11-29 07:21:09,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=22.5
2023-11-29 07:21:19,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3866053.3333333335, ans=0.1
2023-11-29 07:21:27,880 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0
2023-11-29 07:21:59,930 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 07:22:08,093 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2800, loss[loss=0.07285, simple_loss=0.09891, pruned_loss=0.01427, audio_tagging_loss=0.009118, over 14956.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08794, pruned_loss=0.0119, audio_tagging_loss=0.008413, over 3047021.85 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0
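The tot_loss[...] summaries report fractional frame counts (e.g. "over 3047859.26 frames" in the batch 2750 entry above), which is what running totals look like when they are kept with exponential forgetting rather than a plain sum. A small sketch of that bookkeeping; the decay constant and class name are guesses, not icefall's tracker:

class DecayedTotals:
    """Running loss/frame totals with exponential forgetting (sketch)."""
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.loss_sum = 0.0
        self.frame_sum = 0.0

    def update(self, batch_loss_sum: float, batch_frames: float) -> float:
        self.loss_sum = self.decay * self.loss_sum + batch_loss_sum
        self.frame_sum = self.decay * self.frame_sum + batch_frames
        return self.loss_sum / self.frame_sum  # per-frame tot_loss, fractional frames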
2023-11-29 07:22:08,165 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 579950
2023-11-29 07:22:19,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3866386.6666666665, ans=0.125
2023-11-29 07:22:37,344 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0
2023-11-29 07:22:39,164 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 8.979e+01 9.442e+01 1.009e+02 1.188e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-29 07:22:40,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3866453.3333333335, ans=0.0
2023-11-29 07:22:46,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3866520.0, ans=0.0
2023-11-29 07:22:53,461 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 07:22:57,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3866586.6666666665, ans=0.05
2023-11-29 07:23:09,365 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2850, loss[loss=0.06213, simple_loss=0.08668, pruned_loss=0.01134, audio_tagging_loss=0.007454, over 14001.00 frames. ], tot_loss[loss=0.06376, simple_loss=0.08727, pruned_loss=0.01172, audio_tagging_loss=0.008403, over 3040752.82 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 07:23:09,442 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580000
2023-11-29 07:23:29,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3866720.0, ans=0.125
2023-11-29 07:23:42,121 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0
2023-11-29 07:24:13,534 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2900, loss[loss=0.05517, simple_loss=0.07561, pruned_loss=0.008129, audio_tagging_loss=0.009238, over 14865.00 frames. ], tot_loss[loss=0.06383, simple_loss=0.08715, pruned_loss=0.01177, audio_tagging_loss=0.008481, over 3048289.35 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:24:13,615 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580050
2023-11-29 07:24:13,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3866986.6666666665, ans=0.125
2023-11-29 07:24:26,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3867053.3333333335, ans=0.125
2023-11-29 07:24:42,678 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0
2023-11-29 07:24:44,582 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.265e+01 8.980e+01 9.788e+01 1.062e+02 1.550e+02, threshold=1.958e+02, percent-clipped=0.0
2023-11-29 07:25:02,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3867253.3333333335, ans=0.1
2023-11-29 07:25:14,047 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 2950, loss[loss=0.09187, simple_loss=0.1265, pruned_loss=0.01903, audio_tagging_loss=0.009588, over 15352.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08868, pruned_loss=0.01198, audio_tagging_loss=0.008549, over 3052213.54 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:25:14,123 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580100
2023-11-29 07:25:44,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3867453.3333333335, ans=0.125
2023-11-29 07:25:56,880 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0
2023-11-29 07:26:13,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3867586.6666666665, ans=0.1
2023-11-29 07:26:15,553 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3000, loss[loss=0.0759, simple_loss=0.1114, pruned_loss=0.01355, audio_tagging_loss=0.006632, over 15031.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08991, pruned_loss=0.01222, audio_tagging_loss=0.008501, over 3053265.39 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:26:15,554 INFO [train_asr.py:1258] (2/4) Computing validation loss
2023-11-29 07:26:41,190 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6148, 3.7579, 4.0045, 3.5418], device='cuda:2')
2023-11-29 07:26:52,782 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.6658, 2.3277, 2.1281, 2.1916, 2.5756, 2.4902, 2.6499, 2.5695], device='cuda:2')
2023-11-29 07:26:54,605 INFO [train_asr.py:1267] (2/4) Epoch 49, validation: loss=0.05747, simple_loss=0.05054, pruned_loss=0.005474, audio_tagging_loss=0.02673, over 4681554.00 frames.
2023-11-29 07:26:54,605 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB
2023-11-29 07:26:54,686 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580150
2023-11-29 07:27:09,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3867720.0, ans=0.1
2023-11-29 07:27:14,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3867720.0, ans=0.125
2023-11-29 07:27:26,160 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 9.023e+01 9.601e+01 1.027e+02 1.356e+02, threshold=1.920e+02, percent-clipped=0.0
2023-11-29 07:27:26,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3867786.6666666665, ans=0.04949747468305833
2023-11-29 07:27:55,412 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3050, loss[loss=0.0497, simple_loss=0.06023, pruned_loss=0.008295, audio_tagging_loss=0.01129, over 15729.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09025, pruned_loss=0.01227, audio_tagging_loss=0.008526, over 3047380.94 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0
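During the validation pass above, zipformer.py prints one attn_weights_entropy tensor per logged attention module; the tensor lengths (4 and 8) suggest one entropy value per attention head. A sketch of that diagnostic, with the reduction over positions assumed rather than quoted from zipformer.py:

import torch

def attention_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, tgt_len, src_len); each row is a distribution over src.
    p = attn.clamp(min=1e-20)
    return -(p * p.log()).sum(dim=-1).mean(dim=-1)  # one value per head

attn = torch.softmax(torch.randn(8, 32, 32), dim=-1)
print(attention_entropy(attn))  # 8 entropies, like the 8-entry tensor above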
], tot_loss[loss=0.06592, simple_loss=0.09025, pruned_loss=0.01227, audio_tagging_loss=0.008526, over 3047380.94 frames. ], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:27:55,504 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580200 2023-11-29 07:28:04,324 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.51 vs. limit=15.0 2023-11-29 07:28:32,366 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:28:32,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3868186.6666666665, ans=0.125 2023-11-29 07:28:56,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3868320.0, ans=0.04949747468305833 2023-11-29 07:28:57,693 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3100, loss[loss=0.05555, simple_loss=0.07739, pruned_loss=0.008629, audio_tagging_loss=0.008224, over 15898.00 frames. ], tot_loss[loss=0.0663, simple_loss=0.09095, pruned_loss=0.01227, audio_tagging_loss=0.008552, over 3053766.60 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:28:57,793 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580250 2023-11-29 07:29:09,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3868386.6666666665, ans=0.125 2023-11-29 07:29:13,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.82 vs. limit=6.0 2023-11-29 07:29:22,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3868453.3333333335, ans=0.0 2023-11-29 07:29:27,917 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.29 vs. limit=15.0 2023-11-29 07:29:29,749 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.864e+01 8.858e+01 9.570e+01 1.021e+02 1.337e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 07:29:59,556 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3150, loss[loss=0.052, simple_loss=0.0691, pruned_loss=0.008258, audio_tagging_loss=0.009194, over 14540.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09036, pruned_loss=0.01224, audio_tagging_loss=0.008611, over 3058697.38 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:29:59,657 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580300 2023-11-29 07:30:01,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3868653.3333333335, ans=0.125 2023-11-29 07:30:12,237 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:30:30,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3868786.6666666665, ans=0.125 2023-11-29 07:30:32,078 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.55 vs. limit=22.5 2023-11-29 07:30:51,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3868920.0, ans=0.125 2023-11-29 07:30:52,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3868920.0, ans=0.1 2023-11-29 07:31:00,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3868986.6666666665, ans=0.1 2023-11-29 07:31:01,060 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3200, loss[loss=0.06206, simple_loss=0.08575, pruned_loss=0.01086, audio_tagging_loss=0.008325, over 15686.00 frames. ], tot_loss[loss=0.06583, simple_loss=0.08997, pruned_loss=0.01214, audio_tagging_loss=0.008715, over 3058781.69 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:31:01,148 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580350 2023-11-29 07:31:04,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3868986.6666666665, ans=0.125 2023-11-29 07:31:13,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3869053.3333333335, ans=0.0 2023-11-29 07:31:14,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3869053.3333333335, ans=0.125 2023-11-29 07:31:33,097 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.044e+01 8.935e+01 9.459e+01 1.020e+02 1.289e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 07:31:43,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3869186.6666666665, ans=0.0 2023-11-29 07:31:45,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3869186.6666666665, ans=0.125 2023-11-29 07:32:00,434 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=22.5 2023-11-29 07:32:02,191 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3250, loss[loss=0.05096, simple_loss=0.07161, pruned_loss=0.005555, audio_tagging_loss=0.0096, over 14913.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.0898, pruned_loss=0.01211, audio_tagging_loss=0.008748, over 3057772.49 frames. 
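The validation summaries earlier in this log ("validation: loss=0.05747 ... over 4681554.00 frames") average over the whole dev set, which amounts to a frame-weighted mean of per-batch losses. A sketch of that accumulation; the class and method names are illustrative:

class FrameWeightedMean:
    """Accumulate loss weighted by frame count (sketch of the validation summary)."""
    def __init__(self):
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.weighted_loss += batch_loss * batch_frames
        self.frames += batch_frames

    @property
    def mean(self) -> float:
        return self.weighted_loss / max(self.frames, 1.0)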
], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:32:02,273 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580400 2023-11-29 07:32:41,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3869520.0, ans=0.0 2023-11-29 07:32:54,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3869586.6666666665, ans=0.1 2023-11-29 07:32:55,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3869586.6666666665, ans=0.125 2023-11-29 07:33:01,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3869586.6666666665, ans=0.125 2023-11-29 07:33:04,505 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3300, loss[loss=0.06846, simple_loss=0.09676, pruned_loss=0.01266, audio_tagging_loss=0.007418, over 15311.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08962, pruned_loss=0.01209, audio_tagging_loss=0.008804, over 3060161.65 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:33:04,601 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580450 2023-11-29 07:33:11,050 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.68 vs. limit=15.0 2023-11-29 07:33:19,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3869720.0, ans=0.125 2023-11-29 07:33:37,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.887e+01 8.902e+01 9.466e+01 1.005e+02 1.164e+02, threshold=1.893e+02, percent-clipped=0.0 2023-11-29 07:33:46,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.11 vs. limit=22.5 2023-11-29 07:33:47,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3869853.3333333335, ans=0.0 2023-11-29 07:34:06,871 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3350, loss[loss=0.05063, simple_loss=0.06391, pruned_loss=0.01152, audio_tagging_loss=0.007163, over 15987.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08895, pruned_loss=0.01195, audio_tagging_loss=0.008771, over 3056688.61 frames. ], batch size: 62, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:34:06,957 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580500 2023-11-29 07:34:10,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3869986.6666666665, ans=0.125 2023-11-29 07:34:17,523 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.37 vs. 
limit=22.5 2023-11-29 07:34:32,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3870120.0, ans=0.125 2023-11-29 07:34:47,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3870186.6666666665, ans=0.125 2023-11-29 07:34:51,619 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 07:35:00,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3870253.3333333335, ans=0.125 2023-11-29 07:35:05,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3870253.3333333335, ans=0.125 2023-11-29 07:35:08,976 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3400, loss[loss=0.06047, simple_loss=0.08354, pruned_loss=0.009601, audio_tagging_loss=0.009093, over 15665.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08944, pruned_loss=0.01188, audio_tagging_loss=0.00869, over 3063711.17 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:35:09,078 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580550 2023-11-29 07:35:11,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.02 vs. limit=6.0 2023-11-29 07:35:21,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3870386.6666666665, ans=0.035 2023-11-29 07:35:25,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3870386.6666666665, ans=0.125 2023-11-29 07:35:41,954 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 9.012e+01 9.460e+01 1.056e+02 1.309e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-29 07:36:11,805 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3450, loss[loss=0.05, simple_loss=0.06501, pruned_loss=0.00708, audio_tagging_loss=0.01042, over 14777.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08935, pruned_loss=0.01187, audio_tagging_loss=0.008521, over 3057187.67 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:36:11,883 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580600 2023-11-29 07:36:21,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3870653.3333333335, ans=0.2 2023-11-29 07:37:05,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3870920.0, ans=0.1 2023-11-29 07:37:13,497 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3500, loss[loss=0.0679, simple_loss=0.09567, pruned_loss=0.01177, audio_tagging_loss=0.008298, over 15321.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.09044, pruned_loss=0.01198, audio_tagging_loss=0.008443, over 3053113.77 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:37:13,601 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580650 2023-11-29 07:37:18,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.44 vs. 
2023-11-29 07:37:26,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3871053.3333333335, ans=0.5
2023-11-29 07:37:34,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3871053.3333333335, ans=0.1
2023-11-29 07:37:42,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3871120.0, ans=0.125
2023-11-29 07:37:47,396 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 07:37:48,504 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 8.986e+01 9.811e+01 1.065e+02 1.473e+02, threshold=1.962e+02, percent-clipped=0.0
2023-11-29 07:37:54,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3871186.6666666665, ans=0.2
2023-11-29 07:38:13,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3871253.3333333335, ans=0.125
2023-11-29 07:38:16,947 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3550, loss[loss=0.07203, simple_loss=0.1001, pruned_loss=0.0141, audio_tagging_loss=0.007896, over 15093.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08834, pruned_loss=0.01171, audio_tagging_loss=0.008506, over 3049335.73 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:38:17,043 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580700
2023-11-29 07:38:23,732 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=15.0
2023-11-29 07:38:27,701 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=15.0
2023-11-29 07:38:36,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3871386.6666666665, ans=0.125
2023-11-29 07:38:38,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3871386.6666666665, ans=0.0
2023-11-29 07:38:39,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3871386.6666666665, ans=0.125
2023-11-29 07:38:48,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3871453.3333333335, ans=0.125
2023-11-29 07:38:57,569 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=22.5
2023-11-29 07:39:18,695 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3600, loss[loss=0.06367, simple_loss=0.08828, pruned_loss=0.01218, audio_tagging_loss=0.007358, over 14774.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08786, pruned_loss=0.0117, audio_tagging_loss=0.008492, over 3048455.55 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 32.0
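In the train_asr.py:1235 records, the four logged components are mutually consistent: in every entry here the headline loss equals 0.5 * simple_loss + pruned_loss + audio_tagging_loss (for batch 3600 just above, 0.5 * 0.08828 + 0.01218 + 0.007358 is about 0.06368, against the logged 0.06367). The 0.5 weight on the simple, trivial-joiner transducer loss is inferred from the logged numbers rather than read out of the code; a quick consistency check:

```python
# Reproduce the logged total from its components for the batch 3600 entry
# above. The 0.5 weight on simple_loss is inferred from the numbers.
def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
               simple_scale=0.5, tagging_scale=1.0):
    return (simple_scale * simple_loss
            + pruned_loss
            + tagging_scale * audio_tagging_loss)

logged = dict(simple_loss=0.08828, pruned_loss=0.01218,
              audio_tagging_loss=0.007358)
print(round(total_loss(**logged), 5))  # 0.06368, vs. logged loss=0.06367
```

The loss[...] bracket reports the current batch and the tot_loss[...] bracket a running average, which is why tot_loss moves slowly around 0.064-0.066 while per-batch values scatter more widely.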
2023-11-29 07:39:18,797 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580750
2023-11-29 07:39:35,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3871720.0, ans=0.125
2023-11-29 07:39:38,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3871720.0, ans=0.0
2023-11-29 07:39:40,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3871720.0, ans=0.1
2023-11-29 07:39:51,830 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 8.727e+01 9.343e+01 1.017e+02 1.458e+02, threshold=1.869e+02, percent-clipped=0.0
2023-11-29 07:39:53,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3871786.6666666665, ans=0.1
2023-11-29 07:40:00,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3871853.3333333335, ans=0.125
2023-11-29 07:40:19,964 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3650, loss[loss=0.05956, simple_loss=0.07797, pruned_loss=0.01271, audio_tagging_loss=0.007857, over 15805.00 frames. ], tot_loss[loss=0.06404, simple_loss=0.08765, pruned_loss=0.01172, audio_tagging_loss=0.008497, over 3045486.17 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 32.0
2023-11-29 07:40:20,036 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580800
2023-11-29 07:40:25,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3871986.6666666665, ans=0.0
2023-11-29 07:40:55,120 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.88 vs. limit=15.0
2023-11-29 07:41:12,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3872253.3333333335, ans=0.0
2023-11-29 07:41:13,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3872253.3333333335, ans=0.04949747468305833
2023-11-29 07:41:15,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3872253.3333333335, ans=0.125
2023-11-29 07:41:17,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3872253.3333333335, ans=0.1
2023-11-29 07:41:21,611 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3700, loss[loss=0.06926, simple_loss=0.09821, pruned_loss=0.01409, audio_tagging_loss=0.006058, over 16220.00 frames. ], tot_loss[loss=0.065, simple_loss=0.08942, pruned_loss=0.01193, audio_tagging_loss=0.008367, over 3047355.00 frames. ], batch size: 62, lr: 1.38e-03, grad_scale: 16.0
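The optim.py:476 records summarize the recent distribution of gradient norms as five quantiles (min, 25%, median, 75%, max) alongside the clipping threshold. In every such record here the threshold equals Clipping_scale times the median, e.g. 2.0 * 9.343e+01 = 1.869e+02 in the entry above, and percent-clipped reports how often that threshold was exceeded. A hedged sketch of median-based clipping along these lines, where the buffer size and bookkeeping are assumptions:

```python
# Sketch of adaptive clipping with threshold = clipping_scale * median of
# recent gradient norms, as the optim.py lines suggest. Buffer length and
# the exact quantile handling are assumptions, not the recipe's code.
from collections import deque
import torch

class MedianGradClipper:
    def __init__(self, clipping_scale=2.0, history=400):
        self.scale = clipping_scale
        self.norms = deque(maxlen=history)

    def clip_(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        threshold = self.scale * sorted(self.norms)[len(self.norms) // 2]
        if norm > threshold:          # this batch would count toward percent-clipped
            for g in grads:
                g.mul_(threshold / norm)
        return threshold

# With a median grad-norm of 9.343e+01 this gives threshold 1.869e+02,
# matching the optim.py:476 entry above.
```

The grad_scale field in the training records toggling between 16.0 and 32.0 is the separate fp16 loss-scaling factor, not this clipping threshold.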
2023-11-29 07:41:21,689 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580850
2023-11-29 07:41:43,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3872386.6666666665, ans=0.0
2023-11-29 07:41:55,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=3872453.3333333335, ans=0.02
2023-11-29 07:41:56,225 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 9.236e+01 9.960e+01 1.067e+02 1.392e+02, threshold=1.992e+02, percent-clipped=0.0
2023-11-29 07:42:01,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3872520.0, ans=0.125
2023-11-29 07:42:24,366 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3750, loss[loss=0.05769, simple_loss=0.07674, pruned_loss=0.009147, audio_tagging_loss=0.01017, over 14656.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.09024, pruned_loss=0.0121, audio_tagging_loss=0.008348, over 3042885.39 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:42:24,441 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580900
2023-11-29 07:42:25,223 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=22.5
2023-11-29 07:42:29,193 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.77 vs. limit=15.0
2023-11-29 07:42:57,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3872786.6666666665, ans=0.1
2023-11-29 07:43:05,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3872853.3333333335, ans=0.125
2023-11-29 07:43:09,384 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 07:43:18,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3872920.0, ans=0.125
2023-11-29 07:43:26,307 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3800, loss[loss=0.06057, simple_loss=0.08316, pruned_loss=0.009049, audio_tagging_loss=0.009939, over 15293.00 frames. ], tot_loss[loss=0.06588, simple_loss=0.09085, pruned_loss=0.01213, audio_tagging_loss=0.008322, over 3053140.21 frames.
], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:43:26,393 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 580950 2023-11-29 07:43:26,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3872986.6666666665, ans=0.125 2023-11-29 07:43:32,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3872986.6666666665, ans=0.125 2023-11-29 07:43:45,252 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.30 vs. limit=15.0 2023-11-29 07:43:47,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3873053.3333333335, ans=0.125 2023-11-29 07:43:57,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3873120.0, ans=0.1 2023-11-29 07:44:01,539 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.972e+01 9.513e+01 1.036e+02 1.364e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-29 07:44:01,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3873120.0, ans=0.1 2023-11-29 07:44:08,242 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2023-11-29 07:44:11,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3873186.6666666665, ans=0.0 2023-11-29 07:44:28,022 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3850, loss[loss=0.07545, simple_loss=0.112, pruned_loss=0.01315, audio_tagging_loss=0.006325, over 14580.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.09006, pruned_loss=0.01202, audio_tagging_loss=0.008438, over 3058400.83 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:44:28,099 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581000 2023-11-29 07:44:37,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3873320.0, ans=0.125 2023-11-29 07:45:05,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.71 vs. limit=15.0 2023-11-29 07:45:27,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3873586.6666666665, ans=0.1 2023-11-29 07:45:30,895 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3900, loss[loss=0.06054, simple_loss=0.08704, pruned_loss=0.007995, audio_tagging_loss=0.009023, over 15383.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.0891, pruned_loss=0.01186, audio_tagging_loss=0.008526, over 3058063.59 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:45:30,973 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581050 2023-11-29 07:45:40,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3873653.3333333335, ans=0.0 2023-11-29 07:45:54,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3873786.6666666665, ans=0.125 2023-11-29 07:45:57,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3873786.6666666665, ans=0.125 2023-11-29 07:46:04,755 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.538e+01 8.794e+01 9.412e+01 1.023e+02 1.323e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-29 07:46:31,865 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 3950, loss[loss=0.05879, simple_loss=0.07635, pruned_loss=0.01022, audio_tagging_loss=0.0104, over 14954.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08918, pruned_loss=0.01185, audio_tagging_loss=0.008579, over 3050907.96 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:46:31,947 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581100 2023-11-29 07:46:40,655 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=12.0 2023-11-29 07:46:51,919 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2023-11-29 07:47:07,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3874186.6666666665, ans=0.125 2023-11-29 07:47:10,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3874186.6666666665, ans=0.125 2023-11-29 07:47:12,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3874186.6666666665, ans=0.2 2023-11-29 07:47:14,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3874186.6666666665, ans=0.125 2023-11-29 07:47:19,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3874253.3333333335, ans=0.125 2023-11-29 07:47:32,131 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4000, loss[loss=0.06381, simple_loss=0.08708, pruned_loss=0.01194, audio_tagging_loss=0.008337, over 16490.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08915, pruned_loss=0.01192, audio_tagging_loss=0.008666, over 3047913.42 frames. 
], batch size: 64, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:47:32,229 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581150 2023-11-29 07:47:33,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3874320.0, ans=0.1 2023-11-29 07:47:38,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3874320.0, ans=0.125 2023-11-29 07:47:38,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3874320.0, ans=0.0 2023-11-29 07:47:46,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3874386.6666666665, ans=0.125 2023-11-29 07:48:06,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3874453.3333333335, ans=0.125 2023-11-29 07:48:08,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.251e+01 9.122e+01 9.589e+01 1.060e+02 2.170e+02, threshold=1.918e+02, percent-clipped=1.0 2023-11-29 07:48:13,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3874520.0, ans=0.125 2023-11-29 07:48:17,131 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.86 vs. limit=22.5 2023-11-29 07:48:18,278 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5 2023-11-29 07:48:23,516 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=12.0 2023-11-29 07:48:31,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3874586.6666666665, ans=0.125 2023-11-29 07:48:33,318 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4050, loss[loss=0.08088, simple_loss=0.1171, pruned_loss=0.01539, audio_tagging_loss=0.006947, over 15941.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.08969, pruned_loss=0.0121, audio_tagging_loss=0.008815, over 3047182.09 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:48:33,411 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581200 2023-11-29 07:48:36,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3874653.3333333335, ans=0.1 2023-11-29 07:48:37,482 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 07:48:37,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3874653.3333333335, ans=0.0 2023-11-29 07:48:58,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3874786.6666666665, ans=0.125 2023-11-29 07:49:06,887 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=12.0 2023-11-29 07:49:09,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3874853.3333333335, ans=0.0 2023-11-29 07:49:17,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3874853.3333333335, ans=0.125 2023-11-29 07:49:35,700 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4100, loss[loss=0.06523, simple_loss=0.09142, pruned_loss=0.0115, audio_tagging_loss=0.008014, over 14934.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09019, pruned_loss=0.0121, audio_tagging_loss=0.008744, over 3047472.63 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:49:35,795 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581250 2023-11-29 07:49:40,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3874986.6666666665, ans=0.0 2023-11-29 07:49:54,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3875053.3333333335, ans=0.1 2023-11-29 07:49:54,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3875053.3333333335, ans=0.1 2023-11-29 07:49:56,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3875053.3333333335, ans=0.1 2023-11-29 07:50:11,081 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.045e+01 9.112e+01 9.700e+01 1.031e+02 1.226e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 07:50:21,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3875186.6666666665, ans=0.1 2023-11-29 07:50:32,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3875253.3333333335, ans=0.125 2023-11-29 07:50:33,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3875253.3333333335, ans=0.2 2023-11-29 07:50:36,481 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4150, loss[loss=0.0675, simple_loss=0.09544, pruned_loss=0.01078, audio_tagging_loss=0.009002, over 16017.00 frames. ], tot_loss[loss=0.06575, simple_loss=0.0899, pruned_loss=0.01214, audio_tagging_loss=0.008654, over 3049416.75 frames. 
], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:50:36,561 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581300 2023-11-29 07:50:36,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3875320.0, ans=0.1 2023-11-29 07:51:02,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3875453.3333333335, ans=0.125 2023-11-29 07:51:22,066 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 07:51:22,908 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.15 vs. limit=15.0 2023-11-29 07:51:25,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3875586.6666666665, ans=0.0 2023-11-29 07:51:37,790 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4200, loss[loss=0.06783, simple_loss=0.08685, pruned_loss=0.01412, audio_tagging_loss=0.01029, over 15544.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08937, pruned_loss=0.01187, audio_tagging_loss=0.008543, over 3046232.82 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:51:37,891 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581350 2023-11-29 07:51:38,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3875653.3333333335, ans=0.125 2023-11-29 07:51:59,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3875720.0, ans=0.0 2023-11-29 07:51:59,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3875720.0, ans=0.125 2023-11-29 07:52:12,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3875786.6666666665, ans=0.0 2023-11-29 07:52:13,211 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 9.081e+01 9.650e+01 1.017e+02 1.202e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 07:52:31,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3875920.0, ans=0.0 2023-11-29 07:52:39,526 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4250, loss[loss=0.07867, simple_loss=0.1071, pruned_loss=0.01841, audio_tagging_loss=0.006695, over 15524.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.09031, pruned_loss=0.0119, audio_tagging_loss=0.00847, over 3044340.46 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:52:39,601 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581400 2023-11-29 07:53:11,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3876120.0, ans=0.125 2023-11-29 07:53:22,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3876186.6666666665, ans=0.0 2023-11-29 07:53:28,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3876253.3333333335, ans=0.125 2023-11-29 07:53:41,496 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4300, loss[loss=0.06954, simple_loss=0.09827, pruned_loss=0.01434, audio_tagging_loss=0.006067, over 15184.00 frames. ], tot_loss[loss=0.06528, simple_loss=0.0899, pruned_loss=0.01191, audio_tagging_loss=0.00842, over 3047304.00 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:53:41,570 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581450 2023-11-29 07:54:16,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 9.277e+01 9.932e+01 1.054e+02 1.240e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-29 07:54:36,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3876586.6666666665, ans=0.04949747468305833 2023-11-29 07:54:41,286 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0 2023-11-29 07:54:42,947 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4350, loss[loss=0.05181, simple_loss=0.06708, pruned_loss=0.00696, audio_tagging_loss=0.01131, over 14064.00 frames. ], tot_loss[loss=0.06585, simple_loss=0.09059, pruned_loss=0.01221, audio_tagging_loss=0.008343, over 3045999.93 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:54:43,037 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581500 2023-11-29 07:54:45,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3876653.3333333335, ans=0.0 2023-11-29 07:54:59,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3876720.0, ans=0.025 2023-11-29 07:55:04,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3876720.0, ans=0.0 2023-11-29 07:55:14,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3876786.6666666665, ans=0.125 2023-11-29 07:55:17,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3876786.6666666665, ans=0.1 2023-11-29 07:55:19,310 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.27 vs. limit=15.0 2023-11-29 07:55:44,976 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4400, loss[loss=0.04233, simple_loss=0.05507, pruned_loss=0.006704, audio_tagging_loss=0.008092, over 15058.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08959, pruned_loss=0.01202, audio_tagging_loss=0.008322, over 3049459.55 frames. 
], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 07:55:45,050 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581550 2023-11-29 07:55:48,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3876986.6666666665, ans=0.2 2023-11-29 07:55:58,866 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.79 vs. limit=15.0 2023-11-29 07:55:59,098 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.80 vs. limit=6.0 2023-11-29 07:56:03,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3877053.3333333335, ans=0.95 2023-11-29 07:56:03,778 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.74 vs. limit=22.5 2023-11-29 07:56:14,850 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.83 vs. limit=22.5 2023-11-29 07:56:20,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3877186.6666666665, ans=0.1 2023-11-29 07:56:21,165 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.065e+01 9.242e+01 9.842e+01 1.066e+02 1.310e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-29 07:56:31,343 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0 2023-11-29 07:56:36,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3877253.3333333335, ans=0.025 2023-11-29 07:56:41,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3877253.3333333335, ans=0.125 2023-11-29 07:56:46,463 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4450, loss[loss=0.05824, simple_loss=0.07642, pruned_loss=0.01059, audio_tagging_loss=0.009448, over 14546.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.0899, pruned_loss=0.01201, audio_tagging_loss=0.008206, over 3048543.41 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 07:56:46,562 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581600 2023-11-29 07:57:21,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3877453.3333333335, ans=0.0 2023-11-29 07:57:48,375 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4500, loss[loss=0.07673, simple_loss=0.1104, pruned_loss=0.01478, audio_tagging_loss=0.006746, over 15620.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.09021, pruned_loss=0.01213, audio_tagging_loss=0.008242, over 3056067.85 frames. 
], batch size: 58, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:57:48,477 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581650
2023-11-29 07:58:19,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3877786.6666666665, ans=0.0
2023-11-29 07:58:25,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 9.167e+01 9.852e+01 1.040e+02 1.276e+02, threshold=1.970e+02, percent-clipped=0.0
2023-11-29 07:58:36,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3877920.0, ans=0.125
2023-11-29 07:58:39,088 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-29 07:58:50,539 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4550, loss[loss=0.07466, simple_loss=0.1052, pruned_loss=0.01384, audio_tagging_loss=0.008237, over 16498.00 frames. ], tot_loss[loss=0.0657, simple_loss=0.09052, pruned_loss=0.01219, audio_tagging_loss=0.008249, over 3055485.42 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 07:58:50,615 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581700
2023-11-29 07:58:59,290 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.75 vs. limit=15.0
2023-11-29 07:59:09,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3878053.3333333335, ans=0.125
2023-11-29 07:59:12,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3878053.3333333335, ans=0.2
2023-11-29 07:59:27,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3878186.6666666665, ans=0.0
2023-11-29 07:59:30,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3878186.6666666665, ans=0.125
2023-11-29 07:59:38,638 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-29 07:59:48,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3878253.3333333335, ans=0.0
2023-11-29 07:59:51,486 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4600, loss[loss=0.05629, simple_loss=0.07193, pruned_loss=0.01112, audio_tagging_loss=0.009207, over 17059.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08957, pruned_loss=0.01192, audio_tagging_loss=0.008419, over 3055806.15 frames. ], batch size: 66, lr: 1.38e-03, grad_scale: 16.0
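The recurring WARNING records from train_asr.py:1481, like the one for _II2Klfnn4Y above, show the length guard on training cuts: the 1-second AudioSet clips carry a 24-token dummy transcript, but their 100 input frames shrink to 23 at the encoder output, and a transducer cannot emit more tokens than it has output frames, so the cut is dropped. A sketch of such a filter follows; the subsampling formula is an assumption chosen to reproduce the logged 100 -> 23, not a quote of the recipe:

```python
# Sketch of the kind of length filter behind the "Exclude cut" warnings.
# The conv-frontend formula below (~4x reduction) reproduces 100 -> 23
# frames as in the log, but is an assumption about the model.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    t = frames_after_subsampling(num_frames)
    return t >= num_tokens   # transducer needs at least one frame per token

print(frames_after_subsampling(100))          # 23, as in the warning above
print(keep_cut(num_frames=100, num_tokens=24))  # False -> cut is excluded
```

These clips still contribute to the audio-tagging objective elsewhere; only the ASR branch cannot use them, which is why the guard fires on the dummy placeholder transcripts rather than on real speech.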
2023-11-29 07:59:51,576 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581750
2023-11-29 07:59:57,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3878320.0, ans=0.0
2023-11-29 07:59:59,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3878320.0, ans=0.125
2023-11-29 08:00:00,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3878320.0, ans=0.0
2023-11-29 08:00:13,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3878386.6666666665, ans=0.125
2023-11-29 08:00:17,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3878453.3333333335, ans=0.0
2023-11-29 08:00:25,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3878453.3333333335, ans=0.0
2023-11-29 08:00:29,073 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.065e+01 8.974e+01 9.623e+01 1.050e+02 1.439e+02, threshold=1.925e+02, percent-clipped=0.0
2023-11-29 08:00:52,984 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4650, loss[loss=0.08334, simple_loss=0.1186, pruned_loss=0.01742, audio_tagging_loss=0.00661, over 14285.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.0899, pruned_loss=0.01206, audio_tagging_loss=0.00848, over 3045792.65 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:00:53,048 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581800
2023-11-29 08:00:53,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3878653.3333333335, ans=0.125
2023-11-29 08:01:34,954 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0
2023-11-29 08:01:36,281 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0
2023-11-29 08:01:41,498 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0
2023-11-29 08:01:52,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3878920.0, ans=0.1
2023-11-29 08:01:56,875 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4700, loss[loss=0.04189, simple_loss=0.04828, pruned_loss=0.008432, audio_tagging_loss=0.00932, over 14035.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08826, pruned_loss=0.01187, audio_tagging_loss=0.008664, over 3043870.21 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0
2023-11-29 08:01:56,979 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581850
2023-11-29 08:01:59,031 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-29 08:02:22,787 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.65 vs.
limit=15.0 2023-11-29 08:02:33,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.525e+01 9.091e+01 9.646e+01 1.031e+02 1.253e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 08:02:41,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3879186.6666666665, ans=0.1 2023-11-29 08:02:48,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3879253.3333333335, ans=0.1 2023-11-29 08:02:58,738 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4750, loss[loss=0.07427, simple_loss=0.1078, pruned_loss=0.01451, audio_tagging_loss=0.005867, over 13630.00 frames. ], tot_loss[loss=0.06412, simple_loss=0.08725, pruned_loss=0.0117, audio_tagging_loss=0.008797, over 3039022.77 frames. ], batch size: 53, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:02:58,819 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581900 2023-11-29 08:03:05,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2023-11-29 08:03:06,188 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.05 vs. limit=15.0 2023-11-29 08:03:24,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3879453.3333333335, ans=0.0 2023-11-29 08:03:35,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3879520.0, ans=0.0 2023-11-29 08:03:51,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3879586.6666666665, ans=0.125 2023-11-29 08:03:52,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3879586.6666666665, ans=0.04949747468305833 2023-11-29 08:03:58,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3879653.3333333335, ans=0.0 2023-11-29 08:03:59,269 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4800, loss[loss=0.05948, simple_loss=0.07714, pruned_loss=0.009837, audio_tagging_loss=0.01108, over 15475.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08725, pruned_loss=0.01173, audio_tagging_loss=0.008829, over 3038865.91 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:03:59,359 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 581950 2023-11-29 08:04:01,169 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2023-11-29 08:04:04,025 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.16 vs. 
limit=12.0 2023-11-29 08:04:21,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3879720.0, ans=0.1 2023-11-29 08:04:36,437 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.055e+01 9.178e+01 9.692e+01 1.041e+02 1.280e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 08:04:45,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3879853.3333333335, ans=0.1 2023-11-29 08:04:46,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3879853.3333333335, ans=0.1 2023-11-29 08:05:01,408 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4850, loss[loss=0.06179, simple_loss=0.08363, pruned_loss=0.01376, audio_tagging_loss=0.006221, over 14831.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08779, pruned_loss=0.01185, audio_tagging_loss=0.008818, over 3046312.06 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:05:01,503 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582000 2023-11-29 08:05:05,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3879986.6666666665, ans=0.0 2023-11-29 08:05:12,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3879986.6666666665, ans=0.0 2023-11-29 08:05:19,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3880053.3333333335, ans=0.0 2023-11-29 08:05:26,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3880120.0, ans=0.0 2023-11-29 08:05:36,585 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=15.0 2023-11-29 08:05:43,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3880186.6666666665, ans=0.0 2023-11-29 08:06:04,470 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4900, loss[loss=0.05813, simple_loss=0.08366, pruned_loss=0.009933, audio_tagging_loss=0.006362, over 14739.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08903, pruned_loss=0.0119, audio_tagging_loss=0.008702, over 3045146.53 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:06:04,559 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582050 2023-11-29 08:06:09,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3880320.0, ans=15.0 2023-11-29 08:06:43,227 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.902e+01 9.348e+01 9.931e+01 1.050e+02 1.310e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-29 08:06:46,948 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:06:56,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3880586.6666666665, ans=0.125 2023-11-29 08:07:05,337 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 4950, loss[loss=0.04952, simple_loss=0.06043, pruned_loss=0.008614, audio_tagging_loss=0.01069, over 15264.00 frames. 
], tot_loss[loss=0.06466, simple_loss=0.08869, pruned_loss=0.01175, audio_tagging_loss=0.008559, over 3047295.78 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:07:05,416 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582100 2023-11-29 08:07:15,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3880653.3333333335, ans=0.125 2023-11-29 08:07:22,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3880720.0, ans=0.0 2023-11-29 08:07:29,626 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-11-29 08:08:07,554 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5000, loss[loss=0.06487, simple_loss=0.08403, pruned_loss=0.01158, audio_tagging_loss=0.01128, over 15281.00 frames. ], tot_loss[loss=0.06401, simple_loss=0.08793, pruned_loss=0.01153, audio_tagging_loss=0.008517, over 3042581.90 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:08:07,659 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582150 2023-11-29 08:08:07,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3880986.6666666665, ans=0.125 2023-11-29 08:08:19,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3881053.3333333335, ans=0.125 2023-11-29 08:08:21,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3881053.3333333335, ans=0.0 2023-11-29 08:08:27,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3881053.3333333335, ans=0.0 2023-11-29 08:08:28,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3881053.3333333335, ans=0.2 2023-11-29 08:08:45,869 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 9.159e+01 9.676e+01 1.038e+02 1.226e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-29 08:09:01,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=15.0 2023-11-29 08:09:10,350 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5050, loss[loss=0.06113, simple_loss=0.08109, pruned_loss=0.01169, audio_tagging_loss=0.008903, over 15680.00 frames. ], tot_loss[loss=0.06351, simple_loss=0.08706, pruned_loss=0.01147, audio_tagging_loss=0.008504, over 3048235.16 frames. 
], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:09:10,436 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582200 2023-11-29 08:09:17,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3881320.0, ans=0.125 2023-11-29 08:09:23,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3881386.6666666665, ans=0.125 2023-11-29 08:09:32,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3881386.6666666665, ans=0.1 2023-11-29 08:09:58,204 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0 2023-11-29 08:10:10,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3881653.3333333335, ans=0.125 2023-11-29 08:10:11,775 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5100, loss[loss=0.07824, simple_loss=0.1037, pruned_loss=0.01749, audio_tagging_loss=0.008906, over 14595.00 frames. ], tot_loss[loss=0.06381, simple_loss=0.08736, pruned_loss=0.01163, audio_tagging_loss=0.008501, over 3048393.42 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:10:11,847 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582250 2023-11-29 08:10:23,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3881720.0, ans=0.0 2023-11-29 08:10:25,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3881720.0, ans=0.125 2023-11-29 08:10:49,748 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 8.838e+01 9.435e+01 1.031e+02 1.429e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-29 08:10:56,237 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=15.0 2023-11-29 08:11:13,112 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5150, loss[loss=0.08593, simple_loss=0.1254, pruned_loss=0.01684, audio_tagging_loss=0.006391, over 15716.00 frames. ], tot_loss[loss=0.06424, simple_loss=0.08815, pruned_loss=0.01171, audio_tagging_loss=0.00846, over 3045876.91 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:11:13,195 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582300 2023-11-29 08:11:21,647 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0 2023-11-29 08:11:26,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3882053.3333333335, ans=0.125 2023-11-29 08:11:50,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=3882186.6666666665, ans=0.1 2023-11-29 08:11:57,431 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.69 vs. 
limit=12.0 2023-11-29 08:12:04,285 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.57 vs. limit=15.0 2023-11-29 08:12:07,367 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2023-11-29 08:12:12,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3882253.3333333335, ans=0.125 2023-11-29 08:12:15,502 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5200, loss[loss=0.06421, simple_loss=0.09025, pruned_loss=0.01266, audio_tagging_loss=0.00642, over 15123.00 frames. ], tot_loss[loss=0.06428, simple_loss=0.08813, pruned_loss=0.01175, audio_tagging_loss=0.008465, over 3045876.46 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:12:15,581 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582350 2023-11-29 08:12:29,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3882386.6666666665, ans=0.125 2023-11-29 08:12:29,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3882386.6666666665, ans=15.0 2023-11-29 08:12:52,838 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.957e+01 8.958e+01 9.640e+01 1.041e+02 1.476e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-29 08:12:56,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3882520.0, ans=0.0 2023-11-29 08:13:10,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3882586.6666666665, ans=0.1 2023-11-29 08:13:16,406 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5250, loss[loss=0.05807, simple_loss=0.08278, pruned_loss=0.008212, audio_tagging_loss=0.008471, over 15050.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08944, pruned_loss=0.012, audio_tagging_loss=0.00831, over 3046018.87 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:13:16,489 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582400 2023-11-29 08:13:16,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3882653.3333333335, ans=0.0 2023-11-29 08:13:36,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3882720.0, ans=0.09899494936611666 2023-11-29 08:13:36,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3882720.0, ans=0.0 2023-11-29 08:13:38,014 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.93 vs. 
limit=22.5 2023-11-29 08:13:41,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3882786.6666666665, ans=0.1 2023-11-29 08:13:47,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3882786.6666666665, ans=0.1 2023-11-29 08:13:54,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3882853.3333333335, ans=0.125 2023-11-29 08:14:03,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3882853.3333333335, ans=0.0 2023-11-29 08:14:18,858 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5300, loss[loss=0.06349, simple_loss=0.0919, pruned_loss=0.01061, audio_tagging_loss=0.006933, over 14898.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.09, pruned_loss=0.01198, audio_tagging_loss=0.008261, over 3046442.82 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:14:18,945 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582450 2023-11-29 08:14:27,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3882986.6666666665, ans=0.125 2023-11-29 08:14:33,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3883053.3333333335, ans=0.1 2023-11-29 08:14:37,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3883053.3333333335, ans=0.5 2023-11-29 08:14:38,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3883053.3333333335, ans=0.125 2023-11-29 08:14:57,523 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.980e+01 9.053e+01 9.676e+01 1.034e+02 1.415e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-29 08:15:12,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.36 vs. limit=15.0 2023-11-29 08:15:15,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3883253.3333333335, ans=0.0 2023-11-29 08:15:20,458 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5350, loss[loss=0.04804, simple_loss=0.06743, pruned_loss=0.006684, audio_tagging_loss=0.007641, over 14457.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08933, pruned_loss=0.01188, audio_tagging_loss=0.008326, over 3041438.36 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:15:20,552 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582500 2023-11-29 08:15:20,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3883320.0, ans=0.09899494936611666 2023-11-29 08:15:21,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3883320.0, ans=0.04949747468305833 2023-11-29 08:15:23,970 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.05 vs. 
limit=15.0 2023-11-29 08:15:26,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3883320.0, ans=0.125 2023-11-29 08:15:58,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3883520.0, ans=0.0 2023-11-29 08:15:58,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3883520.0, ans=0.125 2023-11-29 08:16:21,989 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5400, loss[loss=0.07596, simple_loss=0.1058, pruned_loss=0.017, audio_tagging_loss=0.006044, over 15279.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08961, pruned_loss=0.01203, audio_tagging_loss=0.008386, over 3041796.51 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:16:22,091 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582550 2023-11-29 08:16:32,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3883720.0, ans=0.125 2023-11-29 08:16:53,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3883786.6666666665, ans=0.125 2023-11-29 08:16:54,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3883786.6666666665, ans=0.0 2023-11-29 08:16:54,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3883786.6666666665, ans=0.1 2023-11-29 08:17:01,376 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.077e+01 9.215e+01 9.741e+01 1.047e+02 1.328e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 08:17:23,093 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5450, loss[loss=0.05847, simple_loss=0.08156, pruned_loss=0.01068, audio_tagging_loss=0.007017, over 14147.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08956, pruned_loss=0.01193, audio_tagging_loss=0.0084, over 3043212.32 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:17:23,191 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582600 2023-11-29 08:17:31,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3883986.6666666665, ans=0.2 2023-11-29 08:17:51,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3884120.0, ans=0.1 2023-11-29 08:18:00,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3884186.6666666665, ans=0.125 2023-11-29 08:18:01,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3884186.6666666665, ans=15.0 2023-11-29 08:18:13,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3884253.3333333335, ans=0.125 2023-11-29 08:18:24,672 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5500, loss[loss=0.07107, simple_loss=0.09877, pruned_loss=0.01335, audio_tagging_loss=0.008339, over 15628.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.09026, pruned_loss=0.01208, audio_tagging_loss=0.00835, over 3041550.67 frames. 
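The loss[...] figures in these batch records decompose consistently: the total is the pruned transducer loss plus the audio-tagging loss plus half of the simple (non-pruned) loss. A quick arithmetic check against the batch 5400 record above, with the 0.5/1.0/1.0 weights inferred from the logged numbers themselves rather than read from train_asr.py:

simple_loss, pruned_loss, audio_tagging_loss = 0.1058, 0.017, 0.006044
loss = 0.5 * simple_loss + 1.0 * pruned_loss + 1.0 * audio_tagging_loss
print(f"{loss:.5f}")  # 0.07594, matching the logged loss=0.07596 to rounding

The same weighting reproduces the other per-batch loss values in this section to within the precision of the logged components.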
], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:18:24,752 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582650 2023-11-29 08:18:27,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3884320.0, ans=0.125 2023-11-29 08:19:00,242 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:19:00,359 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:19:03,451 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 9.075e+01 9.683e+01 1.060e+02 1.497e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-29 08:19:06,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3884520.0, ans=0.125 2023-11-29 08:19:11,461 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.35 vs. limit=15.0 2023-11-29 08:19:11,529 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=15.0 2023-11-29 08:19:17,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3884586.6666666665, ans=0.0 2023-11-29 08:19:25,598 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5550, loss[loss=0.07707, simple_loss=0.09974, pruned_loss=0.01643, audio_tagging_loss=0.01078, over 15508.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08936, pruned_loss=0.01196, audio_tagging_loss=0.008531, over 3047524.63 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:19:25,689 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582700 2023-11-29 08:19:29,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3884653.3333333335, ans=0.0 2023-11-29 08:19:36,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3884720.0, ans=0.125 2023-11-29 08:19:48,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3884786.6666666665, ans=0.2 2023-11-29 08:19:52,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=22.5 2023-11-29 08:20:05,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3884853.3333333335, ans=0.0 2023-11-29 08:20:21,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3884920.0, ans=0.2 2023-11-29 08:20:26,546 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5600, loss[loss=0.07771, simple_loss=0.1107, pruned_loss=0.01452, audio_tagging_loss=0.007835, over 14496.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08931, pruned_loss=0.01197, audio_tagging_loss=0.00864, over 3044205.18 frames. 
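Every scaling.py:213 line above is a ScheduledFloat read-out: a hyperparameter (dropout probability, skip rate, balancer probability, ...) whose current value "ans" is a function of the global batch count. A minimal stand-in, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class name follows the log, but the implementation details are an assumption, not the actual scaling.py code:

import bisect

class ScheduledFloat:
    """A float hyperparameter interpolated piecewise-linearly in the number
    of training batches seen so far (a sketch, not icefall's code)."""

    def __init__(self, *points):
        self.points = sorted(points)   # (batch_count, value) pairs
        self.batch_count = 0.0         # updated by the training loop

    def __float__(self):
        xs = [p[0] for p in self.points]
        i = bisect.bisect_right(xs, self.batch_count)
        if i == 0:
            return float(self.points[0][1])
        if i == len(xs):
            return float(self.points[-1][1])
        (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
        t = (self.batch_count - x0) / (x1 - x0)
        return float(y0 + t * (y1 - y0))

# e.g. a dropout that decays to the ans=0.1 plateau seen in the log:
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
dropout_p.batch_count = 3885120.0
print(float(dropout_p))  # 0.1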
], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:20:26,654 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582750 2023-11-29 08:20:46,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5 2023-11-29 08:20:56,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3885120.0, ans=0.1 2023-11-29 08:21:06,732 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.841e+01 9.122e+01 9.786e+01 1.074e+02 1.432e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-29 08:21:10,421 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 08:21:28,664 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5650, loss[loss=0.06308, simple_loss=0.08539, pruned_loss=0.01163, audio_tagging_loss=0.008749, over 15495.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.08995, pruned_loss=0.01211, audio_tagging_loss=0.008583, over 3050575.86 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:21:28,742 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582800 2023-11-29 08:21:48,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3885386.6666666665, ans=0.0 2023-11-29 08:21:55,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3885453.3333333335, ans=0.125 2023-11-29 08:22:02,372 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2023-11-29 08:22:16,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3885520.0, ans=0.125 2023-11-29 08:22:24,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3885586.6666666665, ans=0.04949747468305833 2023-11-29 08:22:29,886 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5700, loss[loss=0.06864, simple_loss=0.1026, pruned_loss=0.01192, audio_tagging_loss=0.005443, over 15021.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08882, pruned_loss=0.01186, audio_tagging_loss=0.008607, over 3050900.91 frames. 
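The WARNING above shows the length filter at work: this one-second AudioSet clip carries only placeholder text, and after the roughly 4x convolutional subsampling its 100 input frames leave 23 encoder frames, fewer than its 24 BPE tokens, so no valid transducer alignment exists and the cut is dropped. A sketch of such a filter; the exact subsampling formula is an assumption chosen to reproduce the logged 100 -> 23 reduction:

def keep_cut(num_input_frames: int, num_tokens: int) -> bool:
    # Encoder frames after ~4x subsampling (assumed formula; gives 100 -> 23).
    t = ((num_input_frames - 7) // 2 + 1) // 2
    # A transducer needs at least one encoder frame per output token.
    return t >= num_tokens

print(keep_cut(100, 24))  # False: the cut above is excluded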
], batch size: 54, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:22:29,967 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582850 2023-11-29 08:23:05,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3885786.6666666665, ans=0.2 2023-11-29 08:23:10,802 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 9.240e+01 1.005e+02 1.081e+02 1.357e+02, threshold=2.009e+02, percent-clipped=0.0 2023-11-29 08:23:25,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3885920.0, ans=0.125 2023-11-29 08:23:26,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3885920.0, ans=0.125 2023-11-29 08:23:31,321 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5750, loss[loss=0.08395, simple_loss=0.1163, pruned_loss=0.01832, audio_tagging_loss=0.007483, over 15686.00 frames. ], tot_loss[loss=0.0649, simple_loss=0.08886, pruned_loss=0.01196, audio_tagging_loss=0.008511, over 3047554.18 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:23:31,409 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582900 2023-11-29 08:24:08,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3886186.6666666665, ans=0.125 2023-11-29 08:24:14,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3886186.6666666665, ans=0.125 2023-11-29 08:24:15,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3886186.6666666665, ans=0.125 2023-11-29 08:24:32,447 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5800, loss[loss=0.055, simple_loss=0.07846, pruned_loss=0.006577, audio_tagging_loss=0.009192, over 15644.00 frames. ], tot_loss[loss=0.06385, simple_loss=0.08736, pruned_loss=0.01166, audio_tagging_loss=0.008514, over 3044896.40 frames. 
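In each optim.py:476 record, the reported threshold is (up to rounding) Clipping_scale times the median of the listed grad-norm quartiles; in the record above, 2.0 x 1.005e+02 gives the logged threshold of 2.009e+02. A sketch of clipping against a running median instead of a fixed constant; the window length and update rule are assumptions, not the actual optim.py code:

import torch

class MedianGradClipper:
    """Clip gradients at clipping_scale times the running median of recent
    gradient norms (a sketch of the idea, not the actual optim.py code)."""

    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []

    def __call__(self, params):
        params = [p for p in params if p.grad is not None]
        norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params)).item()
        self.norms = (self.norms + [norm])[-self.window:]
        threshold = self.clipping_scale * sorted(self.norms)[len(self.norms) // 2]
        if norm > threshold:
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm, threshold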
], batch size: 59, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:24:32,637 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 582950 2023-11-29 08:24:40,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3886320.0, ans=0.125 2023-11-29 08:24:44,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3886386.6666666665, ans=0.0 2023-11-29 08:24:45,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3886386.6666666665, ans=0.125 2023-11-29 08:24:59,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3886453.3333333335, ans=0.0 2023-11-29 08:25:11,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3886520.0, ans=0.125 2023-11-29 08:25:13,365 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.598e+01 9.182e+01 9.577e+01 1.050e+02 1.266e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 08:25:19,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3886520.0, ans=0.0 2023-11-29 08:25:33,508 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5850, loss[loss=0.05055, simple_loss=0.06509, pruned_loss=0.009407, audio_tagging_loss=0.0086, over 14048.00 frames. ], tot_loss[loss=0.06374, simple_loss=0.08706, pruned_loss=0.01171, audio_tagging_loss=0.008498, over 3042656.11 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:25:33,587 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583000 2023-11-29 08:25:33,757 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:25:36,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3886653.3333333335, ans=0.0 2023-11-29 08:25:45,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3886720.0, ans=0.0 2023-11-29 08:25:53,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3886720.0, ans=0.125 2023-11-29 08:26:12,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3886853.3333333335, ans=0.07 2023-11-29 08:26:14,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2023-11-29 08:26:31,010 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:26:36,844 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5900, loss[loss=0.06669, simple_loss=0.08829, pruned_loss=0.01113, audio_tagging_loss=0.01142, over 14896.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08856, pruned_loss=0.01189, audio_tagging_loss=0.008412, over 3045660.58 frames. 
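The tot_loss[...] statistics are not plain sums: fractional frame totals such as "over 3042656.11 frames" indicate a decayed running accumulation, where the stored totals are scaled by a factor slightly below one before each new batch is folded in. A sketch under that assumption (the 0.995 decay constant is a guess, not read from train_asr.py):

tot_frames, tot_weighted_loss = 0.0, 0.0

def update(batch_loss, batch_frames, decay=0.995):
    # Decay the old totals, then add the new frame-weighted batch loss.
    global tot_frames, tot_weighted_loss
    tot_frames = tot_frames * decay + batch_frames
    tot_weighted_loss = tot_weighted_loss * decay + batch_loss * batch_frames
    return tot_weighted_loss / tot_frames  # the reported tot_loss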
], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:26:36,951 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583050 2023-11-29 08:26:58,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3887053.3333333335, ans=0.09899494936611666 2023-11-29 08:27:02,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2023-11-29 08:27:17,923 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.091e+01 9.266e+01 9.950e+01 1.077e+02 1.290e+02, threshold=1.990e+02, percent-clipped=0.0 2023-11-29 08:27:24,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3887186.6666666665, ans=22.5 2023-11-29 08:27:31,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3887253.3333333335, ans=0.1 2023-11-29 08:27:32,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3887253.3333333335, ans=0.1 2023-11-29 08:27:38,602 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 5950, loss[loss=0.05059, simple_loss=0.06898, pruned_loss=0.006707, audio_tagging_loss=0.009389, over 15423.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08813, pruned_loss=0.01174, audio_tagging_loss=0.008524, over 3052146.78 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:27:38,705 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583100 2023-11-29 08:27:48,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3887320.0, ans=0.125 2023-11-29 08:27:49,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3887320.0, ans=0.0 2023-11-29 08:27:50,644 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.05 vs. limit=15.0 2023-11-29 08:27:55,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3887386.6666666665, ans=0.1 2023-11-29 08:28:03,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3887453.3333333335, ans=0.125 2023-11-29 08:28:20,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3887520.0, ans=0.125 2023-11-29 08:28:25,405 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0 2023-11-29 08:28:40,698 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6000, loss[loss=0.07255, simple_loss=0.1093, pruned_loss=0.01199, audio_tagging_loss=0.005913, over 15536.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08871, pruned_loss=0.01185, audio_tagging_loss=0.008533, over 3043818.54 frames. ], batch size: 54, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:28:40,699 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 08:29:20,062 INFO [train_asr.py:1267] (2/4) Epoch 49, validation: loss=0.05758, simple_loss=0.05041, pruned_loss=0.005303, audio_tagging_loss=0.02707, over 4681554.00 frames. 
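The grad_scale field oscillates between 16.0 and 32.0 across these records (16.0 at batch 5950, 32.0 again at batch 6000): the fp16 loss scaler halves the scale whenever a step produces non-finite gradients and doubles it back after a stretch of clean steps. The enclosing mixed-precision step has this generic shape (a torch.cuda.amp sketch with hypothetical model and batch names, not the actual train_asr.py loop):

import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.AdamW(model.parameters())
scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def train_step(features, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(features), targets)
    scaler.scale(loss).backward()  # backward through the scaled loss
    scaler.step(optimizer)         # skipped internally if grads overflowed
    scaler.update()                # halve on overflow, grow after clean steps
    return loss.detach(), scaler.get_scale()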
2023-11-29 08:29:20,062 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 08:29:20,149 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583150 2023-11-29 08:29:31,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3887720.0, ans=0.0 2023-11-29 08:30:00,787 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.749e+01 8.890e+01 9.631e+01 1.036e+02 1.251e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 08:30:05,511 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 08:30:07,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3887853.3333333335, ans=10.0 2023-11-29 08:30:07,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3887853.3333333335, ans=0.0 2023-11-29 08:30:15,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3887920.0, ans=0.125 2023-11-29 08:30:19,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3887920.0, ans=0.1 2023-11-29 08:30:22,501 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6050, loss[loss=0.06767, simple_loss=0.08633, pruned_loss=0.01589, audio_tagging_loss=0.008614, over 14687.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08879, pruned_loss=0.0118, audio_tagging_loss=0.008516, over 3047467.30 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:30:22,580 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583200 2023-11-29 08:31:10,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3888253.3333333335, ans=0.0 2023-11-29 08:31:24,411 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6100, loss[loss=0.05317, simple_loss=0.06567, pruned_loss=0.01038, audio_tagging_loss=0.009953, over 14171.00 frames. ], tot_loss[loss=0.0646, simple_loss=0.08866, pruned_loss=0.01175, audio_tagging_loss=0.008518, over 3049301.89 frames. 
], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:31:24,494 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583250 2023-11-29 08:31:34,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3888320.0, ans=0.125 2023-11-29 08:31:57,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3888453.3333333335, ans=0.125 2023-11-29 08:32:00,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3888453.3333333335, ans=0.125 2023-11-29 08:32:05,552 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 9.225e+01 1.004e+02 1.045e+02 1.351e+02, threshold=2.008e+02, percent-clipped=0.0 2023-11-29 08:32:06,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3888520.0, ans=0.2 2023-11-29 08:32:07,410 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=15.0 2023-11-29 08:32:13,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3888586.6666666665, ans=0.125 2023-11-29 08:32:19,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3888586.6666666665, ans=0.125 2023-11-29 08:32:20,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2023-11-29 08:32:25,085 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.03 vs. limit=5.0 2023-11-29 08:32:25,401 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6150, loss[loss=0.059, simple_loss=0.07384, pruned_loss=0.01118, audio_tagging_loss=0.0109, over 16564.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.08908, pruned_loss=0.01189, audio_tagging_loss=0.008448, over 3047055.25 frames. ], batch size: 65, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:32:25,521 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583300 2023-11-29 08:32:30,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3888653.3333333335, ans=0.125 2023-11-29 08:32:40,868 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2023-11-29 08:32:43,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3888720.0, ans=0.125 2023-11-29 08:33:26,849 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6200, loss[loss=0.06457, simple_loss=0.0844, pruned_loss=0.01307, audio_tagging_loss=0.009295, over 15514.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.0897, pruned_loss=0.01207, audio_tagging_loss=0.008494, over 3048608.12 frames. 
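Each scaling.py:1022 Whitening line compares a module-level statistic against a limit: the metric is 1.0 when the centered covariance of the module's output is proportional to the identity, and it grows with the spread of the covariance eigenvalues, so metric=4.03 vs. limit=5.0 above means the convnext output is still acceptably "white". A self-contained sketch of such a metric computed via traces; this paraphrases the idea, not the exact scaling.py implementation:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # Returns E[lambda^2] / E[lambda]^2 over eigenvalues of the per-group
    # covariance of x (shape: frames x channels); 1.0 iff perfectly white.
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups                  # channels per group
    x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / num_frames          # (groups, cpg, cpg)
    mean_eig = cov.diagonal(dim1=1, dim2=2).sum(-1) / cpg    # tr(C)/n
    mean_eig_sq = (cov * cov).sum(dim=(1, 2)) / cpg          # tr(C^2)/n
    return (mean_eig_sq / mean_eig.pow(2)).mean().item()

print(whitening_metric(torch.randn(1000, 128)))  # close to 1.0, under limit=5.0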
], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:33:26,956 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583350 2023-11-29 08:34:08,790 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.749e+01 9.108e+01 9.848e+01 1.055e+02 1.947e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 08:34:29,489 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6250, loss[loss=0.06078, simple_loss=0.08641, pruned_loss=0.009404, audio_tagging_loss=0.008166, over 14756.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08914, pruned_loss=0.01191, audio_tagging_loss=0.00855, over 3048653.47 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:34:29,607 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583400 2023-11-29 08:34:39,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3889320.0, ans=0.125 2023-11-29 08:34:39,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3889320.0, ans=0.125 2023-11-29 08:34:51,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3889386.6666666665, ans=0.1 2023-11-29 08:34:55,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3889453.3333333335, ans=0.125 2023-11-29 08:35:30,172 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6300, loss[loss=0.06132, simple_loss=0.08826, pruned_loss=0.01042, audio_tagging_loss=0.006766, over 14892.00 frames. ], tot_loss[loss=0.06488, simple_loss=0.0888, pruned_loss=0.01186, audio_tagging_loss=0.008624, over 3044679.33 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:35:30,265 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583450 2023-11-29 08:35:51,423 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=15.0 2023-11-29 08:35:54,196 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.32 vs. limit=15.0 2023-11-29 08:35:59,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3889786.6666666665, ans=0.09899494936611666 2023-11-29 08:36:02,209 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.34 vs. limit=10.0 2023-11-29 08:36:13,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.769e+01 9.155e+01 9.755e+01 1.058e+02 1.266e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 08:36:32,763 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6350, loss[loss=0.07349, simple_loss=0.1087, pruned_loss=0.01196, audio_tagging_loss=0.007163, over 16233.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08953, pruned_loss=0.01191, audio_tagging_loss=0.008582, over 3044478.49 frames. 
], batch size: 61, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:36:32,838 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583500 2023-11-29 08:36:33,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3889986.6666666665, ans=0.125 2023-11-29 08:36:44,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3889986.6666666665, ans=0.1 2023-11-29 08:37:13,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3890186.6666666665, ans=0.125 2023-11-29 08:37:18,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3890186.6666666665, ans=0.125 2023-11-29 08:37:25,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3890253.3333333335, ans=0.125 2023-11-29 08:37:29,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3890253.3333333335, ans=0.09899494936611666 2023-11-29 08:37:35,998 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6400, loss[loss=0.08529, simple_loss=0.1163, pruned_loss=0.01986, audio_tagging_loss=0.007289, over 15572.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.08958, pruned_loss=0.012, audio_tagging_loss=0.008607, over 3045025.26 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:37:36,079 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583550 2023-11-29 08:37:50,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=15.0 2023-11-29 08:38:07,440 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=15.0 2023-11-29 08:38:17,386 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 8.981e+01 9.586e+01 1.023e+02 1.257e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 08:38:24,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3890586.6666666665, ans=0.125 2023-11-29 08:38:31,738 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.22 vs. limit=22.5 2023-11-29 08:38:36,783 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6450, loss[loss=0.06806, simple_loss=0.09188, pruned_loss=0.01442, audio_tagging_loss=0.007703, over 15168.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08842, pruned_loss=0.01182, audio_tagging_loss=0.008802, over 3048880.46 frames. 
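The balancer entries scattered through these records (prob, min_abs, max_abs, min_positive) describe per-channel activation constraints: with probability prob a batch is inspected, and channels whose statistics have drifted outside the configured bounds are nudged back. A simplified stand-in that expresses the magnitude bounds as an explicit auxiliary penalty; the real module instead edits gradients directly, and this sketch omits the sign-fraction (min_positive) constraint, so treat it purely as illustration:

import torch

def balancer_penalty(x: torch.Tensor, min_abs: float = 0.5,
                     max_abs: float = 10.0) -> torch.Tensor:
    # Zero while each channel's mean |activation| lies in [min_abs, max_abs].
    mean_abs = x.abs().mean(dim=0)
    return (torch.relu(min_abs - mean_abs) + torch.relu(mean_abs - max_abs)).sum()

# e.g.: loss = task_loss + balancer_penalty(hidden_activations)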
], batch size: 57, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:38:36,890 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583600 2023-11-29 08:38:38,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3890653.3333333335, ans=0.0 2023-11-29 08:38:54,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3890720.0, ans=0.125 2023-11-29 08:39:10,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.65 vs. limit=12.0 2023-11-29 08:39:12,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3890786.6666666665, ans=0.125 2023-11-29 08:39:36,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3890920.0, ans=0.0 2023-11-29 08:39:39,024 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6500, loss[loss=0.08551, simple_loss=0.1188, pruned_loss=0.0218, audio_tagging_loss=0.004328, over 15353.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08833, pruned_loss=0.01179, audio_tagging_loss=0.008703, over 3054853.58 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:39:39,109 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583650 2023-11-29 08:39:39,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.63 vs. limit=15.0 2023-11-29 08:39:52,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3891053.3333333335, ans=0.0 2023-11-29 08:40:22,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.191e+01 9.311e+01 1.000e+02 1.077e+02 1.349e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-29 08:40:28,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3891253.3333333335, ans=0.125 2023-11-29 08:40:41,240 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6550, loss[loss=0.05376, simple_loss=0.0812, pruned_loss=0.006462, audio_tagging_loss=0.006699, over 14367.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08853, pruned_loss=0.01204, audio_tagging_loss=0.008637, over 3051900.77 frames. ], batch size: 56, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:40:41,327 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583700 2023-11-29 08:41:01,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3891386.6666666665, ans=0.125 2023-11-29 08:41:12,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3891453.3333333335, ans=0.1 2023-11-29 08:41:16,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3891520.0, ans=0.2 2023-11-29 08:41:40,059 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.79 vs. 
limit=15.0 2023-11-29 08:41:40,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3891586.6666666665, ans=0.0 2023-11-29 08:41:40,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3891586.6666666665, ans=0.125 2023-11-29 08:41:43,062 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6600, loss[loss=0.07483, simple_loss=0.1089, pruned_loss=0.0138, audio_tagging_loss=0.006579, over 15831.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08858, pruned_loss=0.01199, audio_tagging_loss=0.0085, over 3049215.05 frames. ], batch size: 58, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:41:43,153 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583750 2023-11-29 08:41:44,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3891653.3333333335, ans=0.0 2023-11-29 08:41:56,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2023-11-29 08:42:26,973 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 9.392e+01 1.005e+02 1.057e+02 1.408e+02, threshold=2.010e+02, percent-clipped=0.0 2023-11-29 08:42:28,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3891853.3333333335, ans=0.125 2023-11-29 08:42:34,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=22.5 2023-11-29 08:42:45,104 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6650, loss[loss=0.07056, simple_loss=0.09706, pruned_loss=0.01611, audio_tagging_loss=0.005926, over 15546.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08869, pruned_loss=0.01206, audio_tagging_loss=0.008528, over 3046243.94 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:42:45,183 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583800 2023-11-29 08:43:10,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3892120.0, ans=0.2 2023-11-29 08:43:13,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3892120.0, ans=0.0 2023-11-29 08:43:19,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3892120.0, ans=0.0 2023-11-29 08:43:36,847 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2023-11-29 08:43:41,795 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=15.0 2023-11-29 08:43:47,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3892320.0, ans=0.125 2023-11-29 08:43:48,138 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6700, loss[loss=0.06462, simple_loss=0.08674, pruned_loss=0.009879, audio_tagging_loss=0.01137, over 15593.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08822, pruned_loss=0.01177, audio_tagging_loss=0.00846, over 3045283.52 frames. 
], batch size: 60, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:43:48,215 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583850 2023-11-29 08:44:10,675 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.92 vs. limit=22.5 2023-11-29 08:44:18,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3892453.3333333335, ans=0.0 2023-11-29 08:44:20,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3892453.3333333335, ans=0.0 2023-11-29 08:44:31,112 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 9.029e+01 9.606e+01 1.021e+02 1.283e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 08:44:31,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3892520.0, ans=0.5 2023-11-29 08:44:48,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3892653.3333333335, ans=0.125 2023-11-29 08:44:49,262 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6750, loss[loss=0.05954, simple_loss=0.07876, pruned_loss=0.01316, audio_tagging_loss=0.007001, over 14954.00 frames. ], tot_loss[loss=0.06444, simple_loss=0.08832, pruned_loss=0.01187, audio_tagging_loss=0.008408, over 3042858.42 frames. ], batch size: 57, lr: 1.38e-03, grad_scale: 16.0 2023-11-29 08:44:49,348 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583900 2023-11-29 08:44:57,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3892653.3333333335, ans=0.0 2023-11-29 08:45:16,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3892786.6666666665, ans=0.125 2023-11-29 08:45:31,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3892853.3333333335, ans=0.125 2023-11-29 08:45:42,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3892920.0, ans=0.125 2023-11-29 08:45:51,285 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6800, loss[loss=0.05891, simple_loss=0.08017, pruned_loss=0.01241, audio_tagging_loss=0.006413, over 14317.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08892, pruned_loss=0.01186, audio_tagging_loss=0.008389, over 3043098.36 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:45:51,367 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 583950 2023-11-29 08:46:18,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3893120.0, ans=0.0 2023-11-29 08:46:28,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3893186.6666666665, ans=0.125 2023-11-29 08:46:34,383 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 9.158e+01 9.734e+01 1.076e+02 1.387e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-29 08:46:53,209 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6850, loss[loss=0.06928, simple_loss=0.1017, pruned_loss=0.009948, audio_tagging_loss=0.008502, over 16264.00 frames. 
], tot_loss[loss=0.06474, simple_loss=0.0892, pruned_loss=0.01188, audio_tagging_loss=0.008267, over 3037568.17 frames. ], batch size: 59, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:46:53,293 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584000 2023-11-29 08:47:04,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3893320.0, ans=0.2 2023-11-29 08:47:04,726 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=22.5 2023-11-29 08:47:10,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3893386.6666666665, ans=10.0 2023-11-29 08:47:27,547 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.75 vs. limit=15.0 2023-11-29 08:47:28,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3893453.3333333335, ans=0.125 2023-11-29 08:47:32,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3893520.0, ans=0.1 2023-11-29 08:47:34,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3893520.0, ans=0.2 2023-11-29 08:47:40,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3893520.0, ans=0.0 2023-11-29 08:47:52,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3893586.6666666665, ans=0.2 2023-11-29 08:47:56,379 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6900, loss[loss=0.07001, simple_loss=0.09546, pruned_loss=0.01304, audio_tagging_loss=0.009237, over 16036.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08881, pruned_loss=0.0118, audio_tagging_loss=0.008282, over 3035308.65 frames. ], batch size: 60, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:47:56,476 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584050 2023-11-29 08:48:01,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3893653.3333333335, ans=0.125 2023-11-29 08:48:02,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3893653.3333333335, ans=0.125 2023-11-29 08:48:08,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.61 vs. limit=10.0 2023-11-29 08:48:23,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3893786.6666666665, ans=0.0 2023-11-29 08:48:39,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.820e+01 9.126e+01 9.494e+01 1.002e+02 1.227e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 08:48:45,171 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 08:48:58,266 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 6950, loss[loss=0.08749, simple_loss=0.1162, pruned_loss=0.01927, audio_tagging_loss=0.01012, over 14849.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.0905, pruned_loss=0.01217, audio_tagging_loss=0.008268, over 3034937.19 frames. ], batch size: 55, lr: 1.38e-03, grad_scale: 32.0 2023-11-29 08:48:58,347 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584100 2023-11-29 08:49:12,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3894053.3333333335, ans=0.2 2023-11-29 08:49:45,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3894186.6666666665, ans=0.0 2023-11-29 08:49:46,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2023-11-29 08:49:47,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3894253.3333333335, ans=0.125 2023-11-29 08:49:52,705 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.63 vs. limit=15.0 2023-11-29 08:49:53,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3894253.3333333335, ans=0.1 2023-11-29 08:49:58,666 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7000, loss[loss=0.08337, simple_loss=0.1234, pruned_loss=0.01529, audio_tagging_loss=0.006397, over 15111.00 frames. ], tot_loss[loss=0.06531, simple_loss=0.08973, pruned_loss=0.01202, audio_tagging_loss=0.008428, over 3031939.31 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:49:58,734 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584150 2023-11-29 08:50:12,656 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.11 vs. limit=10.0 2023-11-29 08:50:30,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3894453.3333333335, ans=0.125 2023-11-29 08:50:34,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3894453.3333333335, ans=0.1 2023-11-29 08:50:38,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3894520.0, ans=0.125 2023-11-29 08:50:42,687 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 8.892e+01 9.505e+01 1.031e+02 1.228e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-29 08:50:58,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3894586.6666666665, ans=0.2 2023-11-29 08:51:01,080 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7050, loss[loss=0.04077, simple_loss=0.051, pruned_loss=0.00586, audio_tagging_loss=0.009408, over 14528.00 frames. 
], tot_loss[loss=0.06435, simple_loss=0.08808, pruned_loss=0.01173, audio_tagging_loss=0.008577, over 3035774.96 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:51:01,199 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584200 2023-11-29 08:51:06,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3894653.3333333335, ans=0.125 2023-11-29 08:51:06,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3894653.3333333335, ans=0.1 2023-11-29 08:51:08,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3894653.3333333335, ans=0.2 2023-11-29 08:51:12,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3894720.0, ans=0.125 2023-11-29 08:51:15,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3894720.0, ans=0.125 2023-11-29 08:51:18,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3894720.0, ans=0.125 2023-11-29 08:51:41,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3894853.3333333335, ans=0.035 2023-11-29 08:51:46,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3894853.3333333335, ans=0.2 2023-11-29 08:51:53,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3894920.0, ans=0.125 2023-11-29 08:51:55,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3894920.0, ans=0.1 2023-11-29 08:51:59,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3894920.0, ans=0.09899494936611666 2023-11-29 08:52:01,494 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7100, loss[loss=0.05286, simple_loss=0.07283, pruned_loss=0.006644, audio_tagging_loss=0.009795, over 14969.00 frames. ], tot_loss[loss=0.06405, simple_loss=0.08762, pruned_loss=0.01158, audio_tagging_loss=0.008663, over 3042246.92 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 08:52:01,613 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584250 2023-11-29 08:52:02,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3894986.6666666665, ans=0.0 2023-11-29 08:52:06,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3894986.6666666665, ans=0.04949747468305833 2023-11-29 08:52:06,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3894986.6666666665, ans=0.2 2023-11-29 08:52:30,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3895120.0, ans=0.5 2023-11-29 08:52:32,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.59 vs. 
limit=22.5 2023-11-29 08:52:40,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3895186.6666666665, ans=0.125 2023-11-29 08:52:40,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3895186.6666666665, ans=0.0 2023-11-29 08:52:45,223 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 9.114e+01 9.630e+01 1.033e+02 1.554e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 08:52:56,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3895253.3333333335, ans=0.125 2023-11-29 08:53:03,107 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7150, loss[loss=0.06619, simple_loss=0.09398, pruned_loss=0.01281, audio_tagging_loss=0.006393, over 16348.00 frames. ], tot_loss[loss=0.06437, simple_loss=0.08796, pruned_loss=0.01167, audio_tagging_loss=0.008718, over 3045436.71 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 08:53:03,198 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584300 2023-11-29 08:53:09,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3895320.0, ans=0.025 2023-11-29 08:53:14,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3895386.6666666665, ans=0.125 2023-11-29 08:53:31,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3895453.3333333335, ans=0.125 2023-11-29 08:53:43,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3895520.0, ans=0.125 2023-11-29 08:53:46,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3895520.0, ans=0.1 2023-11-29 08:53:49,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3895520.0, ans=0.0 2023-11-29 08:54:01,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3895586.6666666665, ans=0.125 2023-11-29 08:54:04,765 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7200, loss[loss=0.07545, simple_loss=0.1063, pruned_loss=0.0146, audio_tagging_loss=0.007714, over 15277.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08796, pruned_loss=0.01168, audio_tagging_loss=0.008713, over 3043080.70 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:54:04,846 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584350 2023-11-29 08:54:07,894 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0 2023-11-29 08:54:45,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3895853.3333333335, ans=0.125 2023-11-29 08:54:48,862 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 8.925e+01 9.835e+01 1.040e+02 1.813e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-29 08:54:55,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.70 vs. 
limit=15.0 2023-11-29 08:54:57,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=3895920.0, ans=15.0 2023-11-29 08:54:58,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3895920.0, ans=0.1 2023-11-29 08:55:03,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3895920.0, ans=0.1 2023-11-29 08:55:05,296 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7250, loss[loss=0.05976, simple_loss=0.08081, pruned_loss=0.009872, audio_tagging_loss=0.00948, over 17028.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08901, pruned_loss=0.01173, audio_tagging_loss=0.008721, over 3057056.78 frames. ], batch size: 63, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:55:05,370 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584400 2023-11-29 08:55:17,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3896053.3333333335, ans=0.1 2023-11-29 08:55:39,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3896120.0, ans=0.125 2023-11-29 08:55:44,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3896186.6666666665, ans=0.0 2023-11-29 08:56:05,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3896253.3333333335, ans=0.0 2023-11-29 08:56:07,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3896320.0, ans=0.2 2023-11-29 08:56:07,916 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7300, loss[loss=0.06556, simple_loss=0.09206, pruned_loss=0.01136, audio_tagging_loss=0.008171, over 14612.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08901, pruned_loss=0.01175, audio_tagging_loss=0.008612, over 3053805.26 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:56:07,989 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584450 2023-11-29 08:56:15,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3896320.0, ans=0.2 2023-11-29 08:56:20,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3896386.6666666665, ans=0.0 2023-11-29 08:56:51,122 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 9.158e+01 9.688e+01 1.038e+02 1.242e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 08:56:53,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3896520.0, ans=0.125 2023-11-29 08:57:02,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3896586.6666666665, ans=0.125 2023-11-29 08:57:08,961 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7350, loss[loss=0.07686, simple_loss=0.1045, pruned_loss=0.0173, audio_tagging_loss=0.007322, over 14097.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08963, pruned_loss=0.01194, audio_tagging_loss=0.008473, over 3053574.55 frames. 
], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:57:09,052 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584500 2023-11-29 08:57:20,303 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2023-11-29 08:57:22,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3896720.0, ans=0.125 2023-11-29 08:57:34,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3896786.6666666665, ans=0.0 2023-11-29 08:57:47,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3896853.3333333335, ans=0.0 2023-11-29 08:58:02,819 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 08:58:05,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3896920.0, ans=0.125 2023-11-29 08:58:09,727 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7400, loss[loss=0.05916, simple_loss=0.08146, pruned_loss=0.01102, audio_tagging_loss=0.007417, over 15019.00 frames. ], tot_loss[loss=0.06485, simple_loss=0.08915, pruned_loss=0.01178, audio_tagging_loss=0.008494, over 3048086.08 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 08:58:09,809 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584550 2023-11-29 08:58:17,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3896986.6666666665, ans=0.1 2023-11-29 08:58:22,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3897053.3333333335, ans=0.125 2023-11-29 08:58:47,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3897186.6666666665, ans=0.125 2023-11-29 08:58:53,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3897186.6666666665, ans=0.125 2023-11-29 08:58:54,833 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.663e+01 9.320e+01 9.934e+01 1.095e+02 1.320e+02, threshold=1.987e+02, percent-clipped=0.0 2023-11-29 08:59:10,433 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7450, loss[loss=0.0547, simple_loss=0.07799, pruned_loss=0.008378, audio_tagging_loss=0.007328, over 13794.00 frames. ], tot_loss[loss=0.06461, simple_loss=0.08901, pruned_loss=0.01171, audio_tagging_loss=0.008398, over 3039428.14 frames. ], batch size: 52, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 08:59:10,517 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584600 2023-11-29 08:59:11,011 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.83 vs. limit=15.0 2023-11-29 08:59:26,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3897386.6666666665, ans=0.2 2023-11-29 08:59:47,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.33 vs. 
limit=5.0 2023-11-29 08:59:53,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3897520.0, ans=0.0 2023-11-29 09:00:04,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3897586.6666666665, ans=0.1 2023-11-29 09:00:10,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3897653.3333333335, ans=0.0 2023-11-29 09:00:11,661 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7500, loss[loss=0.06761, simple_loss=0.09721, pruned_loss=0.01075, audio_tagging_loss=0.008256, over 15228.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08955, pruned_loss=0.01186, audio_tagging_loss=0.008272, over 3039144.23 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:00:11,747 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584650 2023-11-29 09:00:15,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3897653.3333333335, ans=0.0 2023-11-29 09:00:48,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3897853.3333333335, ans=0.125 2023-11-29 09:00:49,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3897853.3333333335, ans=0.0 2023-11-29 09:00:55,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3897853.3333333335, ans=0.0 2023-11-29 09:00:57,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 9.073e+01 9.674e+01 1.060e+02 1.310e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-29 09:01:01,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3897920.0, ans=0.125 2023-11-29 09:01:04,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3897920.0, ans=0.125 2023-11-29 09:01:12,415 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7550, loss[loss=0.07588, simple_loss=0.1091, pruned_loss=0.01232, audio_tagging_loss=0.008982, over 15131.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08903, pruned_loss=0.01162, audio_tagging_loss=0.008258, over 3041513.09 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:01:12,502 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584700 2023-11-29 09:01:15,454 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0 2023-11-29 09:01:58,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3898186.6666666665, ans=0.0 2023-11-29 09:02:06,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3898253.3333333335, ans=0.2 2023-11-29 09:02:13,862 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7600, loss[loss=0.0821, simple_loss=0.1227, pruned_loss=0.01355, audio_tagging_loss=0.007189, over 15659.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08933, pruned_loss=0.01167, audio_tagging_loss=0.008251, over 3044395.88 frames. 
], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:02:13,964 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584750 2023-11-29 09:02:37,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3898386.6666666665, ans=0.04949747468305833 2023-11-29 09:02:52,700 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:02:58,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3898520.0, ans=0.2 2023-11-29 09:02:59,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3898520.0, ans=0.125 2023-11-29 09:03:00,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.761e+01 9.018e+01 9.691e+01 1.036e+02 1.365e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 09:03:04,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3898586.6666666665, ans=0.125 2023-11-29 09:03:07,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3898586.6666666665, ans=0.125 2023-11-29 09:03:10,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3898586.6666666665, ans=0.125 2023-11-29 09:03:16,493 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7650, loss[loss=0.06063, simple_loss=0.0828, pruned_loss=0.009384, audio_tagging_loss=0.009847, over 14310.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08921, pruned_loss=0.01166, audio_tagging_loss=0.008195, over 3042366.78 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:03:16,583 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584800 2023-11-29 09:03:21,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3898653.3333333335, ans=0.125 2023-11-29 09:03:31,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3898720.0, ans=0.1 2023-11-29 09:03:50,432 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.55 vs. limit=10.0 2023-11-29 09:03:56,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3898853.3333333335, ans=0.2 2023-11-29 09:04:07,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=22.5 2023-11-29 09:04:11,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3898920.0, ans=0.125 2023-11-29 09:04:18,688 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7700, loss[loss=0.06719, simple_loss=0.09332, pruned_loss=0.01164, audio_tagging_loss=0.008888, over 15544.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.09005, pruned_loss=0.01188, audio_tagging_loss=0.008137, over 3048596.56 frames. 
], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:04:18,794 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584850 2023-11-29 09:04:23,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3898986.6666666665, ans=0.125 2023-11-29 09:04:33,564 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:04:35,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3899053.3333333335, ans=0.0 2023-11-29 09:05:05,205 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.265e+01 9.378e+01 9.780e+01 1.035e+02 1.508e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-29 09:05:05,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3899186.6666666665, ans=15.0 2023-11-29 09:05:19,942 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7750, loss[loss=0.07863, simple_loss=0.106, pruned_loss=0.01797, audio_tagging_loss=0.007668, over 15112.00 frames. ], tot_loss[loss=0.06572, simple_loss=0.0907, pruned_loss=0.0121, audio_tagging_loss=0.008275, over 3050606.54 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:05:20,068 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584900 2023-11-29 09:05:52,959 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.59 vs. limit=15.0 2023-11-29 09:06:02,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3899520.0, ans=0.125 2023-11-29 09:06:21,802 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7800, loss[loss=0.07599, simple_loss=0.1112, pruned_loss=0.011, audio_tagging_loss=0.00941, over 16388.00 frames. ], tot_loss[loss=0.0659, simple_loss=0.09119, pruned_loss=0.01207, audio_tagging_loss=0.008228, over 3049275.78 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:06:21,903 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 584950 2023-11-29 09:06:30,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3899653.3333333335, ans=0.2 2023-11-29 09:06:49,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3899786.6666666665, ans=0.0 2023-11-29 09:06:59,884 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.27 vs. 
limit=12.0 2023-11-29 09:07:08,633 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.073e+01 9.693e+01 1.045e+02 1.224e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-29 09:07:10,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3899920.0, ans=0.2 2023-11-29 09:07:11,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3899920.0, ans=0.125 2023-11-29 09:07:12,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3899920.0, ans=0.1 2023-11-29 09:07:15,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3899920.0, ans=0.0 2023-11-29 09:07:16,081 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=15.0 2023-11-29 09:07:23,379 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7850, loss[loss=0.07727, simple_loss=0.1138, pruned_loss=0.0147, audio_tagging_loss=0.005668, over 16602.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.09057, pruned_loss=0.01209, audio_tagging_loss=0.008315, over 3043794.71 frames. ], batch size: 63, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:07:23,485 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585000 2023-11-29 09:07:40,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-29 09:07:59,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3900186.6666666665, ans=0.125 2023-11-29 09:08:02,563 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:08:08,012 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2023-11-29 09:08:19,793 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2023-11-29 09:08:24,373 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7900, loss[loss=0.07437, simple_loss=0.09758, pruned_loss=0.01671, audio_tagging_loss=0.00887, over 15921.00 frames. ], tot_loss[loss=0.06567, simple_loss=0.09042, pruned_loss=0.01205, audio_tagging_loss=0.008407, over 3045825.63 frames. 
], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:08:24,477 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585050 2023-11-29 09:08:46,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3900386.6666666665, ans=0.0 2023-11-29 09:08:54,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3900453.3333333335, ans=0.125 2023-11-29 09:08:54,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3900453.3333333335, ans=0.125 2023-11-29 09:09:10,028 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.168e+01 9.323e+01 9.871e+01 1.069e+02 1.326e+02, threshold=1.974e+02, percent-clipped=0.0 2023-11-29 09:09:11,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3900586.6666666665, ans=0.2 2023-11-29 09:09:16,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3900586.6666666665, ans=0.1 2023-11-29 09:09:20,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3900586.6666666665, ans=0.125 2023-11-29 09:09:23,929 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 7950, loss[loss=0.05766, simple_loss=0.07909, pruned_loss=0.008521, audio_tagging_loss=0.009595, over 15280.00 frames. ], tot_loss[loss=0.06569, simple_loss=0.09027, pruned_loss=0.01193, audio_tagging_loss=0.008619, over 3048095.23 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:09:24,004 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585100 2023-11-29 09:09:30,759 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0 2023-11-29 09:09:36,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3900720.0, ans=0.2 2023-11-29 09:09:40,430 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:09:50,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3900786.6666666665, ans=0.125 2023-11-29 09:10:24,884 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8000, loss[loss=0.07747, simple_loss=0.09762, pruned_loss=0.02153, audio_tagging_loss=0.007131, over 14757.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08972, pruned_loss=0.01206, audio_tagging_loss=0.008658, over 3042351.34 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:10:24,966 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585150 2023-11-29 09:10:41,942 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. 
limit=15.0 2023-11-29 09:10:47,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3901053.3333333335, ans=0.0 2023-11-29 09:10:47,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3901053.3333333335, ans=0.0 2023-11-29 09:10:52,107 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:10:59,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3901120.0, ans=0.04949747468305833 2023-11-29 09:11:11,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 9.013e+01 9.705e+01 1.030e+02 1.171e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-29 09:11:11,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3901186.6666666665, ans=0.1 2023-11-29 09:11:25,772 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8050, loss[loss=0.03941, simple_loss=0.05499, pruned_loss=0.004044, audio_tagging_loss=0.007868, over 16176.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08962, pruned_loss=0.01199, audio_tagging_loss=0.008667, over 3037605.43 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:11:25,859 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585200 2023-11-29 09:11:30,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3901320.0, ans=0.125 2023-11-29 09:11:33,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.46 vs. limit=12.0 2023-11-29 09:11:57,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3901453.3333333335, ans=0.2 2023-11-29 09:12:03,887 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0 2023-11-29 09:12:17,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2023-11-29 09:12:18,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.07 vs. limit=15.0 2023-11-29 09:12:18,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3901586.6666666665, ans=0.1 2023-11-29 09:12:28,018 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8100, loss[loss=0.06586, simple_loss=0.09378, pruned_loss=0.01086, audio_tagging_loss=0.008106, over 15378.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08909, pruned_loss=0.01208, audio_tagging_loss=0.008666, over 3030189.74 frames. 
], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:12:28,107 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585250 2023-11-29 09:12:30,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3901653.3333333335, ans=0.0 2023-11-29 09:12:42,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3901720.0, ans=0.125 2023-11-29 09:13:02,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3901786.6666666665, ans=0.0 2023-11-29 09:13:07,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3901853.3333333335, ans=0.0 2023-11-29 09:13:16,475 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.652e+01 9.257e+01 9.923e+01 1.057e+02 1.359e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-29 09:13:21,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3901920.0, ans=0.1 2023-11-29 09:13:24,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3901920.0, ans=0.1 2023-11-29 09:13:29,947 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8150, loss[loss=0.06687, simple_loss=0.09411, pruned_loss=0.009291, audio_tagging_loss=0.01052, over 15197.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.08977, pruned_loss=0.01206, audio_tagging_loss=0.00846, over 3033379.80 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:13:30,033 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585300 2023-11-29 09:13:59,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3902120.0, ans=0.125 2023-11-29 09:14:18,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3902253.3333333335, ans=0.2 2023-11-29 09:14:22,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3902253.3333333335, ans=0.0 2023-11-29 09:14:26,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.01 vs. limit=15.0 2023-11-29 09:14:31,340 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8200, loss[loss=0.05288, simple_loss=0.06376, pruned_loss=0.009667, audio_tagging_loss=0.01133, over 15391.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08897, pruned_loss=0.01192, audio_tagging_loss=0.008405, over 3034864.44 frames. ], batch size: 61, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:14:31,415 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585350 2023-11-29 09:14:33,616 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 09:14:37,393 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.86 vs. limit=22.5 2023-11-29 09:14:38,264 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=15.0 2023-11-29 09:14:44,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3902386.6666666665, ans=0.1 2023-11-29 09:14:47,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3902386.6666666665, ans=0.09899494936611666 2023-11-29 09:14:55,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3902453.3333333335, ans=0.125 2023-11-29 09:14:58,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3902453.3333333335, ans=0.125 2023-11-29 09:15:00,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3902453.3333333335, ans=0.125 2023-11-29 09:15:19,729 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.775e+01 9.273e+01 9.882e+01 1.047e+02 1.240e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 09:15:34,286 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8250, loss[loss=0.04524, simple_loss=0.055, pruned_loss=0.004365, audio_tagging_loss=0.01337, over 17012.00 frames. ], tot_loss[loss=0.06426, simple_loss=0.08826, pruned_loss=0.01167, audio_tagging_loss=0.008457, over 3031885.41 frames. ], batch size: 66, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:15:34,402 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585400 2023-11-29 09:15:35,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3902653.3333333335, ans=0.2 2023-11-29 09:15:38,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3902653.3333333335, ans=0.07 2023-11-29 09:15:49,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3902720.0, ans=0.1 2023-11-29 09:15:50,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3902720.0, ans=0.125 2023-11-29 09:15:51,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3902720.0, ans=0.125 2023-11-29 09:16:07,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3902786.6666666665, ans=0.1 2023-11-29 09:16:36,704 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8300, loss[loss=0.04407, simple_loss=0.0579, pruned_loss=0.00685, audio_tagging_loss=0.008271, over 15373.00 frames. ], tot_loss[loss=0.06415, simple_loss=0.08805, pruned_loss=0.01165, audio_tagging_loss=0.008481, over 3037262.73 frames. 
], batch size: 60, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:16:36,793 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585450 2023-11-29 09:16:49,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3903053.3333333335, ans=0.0 2023-11-29 09:16:54,947 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5 2023-11-29 09:16:59,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3903053.3333333335, ans=0.04949747468305833 2023-11-29 09:17:05,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3903120.0, ans=10.0 2023-11-29 09:17:19,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3903186.6666666665, ans=0.1 2023-11-29 09:17:20,433 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2023-11-29 09:17:24,337 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 8.975e+01 9.727e+01 1.046e+02 1.425e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-29 09:17:37,212 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8350, loss[loss=0.05791, simple_loss=0.08334, pruned_loss=0.009719, audio_tagging_loss=0.006519, over 15653.00 frames. ], tot_loss[loss=0.06413, simple_loss=0.08803, pruned_loss=0.01169, audio_tagging_loss=0.008423, over 3040738.37 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:17:37,303 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585500 2023-11-29 09:18:10,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3903453.3333333335, ans=0.125 2023-11-29 09:18:14,501 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-11-29 09:18:15,420 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2023-11-29 09:18:39,274 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8400, loss[loss=0.06698, simple_loss=0.09205, pruned_loss=0.01194, audio_tagging_loss=0.009017, over 15775.00 frames. ], tot_loss[loss=0.06405, simple_loss=0.08799, pruned_loss=0.01169, audio_tagging_loss=0.008358, over 3049586.81 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:18:39,351 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585550 2023-11-29 09:19:00,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3903720.0, ans=0.125 2023-11-29 09:19:10,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.76 vs. 
limit=12.0 2023-11-29 09:19:17,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3903853.3333333335, ans=0.0 2023-11-29 09:19:17,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3903853.3333333335, ans=0.125 2023-11-29 09:19:18,078 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. limit=15.0 2023-11-29 09:19:28,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 8.927e+01 9.448e+01 1.050e+02 1.277e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-29 09:19:38,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3903920.0, ans=0.125 2023-11-29 09:19:41,545 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8450, loss[loss=0.05334, simple_loss=0.07388, pruned_loss=0.007464, audio_tagging_loss=0.008941, over 15529.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08904, pruned_loss=0.01176, audio_tagging_loss=0.008286, over 3052320.20 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:19:41,649 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585600 2023-11-29 09:19:43,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3903986.6666666665, ans=0.2 2023-11-29 09:19:45,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3903986.6666666665, ans=0.0 2023-11-29 09:19:55,547 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=15.0 2023-11-29 09:20:18,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3904186.6666666665, ans=0.1 2023-11-29 09:20:42,690 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8500, loss[loss=0.04659, simple_loss=0.05634, pruned_loss=0.008189, audio_tagging_loss=0.01023, over 13984.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08926, pruned_loss=0.0119, audio_tagging_loss=0.008326, over 3048426.51 frames. 
], batch size: 53, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:20:42,774 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585650 2023-11-29 09:21:02,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3904386.6666666665, ans=0.0 2023-11-29 09:21:14,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3904453.3333333335, ans=0.1 2023-11-29 09:21:31,782 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.672e+01 9.118e+01 9.879e+01 1.041e+02 1.425e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 09:21:35,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3904586.6666666665, ans=0.125 2023-11-29 09:21:38,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3904586.6666666665, ans=0.07 2023-11-29 09:21:44,141 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8550, loss[loss=0.05019, simple_loss=0.06654, pruned_loss=0.006941, audio_tagging_loss=0.009972, over 15260.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08946, pruned_loss=0.01201, audio_tagging_loss=0.00836, over 3052203.64 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:21:44,219 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585700 2023-11-29 09:21:44,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3904653.3333333335, ans=0.125 2023-11-29 09:21:47,164 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0 2023-11-29 09:21:52,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3904653.3333333335, ans=0.0 2023-11-29 09:22:13,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.14 vs. limit=10.0 2023-11-29 09:22:16,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3904786.6666666665, ans=0.125 2023-11-29 09:22:42,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3904920.0, ans=0.0 2023-11-29 09:22:46,539 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8600, loss[loss=0.04927, simple_loss=0.06664, pruned_loss=0.008433, audio_tagging_loss=0.007514, over 16323.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08887, pruned_loss=0.01181, audio_tagging_loss=0.008445, over 3045646.94 frames. ], batch size: 61, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:22:46,639 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585750 2023-11-29 09:22:52,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3904986.6666666665, ans=0.2 2023-11-29 09:22:54,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.09 vs. 
limit=15.0 2023-11-29 09:23:00,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3905053.3333333335, ans=0.125 2023-11-29 09:23:15,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3905120.0, ans=0.125 2023-11-29 09:23:17,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3905120.0, ans=0.0 2023-11-29 09:23:24,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3905186.6666666665, ans=0.125 2023-11-29 09:23:36,039 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 9.131e+01 9.509e+01 1.045e+02 1.246e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-29 09:23:43,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3905253.3333333335, ans=0.0 2023-11-29 09:23:47,810 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8650, loss[loss=0.06346, simple_loss=0.08721, pruned_loss=0.01034, audio_tagging_loss=0.009519, over 15902.00 frames. ], tot_loss[loss=0.06457, simple_loss=0.08843, pruned_loss=0.01182, audio_tagging_loss=0.008537, over 3049549.91 frames. ], batch size: 61, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:23:47,879 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585800 2023-11-29 09:24:23,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3905453.3333333335, ans=0.0 2023-11-29 09:24:49,342 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8700, loss[loss=0.06186, simple_loss=0.08645, pruned_loss=0.009034, audio_tagging_loss=0.0096, over 16228.00 frames. ], tot_loss[loss=0.06517, simple_loss=0.08959, pruned_loss=0.01187, audio_tagging_loss=0.008503, over 3049346.80 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:24:49,444 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585850 2023-11-29 09:24:53,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3905653.3333333335, ans=0.125 2023-11-29 09:25:27,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3905853.3333333335, ans=0.125 2023-11-29 09:25:38,432 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.006e+01 9.124e+01 9.803e+01 1.053e+02 1.295e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-29 09:25:51,253 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8750, loss[loss=0.06984, simple_loss=0.0925, pruned_loss=0.01488, audio_tagging_loss=0.008703, over 15505.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08983, pruned_loss=0.01194, audio_tagging_loss=0.008582, over 3050776.52 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:25:51,329 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585900 2023-11-29 09:26:04,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.08 vs. 
limit=15.0 2023-11-29 09:26:04,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3906053.3333333335, ans=0.0 2023-11-29 09:26:10,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3906053.3333333335, ans=0.2 2023-11-29 09:26:13,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3906053.3333333335, ans=0.0 2023-11-29 09:26:16,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3906120.0, ans=0.125 2023-11-29 09:26:41,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3906253.3333333335, ans=0.125 2023-11-29 09:26:45,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3906253.3333333335, ans=0.0 2023-11-29 09:26:51,799 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8800, loss[loss=0.09235, simple_loss=0.1252, pruned_loss=0.01967, audio_tagging_loss=0.01006, over 15574.00 frames. ], tot_loss[loss=0.06547, simple_loss=0.08971, pruned_loss=0.01196, audio_tagging_loss=0.008655, over 3053341.61 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:26:51,893 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 585950 2023-11-29 09:26:55,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2023-11-29 09:27:09,644 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.64 vs. limit=15.0 2023-11-29 09:27:31,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3906520.0, ans=0.0 2023-11-29 09:27:40,491 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.520e+01 9.118e+01 9.794e+01 1.059e+02 1.251e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-29 09:27:52,760 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8850, loss[loss=0.08051, simple_loss=0.1176, pruned_loss=0.01552, audio_tagging_loss=0.006204, over 16935.00 frames. ], tot_loss[loss=0.06633, simple_loss=0.09111, pruned_loss=0.01218, audio_tagging_loss=0.00859, over 3051098.47 frames. ], batch size: 63, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:27:52,834 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586000 2023-11-29 09:27:54,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3906653.3333333335, ans=0.0 2023-11-29 09:27:55,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3906653.3333333335, ans=0.125 2023-11-29 09:27:57,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3906653.3333333335, ans=0.125 2023-11-29 09:28:05,781 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:28:53,392 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8900, loss[loss=0.06248, simple_loss=0.08784, pruned_loss=0.01095, audio_tagging_loss=0.007612, over 15953.00 frames. ], tot_loss[loss=0.06649, simple_loss=0.09157, pruned_loss=0.01221, audio_tagging_loss=0.008492, over 3056850.47 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:28:53,481 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586050 2023-11-29 09:29:04,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3906986.6666666665, ans=0.125 2023-11-29 09:29:38,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3907186.6666666665, ans=0.1 2023-11-29 09:29:42,067 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=22.5 2023-11-29 09:29:42,542 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 9.012e+01 9.649e+01 1.060e+02 1.281e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 09:29:55,273 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 8950, loss[loss=0.0728, simple_loss=0.1107, pruned_loss=0.01166, audio_tagging_loss=0.005772, over 16878.00 frames. ], tot_loss[loss=0.06604, simple_loss=0.09095, pruned_loss=0.01214, audio_tagging_loss=0.008427, over 3057437.04 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:29:55,349 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586100 2023-11-29 09:29:56,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3907320.0, ans=0.0 2023-11-29 09:29:59,372 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0 2023-11-29 09:30:04,927 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5 2023-11-29 09:30:13,986 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2023-11-29 09:30:16,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3907386.6666666665, ans=0.0 2023-11-29 09:30:28,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3907453.3333333335, ans=0.125 2023-11-29 09:30:31,328 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=15.0 2023-11-29 09:30:48,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3907586.6666666665, ans=0.1 2023-11-29 09:30:55,548 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.02 vs. 
limit=22.5 2023-11-29 09:30:56,177 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9000, loss[loss=0.05917, simple_loss=0.07834, pruned_loss=0.01127, audio_tagging_loss=0.008725, over 15065.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.09053, pruned_loss=0.01204, audio_tagging_loss=0.008309, over 3056231.98 frames. ], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:30:56,177 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 09:31:26,584 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.3916, 3.0180, 3.2861, 2.9772, 3.6257, 3.7719, 3.2377, 3.2272], device='cuda:2') 2023-11-29 09:31:36,005 INFO [train_asr.py:1267] (2/4) Epoch 49, validation: loss=0.05863, simple_loss=0.05047, pruned_loss=0.00547, audio_tagging_loss=0.02792, over 4681554.00 frames. 2023-11-29 09:31:36,005 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 09:31:36,091 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586150 2023-11-29 09:31:45,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3907653.3333333335, ans=0.125 2023-11-29 09:31:46,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3907653.3333333335, ans=0.125 2023-11-29 09:31:54,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3907720.0, ans=0.0 2023-11-29 09:32:13,234 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2023-11-29 09:32:25,437 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.432e+01 9.291e+01 1.003e+02 1.087e+02 1.506e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-29 09:32:37,781 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9050, loss[loss=0.06613, simple_loss=0.09637, pruned_loss=0.01129, audio_tagging_loss=0.00666, over 14799.00 frames. ], tot_loss[loss=0.06541, simple_loss=0.09018, pruned_loss=0.012, audio_tagging_loss=0.008317, over 3059148.24 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:32:37,892 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586200 2023-11-29 09:32:41,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3907986.6666666665, ans=0.07 2023-11-29 09:32:43,305 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-29 09:32:54,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3908053.3333333335, ans=0.125 2023-11-29 09:33:02,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3908120.0, ans=0.2 2023-11-29 09:33:29,037 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.41 vs. limit=22.5 2023-11-29 09:33:39,612 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9100, loss[loss=0.05544, simple_loss=0.07183, pruned_loss=0.01033, audio_tagging_loss=0.009196, over 14949.00 frames. 
], tot_loss[loss=0.06536, simple_loss=0.09016, pruned_loss=0.012, audio_tagging_loss=0.008279, over 3059397.34 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:33:39,697 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586250 2023-11-29 09:33:39,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3908320.0, ans=0.1 2023-11-29 09:33:50,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3908386.6666666665, ans=0.125 2023-11-29 09:34:01,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3908386.6666666665, ans=10.0 2023-11-29 09:34:13,817 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=22.5 2023-11-29 09:34:14,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3908453.3333333335, ans=0.125 2023-11-29 09:34:25,981 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:34:30,300 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.316e+01 9.126e+01 9.745e+01 1.074e+02 1.723e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-29 09:34:41,027 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9150, loss[loss=0.07743, simple_loss=0.113, pruned_loss=0.01541, audio_tagging_loss=0.005498, over 15712.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08956, pruned_loss=0.01187, audio_tagging_loss=0.0083, over 3057002.15 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:34:41,100 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586300 2023-11-29 09:34:41,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3908653.3333333335, ans=0.125 2023-11-29 09:34:47,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3908653.3333333335, ans=0.1 2023-11-29 09:35:11,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3908786.6666666665, ans=0.2 2023-11-29 09:35:21,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3908853.3333333335, ans=22.5 2023-11-29 09:35:35,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3908920.0, ans=0.0 2023-11-29 09:35:44,162 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9200, loss[loss=0.06118, simple_loss=0.08735, pruned_loss=0.01019, audio_tagging_loss=0.007316, over 16582.00 frames. ], tot_loss[loss=0.06451, simple_loss=0.08868, pruned_loss=0.01177, audio_tagging_loss=0.008395, over 3048335.80 frames. 
], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:35:44,243 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586350 2023-11-29 09:36:06,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3909053.3333333335, ans=0.0 2023-11-29 09:36:34,381 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 8.956e+01 9.492e+01 1.042e+02 1.619e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-29 09:36:45,675 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9250, loss[loss=0.07157, simple_loss=0.1006, pruned_loss=0.01316, audio_tagging_loss=0.008116, over 16635.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08851, pruned_loss=0.01169, audio_tagging_loss=0.008385, over 3051964.13 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:36:45,758 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586400 2023-11-29 09:36:46,182 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2023-11-29 09:36:54,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3909320.0, ans=0.1 2023-11-29 09:37:12,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3909453.3333333335, ans=0.125 2023-11-29 09:37:22,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3909520.0, ans=0.2 2023-11-29 09:37:25,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3909520.0, ans=0.125 2023-11-29 09:37:30,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3909520.0, ans=0.025 2023-11-29 09:37:34,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3909586.6666666665, ans=0.05 2023-11-29 09:37:47,388 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9300, loss[loss=0.0498, simple_loss=0.06803, pruned_loss=0.006017, audio_tagging_loss=0.009763, over 14718.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08894, pruned_loss=0.01184, audio_tagging_loss=0.0084, over 3054801.75 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:37:47,535 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586450 2023-11-29 09:37:55,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3909653.3333333335, ans=0.0 2023-11-29 09:37:59,592 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.95 vs. 
2023-11-29 09:38:16,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3909786.6666666665, ans=0.125 2023-11-29 09:38:34,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3909853.3333333335, ans=0.1 2023-11-29 09:38:38,306 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 9.042e+01 9.889e+01 1.074e+02 1.345e+02, threshold=1.978e+02, percent-clipped=0.0 2023-11-29 09:38:49,381 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9350, loss[loss=0.07985, simple_loss=0.1123, pruned_loss=0.01791, audio_tagging_loss=0.005762, over 14485.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08837, pruned_loss=0.01171, audio_tagging_loss=0.008431, over 3055726.37 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:38:49,474 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586500 2023-11-29 09:38:54,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3909986.6666666665, ans=0.1 2023-11-29 09:38:54,339 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:39:10,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3910053.3333333335, ans=0.125 2023-11-29 09:39:32,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3910186.6666666665, ans=0.125 2023-11-29 09:39:48,938 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.03 vs. limit=15.0 2023-11-29 09:39:51,667 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9400, loss[loss=0.07601, simple_loss=0.1109, pruned_loss=0.01279, audio_tagging_loss=0.007778, over 16114.00 frames. ], tot_loss[loss=0.06427, simple_loss=0.08828, pruned_loss=0.01155, audio_tagging_loss=0.008587, over 3052789.02 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:39:51,757 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586550 2023-11-29 09:40:00,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3910320.0, ans=0.0 2023-11-29 09:40:06,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3910386.6666666665, ans=0.1 2023-11-29 09:40:08,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3910386.6666666665, ans=0.0 2023-11-29 09:40:22,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3910453.3333333335, ans=0.04949747468305833 2023-11-29 09:40:42,132 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0
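
The Whitening lines (scaling.py:1022) periodically sample a statistic of the centred activation covariance inside each Whiten module: it is 1.0 when the covariance eigenvalues are all equal (decorrelated, equal-power channels) and grows as the spectrum becomes lopsided, and a gradient penalty fires only when the metric exceeds its limit. The limits are themselves scheduled (the ...self_attn1.whiten.whitening_limit ... ans=22.5 ScheduledFloat line earlier matches the limit=22.5 printed for the self-attention whiteners), and every metric sampled in this stretch sits below its limit, so the penalty should be inactive. The WithLoss lines from scaling.py:1118 appear to be the analogous bookkeeping for an auxiliary loss attached to the attention weights, with loss-sum=0.000e+00 meaning nothing accrued. A rough sketch of such a metric, under the assumption that it is the second moment of the covariance eigenvalues over their squared mean (the real definition lives in scaling.py):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # E[lambda^2] / E[lambda]^2 over eigenvalues of the per-group covariance;
        # >= 1.0, with equality iff all eigenvalues are equal ("white" features).
        x = x.reshape(-1, x.shape[-1])
        num_frames, num_channels = x.shape
        c = num_channels // num_groups
        x = x.reshape(num_frames, num_groups, c).transpose(0, 1)   # (groups, N, c)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / num_frames                   # (groups, c, c)
        second_moment = (cov ** 2).sum(dim=(1, 2)) / c             # E[lambda^2]
        mean_eig = cov.diagonal(dim1=1, dim2=2).mean(dim=1)        # E[lambda]
        return (second_moment / mean_eig ** 2).mean()

    x = torch.randn(1000, 512)         # white noise: metric roughly 1 + 512/1000
    print(whitening_metric(x).item())  # structured activations score much higher
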
2023-11-29 09:40:42,637 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.237e+01 8.989e+01 9.545e+01 1.042e+02 1.282e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-29 09:40:51,746 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:40:52,740 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:40:53,894 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9450, loss[loss=0.06722, simple_loss=0.09561, pruned_loss=0.01218, audio_tagging_loss=0.007235, over 16000.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.0889, pruned_loss=0.01163, audio_tagging_loss=0.008638, over 3048385.90 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:40:53,991 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586600 2023-11-29 09:41:01,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3910653.3333333335, ans=0.0 2023-11-29 09:41:12,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3910720.0, ans=0.125 2023-11-29 09:41:24,802 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=22.5 2023-11-29 09:41:31,725 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0 2023-11-29 09:41:43,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3910920.0, ans=0.0 2023-11-29 09:41:51,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3910920.0, ans=0.0 2023-11-29 09:41:55,135 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9500, loss[loss=0.07055, simple_loss=0.1021, pruned_loss=0.01176, audio_tagging_loss=0.007727, over 15769.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08917, pruned_loss=0.0117, audio_tagging_loss=0.008629, over 3049866.64 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0
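
The WARNING at 09:40:52 above shows the recipe's length filter at work: a cut is excluded when it would produce fewer encoder frames after subsampling than it has BPE tokens, since a transducer cannot emit more symbols than it has frames to emit them on. The 1-second AudioSet clips carry a 24-token dummy transcript but yield only 23 post-subsampling frames, so they are periodically dropped. The frame arithmetic matches the usual icefall convolutional front end; a sketch of the predicate (function name hypothetical):

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Frames left after the ~4x conv subsampling, per the usual icefall
        # formula; 100 -> 23, exactly as the WARNING reports.
        T = ((num_frames - 7) // 2 + 1) // 2
        return T >= num_tokens

    print(keep_cut(100, 24))  # False -> 'Exclude cut ... from training.'
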
2023-11-29 09:41:55,213 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586650 2023-11-29 09:41:59,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3910986.6666666665, ans=0.125 2023-11-29 09:42:04,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3910986.6666666665, ans=0.1 2023-11-29 09:42:23,614 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:42:44,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3911253.3333333335, ans=0.0 2023-11-29 09:42:45,156 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.304e+01 8.983e+01 9.496e+01 1.015e+02 1.388e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-29 09:42:50,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3911253.3333333335, ans=0.125 2023-11-29 09:42:52,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3911253.3333333335, ans=0.2 2023-11-29 09:42:55,719 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9550, loss[loss=0.06491, simple_loss=0.08211, pruned_loss=0.01173, audio_tagging_loss=0.01213, over 15616.00 frames. ], tot_loss[loss=0.06609, simple_loss=0.09078, pruned_loss=0.01197, audio_tagging_loss=0.008726, over 3050000.38 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 09:42:55,800 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586700 2023-11-29 09:42:57,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3911320.0, ans=0.125 2023-11-29 09:43:04,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3911320.0, ans=0.5 2023-11-29 09:43:13,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3911386.6666666665, ans=0.1 2023-11-29 09:43:25,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3911453.3333333335, ans=0.1 2023-11-29 09:43:52,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3911586.6666666665, ans=0.04949747468305833 2023-11-29 09:43:53,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3911586.6666666665, ans=0.125 2023-11-29 09:43:58,368 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9600, loss[loss=0.08186, simple_loss=0.111, pruned_loss=0.01901, audio_tagging_loss=0.007318, over 15519.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09031, pruned_loss=0.01199, audio_tagging_loss=0.008689, over 3051666.03 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0
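
The optim.py:476 lines summarise the gradient norms seen by the optimizer's clipping logic: the five numbers are the min / 25% / median / 75% / max of recently recorded per-step gradient norms, and each printed threshold equals Clipping_scale times the median, up to display rounding (for the 09:42:45 line above, 2.0 * 9.496e+01 = 1.899e+02); percent-clipped is the share of steps whose norm exceeded the threshold. The snippet below reconstructs that reading from the printed identities; the buffer and its size are hypothetical, not the actual optim.py implementation:

    import torch

    clipping_scale = 2.0
    # Hypothetical buffer of recent per-step gradient norms.
    recent_norms = (95.0 + 10.0 * torch.randn(500)).abs()
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                       # 2.0 x median
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    print(q.tolist(), threshold.item(), percent_clipped.item())

A percent-clipped of 0.0, as in nearly every line here, means the norm distribution is sitting comfortably inside twice its own median, i.e. the clipping is essentially dormant.
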
2023-11-29 09:43:58,482 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586750 2023-11-29 09:44:20,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3911720.0, ans=0.0 2023-11-29 09:44:26,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3911786.6666666665, ans=0.0 2023-11-29 09:44:29,317 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.45 vs. limit=22.5 2023-11-29 09:44:49,663 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.510e+01 9.041e+01 9.598e+01 1.061e+02 1.358e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 09:45:00,424 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9650, loss[loss=0.06866, simple_loss=0.099, pruned_loss=0.01171, audio_tagging_loss=0.007449, over 15677.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.09006, pruned_loss=0.01198, audio_tagging_loss=0.008636, over 3054421.31 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:45:00,519 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586800 2023-11-29 09:45:02,500 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=8.0 2023-11-29 09:45:08,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3911986.6666666665, ans=0.125 2023-11-29 09:45:34,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3912120.0, ans=0.0 2023-11-29 09:45:54,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3912253.3333333335, ans=0.2 2023-11-29 09:46:01,566 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9700, loss[loss=0.07873, simple_loss=0.1049, pruned_loss=0.01897, audio_tagging_loss=0.007305, over 15712.00 frames. ], tot_loss[loss=0.0658, simple_loss=0.0904, pruned_loss=0.01206, audio_tagging_loss=0.008543, over 3055067.61 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:46:01,672 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586850 2023-11-29 09:46:06,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3912320.0, ans=0.07 2023-11-29 09:46:19,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3912386.6666666665, ans=0.0 2023-11-29 09:46:22,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3912386.6666666665, ans=0.0 2023-11-29 09:46:26,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3912453.3333333335, ans=0.5 2023-11-29 09:46:35,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.40 vs. limit=6.0
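
Note the grad_scale column in the batch lines above: it held at 32.0 through most of this section, halves to 16.0 at batch 9650, touches 8.0 near batch 9850 below, and later climbs back to 16.0 and then 32.0. That sawtooth is the signature of dynamic loss scaling in fp16 mixed-precision training: the scale is halved whenever a step produces inf/nan scaled gradients and is grown again after a stretch of clean steps. A sketch with standard torch.cuda.amp semantics, assuming the logged grad_scale is the GradScaler's current scale; the interval and factors shown are the PyTorch defaults:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,      # in the range this log cycles through
        backoff_factor=0.5,   # 32 -> 16 -> 8 on steps with inf/nan grads
        growth_factor=2.0,    # ...and back up after enough clean steps
        growth_interval=2000,
    )
    # Per step: scaler.scale(loss).backward(); scaler.step(optimizer);
    # scaler.update()  -> scaler.get_scale() is what a log like this prints.
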
2023-11-29 09:46:48,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3912520.0, ans=0.0 2023-11-29 09:46:53,246 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 9.022e+01 9.629e+01 1.062e+02 1.416e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 09:47:01,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3912586.6666666665, ans=0.0 2023-11-29 09:47:03,541 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9750, loss[loss=0.06361, simple_loss=0.09745, pruned_loss=0.008223, audio_tagging_loss=0.006659, over 16060.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.09021, pruned_loss=0.01193, audio_tagging_loss=0.008452, over 3059193.33 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:47:03,656 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586900 2023-11-29 09:47:09,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3912653.3333333335, ans=0.125 2023-11-29 09:47:15,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3912720.0, ans=0.5 2023-11-29 09:47:52,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.21 vs. limit=15.0 2023-11-29 09:47:55,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3912920.0, ans=0.0 2023-11-29 09:48:07,003 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9800, loss[loss=0.06939, simple_loss=0.09097, pruned_loss=0.01307, audio_tagging_loss=0.01083, over 15886.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.09031, pruned_loss=0.01189, audio_tagging_loss=0.008403, over 3054739.47 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:48:07,080 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 586950 2023-11-29 09:48:21,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3913053.3333333335, ans=0.125 2023-11-29 09:48:52,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3913186.6666666665, ans=0.125 2023-11-29 09:48:57,978 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2023-11-29 09:48:59,763 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.885e+01 9.275e+01 1.009e+02 1.063e+02 1.388e+02, threshold=2.019e+02, percent-clipped=0.0 2023-11-29 09:49:02,190 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:49:08,180 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9850, loss[loss=0.0721, simple_loss=0.1015, pruned_loss=0.012, audio_tagging_loss=0.009333, over 14985.00 frames.
], tot_loss[loss=0.06532, simple_loss=0.09006, pruned_loss=0.01192, audio_tagging_loss=0.008374, over 3047179.03 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 8.0 2023-11-29 09:49:08,262 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587000 2023-11-29 09:49:17,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3913320.0, ans=0.125 2023-11-29 09:50:10,920 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9900, loss[loss=0.07312, simple_loss=0.09699, pruned_loss=0.01709, audio_tagging_loss=0.007542, over 14864.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08904, pruned_loss=0.01182, audio_tagging_loss=0.008424, over 3049229.21 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 8.0 2023-11-29 09:50:11,007 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587050 2023-11-29 09:50:18,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3913653.3333333335, ans=0.95 2023-11-29 09:50:18,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3913653.3333333335, ans=0.125 2023-11-29 09:50:19,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3913653.3333333335, ans=0.125 2023-11-29 09:50:28,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3913720.0, ans=15.0 2023-11-29 09:50:31,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3913720.0, ans=0.0 2023-11-29 09:50:42,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.89 vs. limit=15.0 2023-11-29 09:50:48,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3913853.3333333335, ans=0.125 2023-11-29 09:50:54,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3913853.3333333335, ans=0.125 2023-11-29 09:50:58,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3913853.3333333335, ans=0.125 2023-11-29 09:51:03,914 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 9.220e+01 9.633e+01 1.039e+02 1.360e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-29 09:51:07,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3913920.0, ans=0.125 2023-11-29 09:51:12,750 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 9950, loss[loss=0.07872, simple_loss=0.1007, pruned_loss=0.01572, audio_tagging_loss=0.01267, over 15665.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.08946, pruned_loss=0.01193, audio_tagging_loss=0.008371, over 3047813.70 frames. 
], batch size: 59, lr: 1.37e-03, grad_scale: 8.0 2023-11-29 09:51:12,833 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587100 2023-11-29 09:51:19,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3913986.6666666665, ans=0.0 2023-11-29 09:51:30,426 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.42 vs. limit=22.5 2023-11-29 09:51:40,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3914120.0, ans=0.0 2023-11-29 09:51:49,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3914186.6666666665, ans=0.125 2023-11-29 09:52:09,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3914253.3333333335, ans=0.1 2023-11-29 09:52:14,038 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10000, loss[loss=0.07461, simple_loss=0.1108, pruned_loss=0.01274, audio_tagging_loss=0.006455, over 14619.00 frames. ], tot_loss[loss=0.06493, simple_loss=0.08934, pruned_loss=0.01196, audio_tagging_loss=0.008297, over 3046081.48 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:52:14,162 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587150 2023-11-29 09:52:23,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3914320.0, ans=0.0 2023-11-29 09:52:38,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3914453.3333333335, ans=0.0 2023-11-29 09:52:38,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3914453.3333333335, ans=0.07 2023-11-29 09:52:48,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3914453.3333333335, ans=0.125 2023-11-29 09:52:57,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3914520.0, ans=0.2 2023-11-29 09:53:05,661 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.30 vs. limit=15.0 2023-11-29 09:53:07,085 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 9.076e+01 9.668e+01 1.049e+02 3.214e+02, threshold=1.934e+02, percent-clipped=1.0 2023-11-29 09:53:14,742 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2023-11-29 09:53:15,304 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10050, loss[loss=0.07348, simple_loss=0.1083, pruned_loss=0.01185, audio_tagging_loss=0.007449, over 15565.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.0891, pruned_loss=0.01186, audio_tagging_loss=0.008351, over 3050126.46 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:53:15,390 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587200 2023-11-29 09:53:28,168 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.43 vs. 
limit=10.0 2023-11-29 09:53:38,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3914720.0, ans=0.125 2023-11-29 09:53:47,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3914786.6666666665, ans=0.1 2023-11-29 09:53:49,927 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.53 vs. limit=6.0 2023-11-29 09:54:16,943 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10100, loss[loss=0.06134, simple_loss=0.09051, pruned_loss=0.008134, audio_tagging_loss=0.007952, over 15866.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08886, pruned_loss=0.01182, audio_tagging_loss=0.008407, over 3049308.61 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:54:17,059 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587250 2023-11-29 09:54:19,392 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.81 vs. limit=15.0 2023-11-29 09:54:36,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3915053.3333333335, ans=0.1 2023-11-29 09:54:36,454 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 09:54:55,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3915186.6666666665, ans=0.125 2023-11-29 09:54:59,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3915186.6666666665, ans=0.1 2023-11-29 09:55:06,667 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:55:10,575 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.962e+01 9.039e+01 9.666e+01 1.026e+02 1.279e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-29 09:55:19,726 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10150, loss[loss=0.06331, simple_loss=0.08339, pruned_loss=0.01328, audio_tagging_loss=0.00834, over 14798.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08953, pruned_loss=0.01195, audio_tagging_loss=0.008434, over 3047857.02 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:55:19,812 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587300 2023-11-29 09:55:43,498 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0 2023-11-29 09:55:46,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3915453.3333333335, ans=0.0 2023-11-29 09:55:48,043 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.12 vs. 
limit=15.0 2023-11-29 09:55:48,751 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:55:50,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3915453.3333333335, ans=0.125 2023-11-29 09:55:56,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3915520.0, ans=0.125 2023-11-29 09:56:02,142 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.34 vs. limit=15.0 2023-11-29 09:56:20,701 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10200, loss[loss=0.06628, simple_loss=0.09201, pruned_loss=0.01206, audio_tagging_loss=0.008215, over 15190.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08843, pruned_loss=0.01175, audio_tagging_loss=0.008666, over 3044062.51 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:56:20,798 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587350 2023-11-29 09:56:25,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3915653.3333333335, ans=15.0 2023-11-29 09:56:32,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3915720.0, ans=0.125 2023-11-29 09:56:44,811 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 09:56:49,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3915786.6666666665, ans=0.0 2023-11-29 09:56:54,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3915786.6666666665, ans=0.125 2023-11-29 09:56:59,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3915853.3333333335, ans=0.1 2023-11-29 09:57:01,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.60 vs. limit=22.5 2023-11-29 09:57:14,095 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.735e+01 8.990e+01 9.483e+01 1.035e+02 1.443e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-29 09:57:22,318 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10250, loss[loss=0.07115, simple_loss=0.09316, pruned_loss=0.01611, audio_tagging_loss=0.008464, over 16521.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08958, pruned_loss=0.01191, audio_tagging_loss=0.008633, over 3052445.71 frames. 
], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:57:22,441 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587400 2023-11-29 09:57:42,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3916053.3333333335, ans=0.125 2023-11-29 09:58:06,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3916186.6666666665, ans=0.125 2023-11-29 09:58:09,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3916186.6666666665, ans=0.125 2023-11-29 09:58:25,270 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10300, loss[loss=0.05891, simple_loss=0.07493, pruned_loss=0.009767, audio_tagging_loss=0.01168, over 16082.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.08964, pruned_loss=0.01201, audio_tagging_loss=0.00868, over 3049276.54 frames. ], batch size: 61, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:58:25,348 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587450 2023-11-29 09:58:33,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3916320.0, ans=0.125 2023-11-29 09:58:59,139 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=22.5 2023-11-29 09:59:10,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3916520.0, ans=0.125 2023-11-29 09:59:18,314 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.769e+01 9.346e+01 9.831e+01 1.081e+02 1.349e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-29 09:59:18,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3916586.6666666665, ans=0.125 2023-11-29 09:59:27,079 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10350, loss[loss=0.05281, simple_loss=0.07156, pruned_loss=0.006771, audio_tagging_loss=0.01026, over 15359.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08984, pruned_loss=0.01189, audio_tagging_loss=0.008798, over 3049550.35 frames. 
], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 09:59:27,156 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587500 2023-11-29 09:59:34,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3916653.3333333335, ans=0.125 2023-11-29 09:59:36,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3916653.3333333335, ans=0.125 2023-11-29 09:59:50,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3916720.0, ans=0.125 2023-11-29 10:00:02,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3916786.6666666665, ans=0.0 2023-11-29 10:00:06,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3916853.3333333335, ans=10.0 2023-11-29 10:00:16,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3916920.0, ans=15.0 2023-11-29 10:00:23,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3916920.0, ans=0.125 2023-11-29 10:00:28,913 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10400, loss[loss=0.06645, simple_loss=0.09675, pruned_loss=0.01083, audio_tagging_loss=0.007248, over 14801.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08951, pruned_loss=0.01179, audio_tagging_loss=0.008794, over 3043719.27 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:00:28,998 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587550 2023-11-29 10:00:32,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3916986.6666666665, ans=0.1 2023-11-29 10:00:40,936 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:00:49,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3917053.3333333335, ans=0.125 2023-11-29 10:00:53,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3917120.0, ans=0.125 2023-11-29 10:01:11,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3917186.6666666665, ans=0.0 2023-11-29 10:01:13,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3917186.6666666665, ans=0.1 2023-11-29 10:01:22,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 9.130e+01 9.665e+01 1.040e+02 1.258e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-29 10:01:31,585 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10450, loss[loss=0.07598, simple_loss=0.1072, pruned_loss=0.01478, audio_tagging_loss=0.00761, over 15984.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.0885, pruned_loss=0.01171, audio_tagging_loss=0.008809, over 3054429.21 frames. 
], batch size: 59, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:01:31,679 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587600 2023-11-29 10:01:44,487 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.50 vs. limit=6.0 2023-11-29 10:01:54,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3917386.6666666665, ans=0.125 2023-11-29 10:01:59,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3917453.3333333335, ans=0.0 2023-11-29 10:02:10,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3917520.0, ans=0.125 2023-11-29 10:02:23,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3917586.6666666665, ans=0.0 2023-11-29 10:02:24,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3917586.6666666665, ans=0.125 2023-11-29 10:02:33,199 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10500, loss[loss=0.05049, simple_loss=0.0769, pruned_loss=0.005827, audio_tagging_loss=0.006214, over 15440.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08835, pruned_loss=0.01165, audio_tagging_loss=0.008634, over 3049874.51 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:02:33,273 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587650 2023-11-29 10:02:34,924 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=22.5 2023-11-29 10:02:34,992 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=15.0 2023-11-29 10:02:41,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3917653.3333333335, ans=0.0 2023-11-29 10:02:45,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3917720.0, ans=0.125 2023-11-29 10:02:51,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3917720.0, ans=0.0 2023-11-29 10:03:26,858 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.916e+01 9.691e+01 1.026e+02 1.437e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 10:03:33,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3917920.0, ans=0.1 2023-11-29 10:03:35,733 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10550, loss[loss=0.06621, simple_loss=0.09932, pruned_loss=0.008199, audio_tagging_loss=0.008348, over 16744.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08896, pruned_loss=0.01171, audio_tagging_loss=0.00854, over 3047985.95 frames. 
], batch size: 61, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:03:35,804 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587700 2023-11-29 10:03:50,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3918053.3333333335, ans=0.125 2023-11-29 10:04:22,703 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.54 vs. limit=15.0 2023-11-29 10:04:28,494 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=22.5 2023-11-29 10:04:38,651 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10600, loss[loss=0.08824, simple_loss=0.1244, pruned_loss=0.01936, audio_tagging_loss=0.006679, over 16004.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08918, pruned_loss=0.01193, audio_tagging_loss=0.00849, over 3041808.42 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:04:38,742 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587750 2023-11-29 10:04:38,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3918320.0, ans=0.125 2023-11-29 10:04:43,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3918320.0, ans=0.2 2023-11-29 10:04:46,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3918320.0, ans=0.125 2023-11-29 10:04:46,969 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.60 vs. limit=15.0 2023-11-29 10:04:56,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3918386.6666666665, ans=0.125 2023-11-29 10:05:06,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3918453.3333333335, ans=0.0 2023-11-29 10:05:31,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.005e+01 8.953e+01 9.680e+01 1.038e+02 1.330e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 10:05:33,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3918586.6666666665, ans=0.2 2023-11-29 10:05:39,665 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2023-11-29 10:05:40,134 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10650, loss[loss=0.07937, simple_loss=0.1069, pruned_loss=0.01792, audio_tagging_loss=0.007984, over 14945.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08872, pruned_loss=0.01185, audio_tagging_loss=0.008491, over 3041114.84 frames. 
], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:05:40,222 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587800 2023-11-29 10:05:57,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3918720.0, ans=0.125 2023-11-29 10:06:15,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3918786.6666666665, ans=0.125 2023-11-29 10:06:22,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3918853.3333333335, ans=0.125 2023-11-29 10:06:42,300 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10700, loss[loss=0.07399, simple_loss=0.1033, pruned_loss=0.01313, audio_tagging_loss=0.009229, over 16778.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08971, pruned_loss=0.0119, audio_tagging_loss=0.00838, over 3046261.48 frames. ], batch size: 58, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:06:42,397 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587850 2023-11-29 10:06:54,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3919053.3333333335, ans=0.125 2023-11-29 10:06:55,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3919053.3333333335, ans=0.125 2023-11-29 10:06:58,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3919053.3333333335, ans=0.125 2023-11-29 10:07:04,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3919053.3333333335, ans=0.125 2023-11-29 10:07:17,868 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2023-11-29 10:07:28,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3919186.6666666665, ans=0.2 2023-11-29 10:07:36,076 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.832e+01 9.064e+01 9.584e+01 1.018e+02 1.338e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 10:07:44,329 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10750, loss[loss=0.05441, simple_loss=0.06631, pruned_loss=0.009425, audio_tagging_loss=0.01183, over 14997.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08997, pruned_loss=0.01192, audio_tagging_loss=0.008315, over 3043272.62 frames. 
], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:07:44,421 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587900 2023-11-29 10:07:52,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3919320.0, ans=0.125 2023-11-29 10:07:56,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3919386.6666666665, ans=0.0 2023-11-29 10:08:20,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3919520.0, ans=0.015 2023-11-29 10:08:20,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3919520.0, ans=0.125 2023-11-29 10:08:25,611 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.43 vs. limit=15.0 2023-11-29 10:08:33,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3919586.6666666665, ans=0.125 2023-11-29 10:08:44,883 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10800, loss[loss=0.07093, simple_loss=0.1045, pruned_loss=0.01261, audio_tagging_loss=0.006068, over 15571.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08936, pruned_loss=0.01185, audio_tagging_loss=0.008282, over 3045029.79 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:08:44,978 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 587950 2023-11-29 10:09:14,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3919786.6666666665, ans=0.07 2023-11-29 10:09:39,265 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.936e+01 9.085e+01 9.620e+01 1.015e+02 1.229e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 10:09:47,015 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10850, loss[loss=0.06802, simple_loss=0.09764, pruned_loss=0.01191, audio_tagging_loss=0.007296, over 14911.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08933, pruned_loss=0.01187, audio_tagging_loss=0.008328, over 3037355.31 frames. ], batch size: 53, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:09:47,110 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588000 2023-11-29 10:09:48,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.95 vs. 
limit=22.5 2023-11-29 10:10:00,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3919986.6666666665, ans=0.2 2023-11-29 10:10:01,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3920053.3333333335, ans=0.0 2023-11-29 10:10:29,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3920186.6666666665, ans=0.125 2023-11-29 10:10:31,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3920186.6666666665, ans=0.125 2023-11-29 10:10:35,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3920186.6666666665, ans=0.125 2023-11-29 10:10:43,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3920253.3333333335, ans=0.04949747468305833 2023-11-29 10:10:46,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3920253.3333333335, ans=0.125 2023-11-29 10:10:48,987 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:10:51,868 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10900, loss[loss=0.0657, simple_loss=0.09267, pruned_loss=0.009787, audio_tagging_loss=0.009578, over 15295.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08913, pruned_loss=0.01189, audio_tagging_loss=0.008405, over 3030859.75 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:10:51,960 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588050 2023-11-29 10:10:52,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3920320.0, ans=0.125 2023-11-29 10:10:52,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=22.5 2023-11-29 10:11:10,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.47 vs. limit=15.0 2023-11-29 10:11:29,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3920520.0, ans=0.125 2023-11-29 10:11:33,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3920520.0, ans=0.1 2023-11-29 10:11:33,921 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.65 vs. 
limit=22.5 2023-11-29 10:11:37,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3920520.0, ans=0.0 2023-11-29 10:11:41,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3920586.6666666665, ans=0.0 2023-11-29 10:11:44,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3920586.6666666665, ans=0.1 2023-11-29 10:11:44,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3920586.6666666665, ans=0.125 2023-11-29 10:11:47,358 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.916e+01 9.221e+01 9.905e+01 1.064e+02 1.744e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 10:11:53,344 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 10950, loss[loss=0.06486, simple_loss=0.09461, pruned_loss=0.00918, audio_tagging_loss=0.008371, over 15021.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08965, pruned_loss=0.01192, audio_tagging_loss=0.0085, over 3036252.23 frames. ], batch size: 54, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:11:53,430 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588100 2023-11-29 10:11:53,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3920653.3333333335, ans=0.125 2023-11-29 10:12:09,668 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.31 vs. limit=15.0 2023-11-29 10:12:15,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3920720.0, ans=0.0 2023-11-29 10:12:34,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3920853.3333333335, ans=0.125 2023-11-29 10:12:35,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3920853.3333333335, ans=0.2 2023-11-29 10:12:38,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3920853.3333333335, ans=0.125 2023-11-29 10:12:50,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3920920.0, ans=0.1 2023-11-29 10:12:54,881 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11000, loss[loss=0.04098, simple_loss=0.05581, pruned_loss=0.003413, audio_tagging_loss=0.009666, over 14601.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08857, pruned_loss=0.01172, audio_tagging_loss=0.008559, over 3043521.70 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:12:54,976 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588150 2023-11-29 10:12:59,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3920986.6666666665, ans=0.5 2023-11-29 10:13:06,259 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:13:17,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3921053.3333333335, ans=0.125 2023-11-29 10:13:50,730 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.176e+01 9.054e+01 9.771e+01 1.053e+02 1.277e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 10:13:55,924 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.94 vs. limit=10.0 2023-11-29 10:13:56,544 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11050, loss[loss=0.07525, simple_loss=0.1045, pruned_loss=0.0148, audio_tagging_loss=0.008218, over 15160.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08881, pruned_loss=0.01192, audio_tagging_loss=0.008627, over 3043661.36 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:13:56,621 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588200 2023-11-29 10:14:33,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3921520.0, ans=0.025 2023-11-29 10:14:56,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3921586.6666666665, ans=0.0 2023-11-29 10:14:58,982 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11100, loss[loss=0.06063, simple_loss=0.08552, pruned_loss=0.009449, audio_tagging_loss=0.008424, over 15063.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09024, pruned_loss=0.01219, audio_tagging_loss=0.008618, over 3044457.58 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:14:59,075 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588250 2023-11-29 10:15:00,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3921653.3333333335, ans=0.2 2023-11-29 10:15:21,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3921786.6666666665, ans=0.2 2023-11-29 10:15:45,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3921853.3333333335, ans=10.0 2023-11-29 10:15:53,233 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.915e+01 9.126e+01 9.732e+01 1.035e+02 1.401e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 10:15:59,062 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11150, loss[loss=0.06233, simple_loss=0.09036, pruned_loss=0.009159, audio_tagging_loss=0.00799, over 15423.00 frames. ], tot_loss[loss=0.06535, simple_loss=0.08927, pruned_loss=0.01188, audio_tagging_loss=0.008831, over 3049793.95 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:15:59,144 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588300 2023-11-29 10:16:07,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3921986.6666666665, ans=0.125 2023-11-29 10:16:26,700 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.92 vs. 
limit=15.0 2023-11-29 10:16:33,831 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.06 vs. limit=15.0 2023-11-29 10:17:00,560 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11200, loss[loss=0.06924, simple_loss=0.09413, pruned_loss=0.01443, audio_tagging_loss=0.007746, over 14442.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08917, pruned_loss=0.0118, audio_tagging_loss=0.008917, over 3055456.86 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:17:00,661 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588350 2023-11-29 10:17:00,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3922320.0, ans=0.0 2023-11-29 10:17:56,309 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.891e+01 9.269e+01 9.742e+01 1.059e+02 1.357e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 10:18:02,841 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11250, loss[loss=0.06763, simple_loss=0.08798, pruned_loss=0.01319, audio_tagging_loss=0.01045, over 15198.00 frames. ], tot_loss[loss=0.06534, simple_loss=0.08924, pruned_loss=0.01186, audio_tagging_loss=0.008864, over 3059782.77 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:18:02,963 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588400 2023-11-29 10:18:23,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3922720.0, ans=0.125 2023-11-29 10:18:54,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3922920.0, ans=0.2 2023-11-29 10:19:00,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3922920.0, ans=0.0 2023-11-29 10:19:03,737 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11300, loss[loss=0.08618, simple_loss=0.1247, pruned_loss=0.01675, audio_tagging_loss=0.007082, over 15491.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08871, pruned_loss=0.01177, audio_tagging_loss=0.008794, over 3061062.17 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:19:03,829 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588450 2023-11-29 10:19:04,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2023-11-29 10:19:31,183 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=22.5 2023-11-29 10:19:32,049 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=15.0 2023-11-29 10:19:37,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3923120.0, ans=0.0 2023-11-29 10:19:45,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3923186.6666666665, ans=0.125 2023-11-29 10:19:46,873 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.46 vs. 
limit=15.0 2023-11-29 10:19:55,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3923253.3333333335, ans=0.0 2023-11-29 10:19:58,513 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 9.209e+01 9.911e+01 1.088e+02 1.767e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-29 10:19:58,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3923253.3333333335, ans=0.2 2023-11-29 10:20:04,400 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11350, loss[loss=0.05572, simple_loss=0.07929, pruned_loss=0.008164, audio_tagging_loss=0.007908, over 14884.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08904, pruned_loss=0.01194, audio_tagging_loss=0.008605, over 3057096.88 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:20:04,483 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588500 2023-11-29 10:20:30,066 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:20:56,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3923586.6666666665, ans=0.125 2023-11-29 10:21:06,489 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11400, loss[loss=0.07073, simple_loss=0.1024, pruned_loss=0.01356, audio_tagging_loss=0.005943, over 16264.00 frames. ], tot_loss[loss=0.06525, simple_loss=0.08959, pruned_loss=0.01196, audio_tagging_loss=0.008492, over 3051245.10 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:21:06,574 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588550 2023-11-29 10:21:11,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3923653.3333333335, ans=0.0 2023-11-29 10:21:24,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3923720.0, ans=0.125 2023-11-29 10:21:25,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3923720.0, ans=0.125 2023-11-29 10:21:31,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3923786.6666666665, ans=0.0 2023-11-29 10:21:34,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3923786.6666666665, ans=0.125 2023-11-29 10:22:01,948 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 8.974e+01 9.585e+01 1.035e+02 2.029e+02, threshold=1.917e+02, percent-clipped=1.0 2023-11-29 10:22:07,787 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11450, loss[loss=0.07857, simple_loss=0.1151, pruned_loss=0.01566, audio_tagging_loss=0.00534, over 14486.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08863, pruned_loss=0.01182, audio_tagging_loss=0.008542, over 3048998.21 frames. 
], batch size: 53, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:22:07,872 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588600 2023-11-29 10:22:20,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3924053.3333333335, ans=0.125 2023-11-29 10:22:55,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3924186.6666666665, ans=0.025 2023-11-29 10:23:02,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3924253.3333333335, ans=0.1 2023-11-29 10:23:09,783 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11500, loss[loss=0.06131, simple_loss=0.08993, pruned_loss=0.009056, audio_tagging_loss=0.007288, over 14952.00 frames. ], tot_loss[loss=0.06503, simple_loss=0.0894, pruned_loss=0.01182, audio_tagging_loss=0.008508, over 3053677.03 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:23:09,863 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588650 2023-11-29 10:23:39,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3924453.3333333335, ans=0.125 2023-11-29 10:23:42,508 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-29 10:24:01,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3924586.6666666665, ans=0.125 2023-11-29 10:24:06,305 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.997e+01 9.583e+01 1.037e+02 1.357e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 10:24:11,577 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11550, loss[loss=0.06512, simple_loss=0.08952, pruned_loss=0.01241, audio_tagging_loss=0.007951, over 14650.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.09016, pruned_loss=0.01195, audio_tagging_loss=0.008405, over 3054135.04 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:24:11,669 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588700 2023-11-29 10:24:25,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3924720.0, ans=0.125 2023-11-29 10:24:26,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3924720.0, ans=0.0 2023-11-29 10:24:27,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3924720.0, ans=0.1 2023-11-29 10:24:33,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3924720.0, ans=0.2 2023-11-29 10:24:42,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3924786.6666666665, ans=0.04949747468305833 2023-11-29 10:24:49,077 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:24:55,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3924853.3333333335, ans=0.1 2023-11-29 10:25:01,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3924920.0, ans=0.125 2023-11-29 10:25:11,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3924986.6666666665, ans=0.2 2023-11-29 10:25:12,268 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11600, loss[loss=0.06558, simple_loss=0.07471, pruned_loss=0.01379, audio_tagging_loss=0.01443, over 14955.00 frames. ], tot_loss[loss=0.06574, simple_loss=0.09022, pruned_loss=0.01213, audio_tagging_loss=0.008506, over 3054594.19 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:25:12,344 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588750 2023-11-29 10:26:09,225 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.110e+01 9.153e+01 9.919e+01 1.070e+02 2.477e+02, threshold=1.984e+02, percent-clipped=1.0 2023-11-29 10:26:12,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3925253.3333333335, ans=15.0 2023-11-29 10:26:14,594 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11650, loss[loss=0.07364, simple_loss=0.09693, pruned_loss=0.01487, audio_tagging_loss=0.0103, over 15278.00 frames. ], tot_loss[loss=0.06579, simple_loss=0.09024, pruned_loss=0.01217, audio_tagging_loss=0.0085, over 3055391.26 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:26:14,680 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588800 2023-11-29 10:26:21,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3925320.0, ans=0.125 2023-11-29 10:26:30,551 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.92 vs. limit=22.5 2023-11-29 10:26:38,993 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.92 vs. limit=15.0 2023-11-29 10:26:43,509 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2023-11-29 10:27:03,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3925586.6666666665, ans=0.125 2023-11-29 10:27:17,048 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11700, loss[loss=0.08236, simple_loss=0.1095, pruned_loss=0.02095, audio_tagging_loss=0.006644, over 15118.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08968, pruned_loss=0.0121, audio_tagging_loss=0.008556, over 3058117.17 frames. 
], batch size: 56, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:27:17,135 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588850 2023-11-29 10:27:19,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3925653.3333333335, ans=0.1 2023-11-29 10:27:40,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3925786.6666666665, ans=0.5 2023-11-29 10:28:04,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3925853.3333333335, ans=0.2 2023-11-29 10:28:04,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3925853.3333333335, ans=0.0 2023-11-29 10:28:14,504 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 9.126e+01 9.606e+01 1.033e+02 1.375e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 10:28:17,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3925986.6666666665, ans=0.125 2023-11-29 10:28:17,891 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11750, loss[loss=0.04792, simple_loss=0.06398, pruned_loss=0.008986, audio_tagging_loss=0.006939, over 14965.00 frames. ], tot_loss[loss=0.0654, simple_loss=0.0897, pruned_loss=0.012, audio_tagging_loss=0.008553, over 3055462.02 frames. ], batch size: 56, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:28:17,958 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588900 2023-11-29 10:28:34,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3926053.3333333335, ans=0.0 2023-11-29 10:28:48,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3926120.0, ans=0.0 2023-11-29 10:29:03,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3926186.6666666665, ans=0.0 2023-11-29 10:29:20,679 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11800, loss[loss=0.05825, simple_loss=0.07835, pruned_loss=0.00974, audio_tagging_loss=0.009338, over 15734.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08969, pruned_loss=0.01193, audio_tagging_loss=0.008612, over 3050803.25 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:29:20,761 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 588950 2023-11-29 10:29:39,179 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.19 vs. limit=10.0 2023-11-29 10:29:49,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3926453.3333333335, ans=0.0 2023-11-29 10:29:51,961 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. 
limit=6.0 2023-11-29 10:29:58,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3926520.0, ans=0.07 2023-11-29 10:30:11,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3926586.6666666665, ans=0.0 2023-11-29 10:30:12,948 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=15.0 2023-11-29 10:30:18,160 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.681e+01 9.043e+01 9.603e+01 1.049e+02 1.292e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 10:30:21,754 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11850, loss[loss=0.06839, simple_loss=0.1007, pruned_loss=0.009245, audio_tagging_loss=0.008781, over 15134.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08889, pruned_loss=0.01175, audio_tagging_loss=0.008678, over 3042394.30 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:30:21,834 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589000 2023-11-29 10:30:26,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3926653.3333333335, ans=0.125 2023-11-29 10:30:37,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3926720.0, ans=0.1 2023-11-29 10:31:00,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3926853.3333333335, ans=0.125 2023-11-29 10:31:09,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3926853.3333333335, ans=0.0 2023-11-29 10:31:22,940 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11900, loss[loss=0.0705, simple_loss=0.08798, pruned_loss=0.01637, audio_tagging_loss=0.01014, over 15476.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08965, pruned_loss=0.01182, audio_tagging_loss=0.008628, over 3049656.03 frames. ], batch size: 60, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:31:23,023 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589050 2023-11-29 10:31:28,538 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.56 vs. limit=5.0 2023-11-29 10:31:47,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3927120.0, ans=0.025 2023-11-29 10:31:58,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3927186.6666666665, ans=0.125 2023-11-29 10:32:13,910 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.26 vs. 
limit=5.0 2023-11-29 10:32:15,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3927253.3333333335, ans=0.125 2023-11-29 10:32:18,718 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.646e+01 9.048e+01 9.528e+01 1.019e+02 1.404e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-29 10:32:22,296 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 11950, loss[loss=0.08047, simple_loss=0.1231, pruned_loss=0.01276, audio_tagging_loss=0.006171, over 14942.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08933, pruned_loss=0.01172, audio_tagging_loss=0.008713, over 3049028.68 frames. ], batch size: 57, lr: 1.37e-03, grad_scale: 16.0 2023-11-29 10:32:22,381 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589100 2023-11-29 10:32:26,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3927320.0, ans=0.125 2023-11-29 10:32:27,129 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.75 vs. limit=22.5 2023-11-29 10:33:00,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2023-11-29 10:33:10,683 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:33:11,690 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:33:11,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3927586.6666666665, ans=0.125 2023-11-29 10:33:15,326 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.21 vs. limit=6.0 2023-11-29 10:33:16,556 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0 2023-11-29 10:33:20,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3927653.3333333335, ans=0.025 2023-11-29 10:33:21,416 INFO [train_asr.py:1235] (2/4) Epoch 49, batch 12000, loss[loss=0.0627, simple_loss=0.09278, pruned_loss=0.009212, audio_tagging_loss=0.0071, over 14518.00 frames. ], tot_loss[loss=0.06576, simple_loss=0.09021, pruned_loss=0.0119, audio_tagging_loss=0.00875, over 3054281.35 frames. ], batch size: 55, lr: 1.37e-03, grad_scale: 32.0 2023-11-29 10:33:21,417 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 10:33:39,657 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.2229, 5.0973, 4.4181, 4.9106], device='cuda:2') 2023-11-29 10:34:01,196 INFO [train_asr.py:1267] (2/4) Epoch 49, validation: loss=0.0581, simple_loss=0.05045, pruned_loss=0.005444, audio_tagging_loss=0.02743, over 4681554.00 frames. 
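Two of the recurring record types above encode simple, checkable arithmetic. In every [optim.py:476] record, the logged threshold is Clipping_scale times the median of the five-number grad-norm summary (e.g. 2.0 * 9.528e+01 = 1.906e+02 in the record just above), and each [train_asr.py:1481] WARNING excludes a cut whose frame count after subsampling falls below its token count (100 frames -> 23 after subsampling, vs. 24 tokens). The sketch below is a minimal illustration of both rules, not the icefall source: the function names are hypothetical, and the ((T - 7) // 2 + 1) // 2 subsampling formula is an assumption chosen to match the logged numbers.

```python
# Minimal sketch of two rules visible in the log records above; the
# function names and the exact subsampling formula are assumptions.
import torch

def clipping_threshold(grad_norm_summary: torch.Tensor,
                       clipping_scale: float = 2.0) -> float:
    # [optim.py:476] reports min/Q1/median/Q3/max of recent grad norms;
    # the logged threshold equals clipping_scale * median.
    return clipping_scale * grad_norm_summary[2].item()

def frames_after_subsampling(num_frames: int) -> int:
    # Assumed ~4x convolutional subsampling; reproduces the WARNING
    # records here (100 input frames -> 23 subsampled frames).
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Cuts with fewer subsampled frames than BPE tokens are excluded:
    # the pruned transducer loss needs at least as many frames as tokens.
    return frames_after_subsampling(num_frames) >= num_tokens

summary = torch.tensor([7.646e+01, 9.048e+01, 9.528e+01, 1.019e+02, 1.404e+02])
print(clipping_threshold(summary))  # 190.56, i.e. the logged threshold=1.906e+02
print(keep_cut(100, 24))            # False -> "Exclude cut ..." as in the log
```

The same median-times-scale relation holds for every [optim.py:476] record in this stretch of the log.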
2023-11-29 10:34:01,197 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 10:34:01,234 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589150 2023-11-29 10:34:01,943 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.07 vs. limit=15.0 2023-11-29 10:34:09,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3927653.3333333335, ans=0.125 2023-11-29 10:34:46,424 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 0, loss[loss=0.08693, simple_loss=0.102, pruned_loss=0.0128, audio_tagging_loss=0.02314, over 14771.00 frames. ], tot_loss[loss=0.08693, simple_loss=0.102, pruned_loss=0.0128, audio_tagging_loss=0.02314, over 14771.00 frames. ], batch size: 57, lr: 1.36e-03, grad_scale: 32.0 2023-11-29 10:34:46,426 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 10:35:02,786 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1750, 4.6266, 5.2337, 4.8416], device='cuda:2') 2023-11-29 10:35:22,084 INFO [train_asr.py:1267] (2/4) Epoch 50, validation: loss=0.05785, simple_loss=0.05049, pruned_loss=0.005519, audio_tagging_loss=0.02709, over 4681554.00 frames. 2023-11-29 10:35:22,085 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 10:35:54,862 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.763e+01 9.469e+01 1.029e+02 1.110e+02 1.447e+02, threshold=2.058e+02, percent-clipped=0.0 2023-11-29 10:35:57,199 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589200 2023-11-29 10:35:57,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3927940.0, ans=0.125 2023-11-29 10:36:15,193 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.04 vs. limit=22.5 2023-11-29 10:36:25,847 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 50, loss[loss=0.07341, simple_loss=0.08318, pruned_loss=0.01478, audio_tagging_loss=0.01705, over 15317.00 frames. ], tot_loss[loss=0.07348, simple_loss=0.08988, pruned_loss=0.01198, audio_tagging_loss=0.01657, over 690345.01 frames. ], batch size: 59, lr: 1.36e-03, grad_scale: 16.0 2023-11-29 10:36:33,041 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2023-11-29 10:36:40,690 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2023-11-29 10:36:42,786 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0 2023-11-29 10:36:50,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3928273.3333333335, ans=0.0 2023-11-29 10:36:59,993 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589250 2023-11-29 10:37:09,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.06 vs. 
limit=15.0 2023-11-29 10:37:17,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3928406.6666666665, ans=0.1 2023-11-29 10:37:18,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3928406.6666666665, ans=0.1 2023-11-29 10:37:29,730 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 100, loss[loss=0.08399, simple_loss=0.1097, pruned_loss=0.0155, audio_tagging_loss=0.01366, over 16054.00 frames. ], tot_loss[loss=0.07146, simple_loss=0.0879, pruned_loss=0.01163, audio_tagging_loss=0.01588, over 1204273.13 frames. ], batch size: 59, lr: 1.36e-03, grad_scale: 16.0 2023-11-29 10:37:41,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3928540.0, ans=0.125 2023-11-29 10:38:00,143 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.044e+01 1.010e+02 1.060e+02 1.133e+02 1.839e+02, threshold=2.120e+02, percent-clipped=0.0 2023-11-29 10:38:02,564 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589300 2023-11-29 10:38:02,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3928606.6666666665, ans=0.2 2023-11-29 10:38:03,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3928606.6666666665, ans=0.125 2023-11-29 10:38:10,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2023-11-29 10:38:13,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3928673.3333333335, ans=0.5 2023-11-29 10:38:24,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3928740.0, ans=0.125 2023-11-29 10:38:31,794 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 150, loss[loss=0.06979, simple_loss=0.1024, pruned_loss=0.01074, audio_tagging_loss=0.007865, over 15955.00 frames. ], tot_loss[loss=0.0701, simple_loss=0.08813, pruned_loss=0.01159, audio_tagging_loss=0.01444, over 1618256.97 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:38:39,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3928806.6666666665, ans=0.125 2023-11-29 10:39:05,946 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589350 2023-11-29 10:39:13,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3929006.6666666665, ans=0.2 2023-11-29 10:39:31,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3929073.3333333335, ans=0.2 2023-11-29 10:39:34,094 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 200, loss[loss=0.06576, simple_loss=0.08922, pruned_loss=0.01279, audio_tagging_loss=0.008357, over 14659.00 frames. ], tot_loss[loss=0.06847, simple_loss=0.08832, pruned_loss=0.01156, audio_tagging_loss=0.01275, over 1931019.46 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:39:38,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3929140.0, ans=0.0 2023-11-29 10:39:48,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3929206.6666666665, ans=0.0 2023-11-29 10:39:48,591 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2023-11-29 10:40:05,151 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 9.198e+01 9.931e+01 1.061e+02 1.225e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-29 10:40:07,729 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589400 2023-11-29 10:40:13,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3929340.0, ans=0.2 2023-11-29 10:40:15,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3929340.0, ans=0.125 2023-11-29 10:40:32,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3929406.6666666665, ans=0.125 2023-11-29 10:40:35,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3929473.3333333335, ans=0.1 2023-11-29 10:40:36,881 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 250, loss[loss=0.05149, simple_loss=0.07347, pruned_loss=0.007066, audio_tagging_loss=0.007689, over 16349.00 frames. ], tot_loss[loss=0.06742, simple_loss=0.08858, pruned_loss=0.01168, audio_tagging_loss=0.01145, over 2181482.55 frames. ], batch size: 63, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:40:45,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3929473.3333333335, ans=0.125 2023-11-29 10:40:52,996 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-29 10:40:58,134 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.37 vs. limit=12.0 2023-11-29 10:41:05,085 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2023-11-29 10:41:07,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3929606.6666666665, ans=0.125 2023-11-29 10:41:11,050 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589450 2023-11-29 10:41:24,300 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.83 vs. limit=15.0 2023-11-29 10:41:32,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3929740.0, ans=0.125 2023-11-29 10:41:39,909 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.84 vs. 
limit=6.0 2023-11-29 10:41:40,649 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 300, loss[loss=0.04821, simple_loss=0.06847, pruned_loss=0.006471, audio_tagging_loss=0.007506, over 14468.00 frames. ], tot_loss[loss=0.06703, simple_loss=0.08929, pruned_loss=0.01182, audio_tagging_loss=0.01056, over 2373491.17 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:41:43,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3929806.6666666665, ans=0.0 2023-11-29 10:41:43,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3929806.6666666665, ans=10.0 2023-11-29 10:42:08,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3929940.0, ans=0.125 2023-11-29 10:42:11,427 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.312e+01 9.170e+01 9.850e+01 1.054e+02 1.427e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 10:42:14,536 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589500 2023-11-29 10:42:18,520 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=15.0 2023-11-29 10:42:21,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3930006.6666666665, ans=0.07 2023-11-29 10:42:22,440 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:42:28,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-11-29 10:42:34,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3930073.3333333335, ans=0.125 2023-11-29 10:42:42,525 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 350, loss[loss=0.06839, simple_loss=0.08844, pruned_loss=0.01571, audio_tagging_loss=0.008458, over 14675.00 frames. ], tot_loss[loss=0.06663, simple_loss=0.08963, pruned_loss=0.01188, audio_tagging_loss=0.00994, over 2524130.31 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:42:57,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3930206.6666666665, ans=0.125 2023-11-29 10:43:16,876 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589550 2023-11-29 10:43:27,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3930340.0, ans=0.0 2023-11-29 10:43:34,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2023-11-29 10:43:41,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3930406.6666666665, ans=0.125 2023-11-29 10:43:44,369 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 400, loss[loss=0.04815, simple_loss=0.05884, pruned_loss=0.007168, audio_tagging_loss=0.01156, over 15538.00 frames. ], tot_loss[loss=0.06587, simple_loss=0.08888, pruned_loss=0.01179, audio_tagging_loss=0.009638, over 2639847.77 frames. 
], batch size: 61, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:44:08,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3930606.6666666665, ans=0.0 2023-11-29 10:44:16,510 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.818e+01 9.147e+01 9.646e+01 1.038e+02 1.524e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-29 10:44:18,414 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589600 2023-11-29 10:44:23,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3930673.3333333335, ans=0.125 2023-11-29 10:44:47,884 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 450, loss[loss=0.05108, simple_loss=0.0735, pruned_loss=0.006133, audio_tagging_loss=0.008192, over 15145.00 frames. ], tot_loss[loss=0.06565, simple_loss=0.08922, pruned_loss=0.01169, audio_tagging_loss=0.009349, over 2735764.30 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:44:51,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3930806.6666666665, ans=0.0 2023-11-29 10:44:54,287 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2023-11-29 10:44:58,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3930873.3333333335, ans=0.2 2023-11-29 10:45:20,581 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589650 2023-11-29 10:45:41,237 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.54 vs. limit=10.0 2023-11-29 10:45:48,710 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 500, loss[loss=0.08949, simple_loss=0.1224, pruned_loss=0.02084, audio_tagging_loss=0.007435, over 14650.00 frames. ], tot_loss[loss=0.06596, simple_loss=0.08992, pruned_loss=0.01191, audio_tagging_loss=0.009088, over 2801059.87 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:46:03,807 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.53 vs. limit=22.5 2023-11-29 10:46:12,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3931273.3333333335, ans=0.0 2023-11-29 10:46:15,639 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=15.0 2023-11-29 10:46:21,775 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 9.167e+01 9.711e+01 1.057e+02 1.221e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-29 10:46:23,046 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589700 2023-11-29 10:46:38,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.49 vs. 
limit=10.0 2023-11-29 10:46:41,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3931406.6666666665, ans=0.0 2023-11-29 10:46:49,356 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:46:50,324 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 550, loss[loss=0.05408, simple_loss=0.0719, pruned_loss=0.009309, audio_tagging_loss=0.008825, over 15250.00 frames. ], tot_loss[loss=0.06509, simple_loss=0.08886, pruned_loss=0.01165, audio_tagging_loss=0.009002, over 2857612.40 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:46:52,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3931473.3333333335, ans=0.125 2023-11-29 10:47:13,931 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.54 vs. limit=10.0 2023-11-29 10:47:15,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.18 vs. limit=15.0 2023-11-29 10:47:23,867 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589750 2023-11-29 10:47:52,411 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 600, loss[loss=0.04358, simple_loss=0.0659, pruned_loss=0.00396, audio_tagging_loss=0.006671, over 15400.00 frames. ], tot_loss[loss=0.06433, simple_loss=0.08765, pruned_loss=0.01147, audio_tagging_loss=0.009029, over 2895443.04 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:47:52,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3931806.6666666665, ans=0.125 2023-11-29 10:48:05,216 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 10:48:19,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3931940.0, ans=0.0 2023-11-29 10:48:24,196 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.204e+01 8.984e+01 9.771e+01 1.059e+02 2.081e+02, threshold=1.954e+02, percent-clipped=1.0 2023-11-29 10:48:25,491 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589800 2023-11-29 10:48:48,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3932073.3333333335, ans=0.125 2023-11-29 10:48:49,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3932073.3333333335, ans=0.125 2023-11-29 10:48:50,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3932073.3333333335, ans=0.125 2023-11-29 10:48:54,146 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 650, loss[loss=0.05536, simple_loss=0.07749, pruned_loss=0.008513, audio_tagging_loss=0.008096, over 15959.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.0883, pruned_loss=0.01165, audio_tagging_loss=0.008988, over 2931411.80 frames. 
], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:48:54,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3932140.0, ans=0.1 2023-11-29 10:49:20,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3932273.3333333335, ans=0.125 2023-11-29 10:49:27,306 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589850 2023-11-29 10:49:27,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3932273.3333333335, ans=0.125 2023-11-29 10:49:47,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3932406.6666666665, ans=0.125 2023-11-29 10:49:55,247 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 700, loss[loss=0.0825, simple_loss=0.1202, pruned_loss=0.01416, audio_tagging_loss=0.008235, over 15405.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08873, pruned_loss=0.01167, audio_tagging_loss=0.008903, over 2955133.15 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:50:00,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3932473.3333333335, ans=0.125 2023-11-29 10:50:25,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3932606.6666666665, ans=0.0 2023-11-29 10:50:27,458 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.376e+01 9.061e+01 9.779e+01 1.049e+02 1.414e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-29 10:50:28,816 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589900 2023-11-29 10:50:32,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3932673.3333333335, ans=0.125 2023-11-29 10:50:41,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3932673.3333333335, ans=0.125 2023-11-29 10:50:50,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3932740.0, ans=0.2 2023-11-29 10:50:55,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3932740.0, ans=0.125 2023-11-29 10:50:57,733 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 750, loss[loss=0.05504, simple_loss=0.08484, pruned_loss=0.004267, audio_tagging_loss=0.008351, over 15991.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08904, pruned_loss=0.01188, audio_tagging_loss=0.008792, over 2980453.12 frames. 
], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:51:12,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3932873.3333333335, ans=0.0 2023-11-29 10:51:28,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3932940.0, ans=0.125 2023-11-29 10:51:31,095 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 589950 2023-11-29 10:51:46,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3933073.3333333335, ans=0.125 2023-11-29 10:51:48,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3933073.3333333335, ans=0.1 2023-11-29 10:51:59,210 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 800, loss[loss=0.06446, simple_loss=0.08707, pruned_loss=0.01165, audio_tagging_loss=0.009273, over 14288.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08945, pruned_loss=0.01193, audio_tagging_loss=0.008891, over 2992925.65 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:51:59,800 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.65 vs. limit=22.5 2023-11-29 10:52:04,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3933140.0, ans=0.0 2023-11-29 10:52:07,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3933140.0, ans=0.125 2023-11-29 10:52:17,602 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.59 vs. limit=15.0 2023-11-29 10:52:32,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.757e+01 9.309e+01 9.917e+01 1.087e+02 1.372e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-29 10:52:33,554 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590000 2023-11-29 10:52:38,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3933340.0, ans=0.125 2023-11-29 10:52:49,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3933406.6666666665, ans=0.07 2023-11-29 10:53:01,569 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 850, loss[loss=0.04908, simple_loss=0.06542, pruned_loss=0.005371, audio_tagging_loss=0.011, over 14935.00 frames. ], tot_loss[loss=0.06476, simple_loss=0.0882, pruned_loss=0.01168, audio_tagging_loss=0.008983, over 3000602.64 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:53:18,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3933540.0, ans=0.0 2023-11-29 10:53:33,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3933606.6666666665, ans=0.1 2023-11-29 10:53:33,867 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.97 vs. 
limit=15.0 2023-11-29 10:53:35,604 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590050 2023-11-29 10:53:59,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3933740.0, ans=0.05 2023-11-29 10:54:03,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3933740.0, ans=0.0 2023-11-29 10:54:05,628 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 900, loss[loss=0.08774, simple_loss=0.1229, pruned_loss=0.01733, audio_tagging_loss=0.008948, over 14852.00 frames. ], tot_loss[loss=0.0655, simple_loss=0.08932, pruned_loss=0.01187, audio_tagging_loss=0.008978, over 3012704.20 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:54:19,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3933873.3333333335, ans=0.125 2023-11-29 10:54:23,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3933873.3333333335, ans=0.125 2023-11-29 10:54:30,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3933940.0, ans=0.0 2023-11-29 10:54:31,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3933940.0, ans=0.0 2023-11-29 10:54:37,699 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.767e+01 9.157e+01 9.744e+01 1.021e+02 1.316e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-29 10:54:39,038 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590100 2023-11-29 10:54:52,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3934006.6666666665, ans=0.125 2023-11-29 10:54:59,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3934073.3333333335, ans=0.125 2023-11-29 10:55:06,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3934140.0, ans=0.0 2023-11-29 10:55:07,084 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 950, loss[loss=0.0503, simple_loss=0.06, pruned_loss=0.01052, audio_tagging_loss=0.009787, over 13859.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08972, pruned_loss=0.01195, audio_tagging_loss=0.008828, over 3024003.30 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:55:30,381 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=12.0 2023-11-29 10:55:39,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3934273.3333333335, ans=0.125 2023-11-29 10:55:39,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3934273.3333333335, ans=0.125 2023-11-29 10:55:41,612 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590150 2023-11-29 10:55:43,319 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.50 vs. 
limit=15.0 2023-11-29 10:55:59,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3934406.6666666665, ans=0.5 2023-11-29 10:56:09,336 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1000, loss[loss=0.04739, simple_loss=0.06341, pruned_loss=0.006815, audio_tagging_loss=0.008875, over 14204.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08957, pruned_loss=0.01189, audio_tagging_loss=0.008643, over 3026353.31 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 10:56:14,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3934473.3333333335, ans=0.125 2023-11-29 10:56:33,577 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.84 vs. limit=10.0 2023-11-29 10:56:37,493 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:56:40,926 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.541e+01 9.201e+01 9.754e+01 1.071e+02 1.435e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 10:56:42,209 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590200 2023-11-29 10:56:47,998 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.15 vs. limit=10.0 2023-11-29 10:57:09,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3934740.0, ans=0.125 2023-11-29 10:57:12,238 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1050, loss[loss=0.08504, simple_loss=0.1164, pruned_loss=0.01982, audio_tagging_loss=0.007034, over 15025.00 frames. ], tot_loss[loss=0.06469, simple_loss=0.08891, pruned_loss=0.01174, audio_tagging_loss=0.008499, over 3026202.86 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:57:14,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3934806.6666666665, ans=0.125 2023-11-29 10:57:17,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3934806.6666666665, ans=0.125 2023-11-29 10:57:21,208 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.59 vs. limit=6.0 2023-11-29 10:57:45,615 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590250 2023-11-29 10:58:13,862 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1100, loss[loss=0.05772, simple_loss=0.08768, pruned_loss=0.007331, audio_tagging_loss=0.006549, over 14505.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08906, pruned_loss=0.01169, audio_tagging_loss=0.008341, over 3026535.59 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:58:14,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3935140.0, ans=0.0 2023-11-29 10:58:15,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3935140.0, ans=0.0 2023-11-29 10:58:19,944 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 10:58:36,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3935206.6666666665, ans=0.2 2023-11-29 10:58:36,954 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=12.0 2023-11-29 10:58:42,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.14 vs. limit=15.0 2023-11-29 10:58:48,062 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.914e+01 9.269e+01 9.621e+01 1.031e+02 1.312e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 10:58:48,197 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590300 2023-11-29 10:58:48,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3935273.3333333335, ans=0.125 2023-11-29 10:58:49,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3935273.3333333335, ans=0.125 2023-11-29 10:58:49,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3935273.3333333335, ans=0.09899494936611666 2023-11-29 10:59:06,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3935406.6666666665, ans=0.125 2023-11-29 10:59:16,212 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1150, loss[loss=0.05656, simple_loss=0.07842, pruned_loss=0.006519, audio_tagging_loss=0.01083, over 14705.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08934, pruned_loss=0.0117, audio_tagging_loss=0.008358, over 3026049.55 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 10:59:23,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3935473.3333333335, ans=15.0 2023-11-29 10:59:50,050 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590350 2023-11-29 10:59:51,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3935606.6666666665, ans=0.125 2023-11-29 11:00:18,798 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1200, loss[loss=0.0764, simple_loss=0.1117, pruned_loss=0.01397, audio_tagging_loss=0.006578, over 15287.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08919, pruned_loss=0.01167, audio_tagging_loss=0.008364, over 3027373.63 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:00:27,883 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:00:38,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3935873.3333333335, ans=0.1 2023-11-29 11:00:47,868 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:00:51,878 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590400 2023-11-29 11:00:52,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.086e+01 9.906e+01 1.090e+02 1.794e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 11:01:02,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3936006.6666666665, ans=0.2 2023-11-29 11:01:05,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3936006.6666666665, ans=0.0 2023-11-29 11:01:07,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3936006.6666666665, ans=0.07 2023-11-29 11:01:11,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3936073.3333333335, ans=0.2 2023-11-29 11:01:21,010 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1250, loss[loss=0.05724, simple_loss=0.07817, pruned_loss=0.009985, audio_tagging_loss=0.008175, over 15305.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08899, pruned_loss=0.01178, audio_tagging_loss=0.008379, over 3029562.78 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:01:41,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3936206.6666666665, ans=0.125 2023-11-29 11:01:52,407 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:01:55,015 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590450 2023-11-29 11:01:58,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3936340.0, ans=0.125 2023-11-29 11:02:15,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3936406.6666666665, ans=0.125 2023-11-29 11:02:16,797 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-11-29 11:02:17,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3936406.6666666665, ans=0.95 2023-11-29 11:02:22,020 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1300, loss[loss=0.06952, simple_loss=0.09292, pruned_loss=0.01604, audio_tagging_loss=0.007015, over 14856.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08869, pruned_loss=0.0118, audio_tagging_loss=0.008389, over 3027240.21 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:02:22,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3936473.3333333335, ans=0.0 2023-11-29 11:02:37,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3936540.0, ans=0.2 2023-11-29 11:02:54,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3936606.6666666665, ans=0.0 2023-11-29 11:02:55,319 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590500 2023-11-29 11:02:56,414 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.747e+01 8.938e+01 9.408e+01 1.020e+02 1.519e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-29 11:03:11,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3936740.0, ans=0.07 2023-11-29 11:03:14,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3936740.0, ans=0.125 2023-11-29 11:03:23,214 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1350, loss[loss=0.05276, simple_loss=0.06554, pruned_loss=0.008406, audio_tagging_loss=0.01158, over 15772.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08889, pruned_loss=0.01182, audio_tagging_loss=0.008441, over 3028558.74 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:03:37,018 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.92 vs. limit=10.0 2023-11-29 11:03:56,245 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590550 2023-11-29 11:04:06,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3937006.6666666665, ans=0.0 2023-11-29 11:04:11,546 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:04:25,684 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1400, loss[loss=0.07378, simple_loss=0.1024, pruned_loss=0.0137, audio_tagging_loss=0.008874, over 14690.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08848, pruned_loss=0.01175, audio_tagging_loss=0.008496, over 3024815.26 frames. 
], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:04:28,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3937140.0, ans=0.125 2023-11-29 11:04:37,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3937206.6666666665, ans=0.1 2023-11-29 11:04:38,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3937206.6666666665, ans=0.125 2023-11-29 11:04:43,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3937206.6666666665, ans=0.2 2023-11-29 11:04:58,840 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590600 2023-11-29 11:04:58,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3937273.3333333335, ans=0.07 2023-11-29 11:04:59,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3937273.3333333335, ans=0.2 2023-11-29 11:04:59,585 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-29 11:04:59,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.992e+01 9.669e+01 1.038e+02 1.341e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 11:05:14,480 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=15.0 2023-11-29 11:05:22,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3937406.6666666665, ans=0.125 2023-11-29 11:05:26,993 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1450, loss[loss=0.05201, simple_loss=0.07526, pruned_loss=0.007749, audio_tagging_loss=0.006631, over 14888.00 frames. ], tot_loss[loss=0.06489, simple_loss=0.0893, pruned_loss=0.01176, audio_tagging_loss=0.008482, over 3032698.47 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:05:27,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3937473.3333333335, ans=0.125 2023-11-29 11:05:28,873 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.01 vs. limit=15.0 2023-11-29 11:05:41,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3937540.0, ans=0.125 2023-11-29 11:05:50,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3937540.0, ans=0.125 2023-11-29 11:05:56,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3937606.6666666665, ans=0.1 2023-11-29 11:06:01,117 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590650 2023-11-29 11:06:02,684 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.95 vs. 
limit=15.0 2023-11-29 11:06:28,791 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1500, loss[loss=0.06686, simple_loss=0.09421, pruned_loss=0.01186, audio_tagging_loss=0.007894, over 16164.00 frames. ], tot_loss[loss=0.06439, simple_loss=0.08862, pruned_loss=0.01157, audio_tagging_loss=0.008512, over 3038462.12 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:06:34,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3937806.6666666665, ans=0.1 2023-11-29 11:06:34,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3937806.6666666665, ans=0.125 2023-11-29 11:06:37,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3937806.6666666665, ans=0.0 2023-11-29 11:06:40,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3937806.6666666665, ans=0.125 2023-11-29 11:07:02,023 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590700 2023-11-29 11:07:03,054 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 9.150e+01 9.903e+01 1.059e+02 1.485e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 11:07:08,020 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.18 vs. limit=22.5 2023-11-29 11:07:21,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3938073.3333333335, ans=0.0 2023-11-29 11:07:31,375 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1550, loss[loss=0.06581, simple_loss=0.09109, pruned_loss=0.01277, audio_tagging_loss=0.007493, over 13946.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08908, pruned_loss=0.01175, audio_tagging_loss=0.008627, over 3040144.72 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:07:34,392 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2023-11-29 11:07:39,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3938140.0, ans=0.0 2023-11-29 11:07:45,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3938206.6666666665, ans=0.2 2023-11-29 11:07:49,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3938206.6666666665, ans=0.125 2023-11-29 11:08:03,577 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590750 2023-11-29 11:08:08,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.51 vs. 
limit=22.5 2023-11-29 11:08:09,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3938340.0, ans=10.0 2023-11-29 11:08:12,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3938340.0, ans=0.2 2023-11-29 11:08:18,955 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2023-11-29 11:08:28,370 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.10 vs. limit=12.0 2023-11-29 11:08:32,435 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1600, loss[loss=0.0791, simple_loss=0.1066, pruned_loss=0.01845, audio_tagging_loss=0.007379, over 15909.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08963, pruned_loss=0.01183, audio_tagging_loss=0.008633, over 3041044.10 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:08:37,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3938473.3333333335, ans=0.125 2023-11-29 11:08:41,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3938473.3333333335, ans=0.2 2023-11-29 11:08:54,318 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.24 vs. limit=15.0 2023-11-29 11:08:58,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3938606.6666666665, ans=0.125 2023-11-29 11:09:06,601 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590800 2023-11-29 11:09:07,672 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.890e+01 8.907e+01 9.577e+01 1.022e+02 1.784e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 11:09:24,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3938740.0, ans=0.125 2023-11-29 11:09:27,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3938740.0, ans=0.125 2023-11-29 11:09:34,415 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1650, loss[loss=0.08253, simple_loss=0.1074, pruned_loss=0.0212, audio_tagging_loss=0.007647, over 15586.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08895, pruned_loss=0.01167, audio_tagging_loss=0.008634, over 3038342.04 frames. 
], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:09:44,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3938806.6666666665, ans=0.1 2023-11-29 11:10:08,129 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590850 2023-11-29 11:10:19,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3939006.6666666665, ans=10.0 2023-11-29 11:10:21,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3939006.6666666665, ans=0.2 2023-11-29 11:10:22,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3939073.3333333335, ans=0.0 2023-11-29 11:10:23,180 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.87 vs. limit=10.0 2023-11-29 11:10:24,436 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.78 vs. limit=10.0 2023-11-29 11:10:29,945 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:10:36,635 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1700, loss[loss=0.07692, simple_loss=0.1015, pruned_loss=0.01966, audio_tagging_loss=0.006517, over 15233.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08831, pruned_loss=0.01166, audio_tagging_loss=0.008643, over 3046112.41 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:10:39,635 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.86 vs. limit=6.0 2023-11-29 11:11:05,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0 2023-11-29 11:11:09,680 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590900 2023-11-29 11:11:10,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=12.0 2023-11-29 11:11:10,733 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.872e+01 9.146e+01 9.599e+01 1.028e+02 1.355e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 11:11:17,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3939340.0, ans=0.2 2023-11-29 11:11:19,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3939340.0, ans=0.07 2023-11-29 11:11:21,678 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:11:38,441 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1750, loss[loss=0.07923, simple_loss=0.1147, pruned_loss=0.01583, audio_tagging_loss=0.006045, over 15662.00 frames. ], tot_loss[loss=0.06411, simple_loss=0.08799, pruned_loss=0.01147, audio_tagging_loss=0.008637, over 3038893.84 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:11:43,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3939473.3333333335, ans=0.125 2023-11-29 11:11:52,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3939540.0, ans=0.0 2023-11-29 11:11:58,094 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:12:12,003 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 590950 2023-11-29 11:12:38,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3939740.0, ans=0.125 2023-11-29 11:12:40,145 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1800, loss[loss=0.0562, simple_loss=0.07664, pruned_loss=0.01026, audio_tagging_loss=0.007624, over 14405.00 frames. ], tot_loss[loss=0.06436, simple_loss=0.08815, pruned_loss=0.01169, audio_tagging_loss=0.008585, over 3040563.19 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:12:42,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2023-11-29 11:13:12,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3939940.0, ans=0.1 2023-11-29 11:13:13,663 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591000 2023-11-29 11:13:14,665 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.761e+01 9.256e+01 9.797e+01 1.069e+02 1.253e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-29 11:13:26,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3940006.6666666665, ans=0.0 2023-11-29 11:13:27,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3940006.6666666665, ans=0.09899494936611666 2023-11-29 11:13:42,525 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1850, loss[loss=0.05489, simple_loss=0.08477, pruned_loss=0.007386, audio_tagging_loss=0.005117, over 15162.00 frames. ], tot_loss[loss=0.06388, simple_loss=0.08765, pruned_loss=0.01159, audio_tagging_loss=0.008462, over 3040299.77 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:13:51,037 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.55 vs. 
limit=10.0 2023-11-29 11:13:55,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3940206.6666666665, ans=0.2 2023-11-29 11:13:57,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3940206.6666666665, ans=0.125 2023-11-29 11:13:57,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3940206.6666666665, ans=0.2 2023-11-29 11:14:03,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3940206.6666666665, ans=0.0 2023-11-29 11:14:05,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3940273.3333333335, ans=0.125 2023-11-29 11:14:08,371 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-11-29 11:14:15,327 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591050 2023-11-29 11:14:43,501 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1900, loss[loss=0.0401, simple_loss=0.05235, pruned_loss=0.00551, audio_tagging_loss=0.008416, over 13957.00 frames. ], tot_loss[loss=0.06396, simple_loss=0.08789, pruned_loss=0.01163, audio_tagging_loss=0.008389, over 3043685.58 frames. ], batch size: 52, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:14:45,149 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0 2023-11-29 11:14:45,286 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2023-11-29 11:14:48,444 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2023-11-29 11:15:17,957 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591100 2023-11-29 11:15:19,019 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.739e+01 9.784e+01 1.081e+02 1.359e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-29 11:15:26,683 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.81 vs. limit=15.0 2023-11-29 11:15:28,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3940673.3333333335, ans=0.1 2023-11-29 11:15:33,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3940740.0, ans=0.125 2023-11-29 11:15:37,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3940740.0, ans=0.1 2023-11-29 11:15:46,035 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 1950, loss[loss=0.0635, simple_loss=0.08637, pruned_loss=0.01197, audio_tagging_loss=0.008341, over 15349.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08897, pruned_loss=0.01181, audio_tagging_loss=0.008328, over 3047928.39 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:15:58,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3940873.3333333335, ans=0.125 2023-11-29 11:16:02,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3940873.3333333335, ans=0.0 2023-11-29 11:16:14,372 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2023-11-29 11:16:18,601 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591150 2023-11-29 11:16:41,486 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=22.5 2023-11-29 11:16:42,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3941073.3333333335, ans=0.1 2023-11-29 11:16:48,017 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2000, loss[loss=0.08112, simple_loss=0.116, pruned_loss=0.01579, audio_tagging_loss=0.007315, over 14994.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.0894, pruned_loss=0.01198, audio_tagging_loss=0.008391, over 3043847.95 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:16:50,921 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=12.0 2023-11-29 11:17:12,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3941273.3333333335, ans=0.125 2023-11-29 11:17:20,875 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591200 2023-11-29 11:17:21,862 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 9.211e+01 9.826e+01 1.048e+02 3.263e+02, threshold=1.965e+02, percent-clipped=1.0 2023-11-29 11:17:24,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3941340.0, ans=0.0 2023-11-29 11:17:27,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3941340.0, ans=0.125 2023-11-29 11:17:33,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3941340.0, ans=10.0 2023-11-29 11:17:35,377 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.09 vs. limit=10.0 2023-11-29 11:17:40,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3941406.6666666665, ans=0.1 2023-11-29 11:17:46,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3941406.6666666665, ans=0.125 2023-11-29 11:17:49,468 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2050, loss[loss=0.06537, simple_loss=0.0915, pruned_loss=0.008466, audio_tagging_loss=0.01115, over 15427.00 frames. ], tot_loss[loss=0.0652, simple_loss=0.08978, pruned_loss=0.01195, audio_tagging_loss=0.008366, over 3047302.79 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:17:54,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3941473.3333333335, ans=0.1 2023-11-29 11:17:58,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3941473.3333333335, ans=0.125 2023-11-29 11:18:06,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3941540.0, ans=0.125 2023-11-29 11:18:24,884 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591250 2023-11-29 11:18:33,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3941673.3333333335, ans=0.0 2023-11-29 11:18:33,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3941673.3333333335, ans=0.0 2023-11-29 11:18:44,612 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.12 vs. limit=12.0 2023-11-29 11:18:53,357 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2100, loss[loss=0.07071, simple_loss=0.09418, pruned_loss=0.01441, audio_tagging_loss=0.009203, over 14456.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08982, pruned_loss=0.01193, audio_tagging_loss=0.008292, over 3056306.44 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:19:00,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3941806.6666666665, ans=0.1 2023-11-29 11:19:26,559 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591300 2023-11-29 11:19:27,577 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.479e+01 9.036e+01 9.652e+01 1.066e+02 1.265e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 11:19:29,370 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.74 vs. limit=15.0 2023-11-29 11:19:39,236 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.42 vs. limit=15.0 2023-11-29 11:19:46,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3942073.3333333335, ans=0.125 2023-11-29 11:19:51,055 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.14 vs. limit=12.0 2023-11-29 11:19:55,537 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2150, loss[loss=0.05745, simple_loss=0.07584, pruned_loss=0.009863, audio_tagging_loss=0.009664, over 16415.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08976, pruned_loss=0.01201, audio_tagging_loss=0.008328, over 3059330.78 frames. 
], batch size: 64, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:20:06,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3942206.6666666665, ans=0.125 2023-11-29 11:20:08,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3942206.6666666665, ans=0.125 2023-11-29 11:20:20,318 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.98 vs. limit=22.5 2023-11-29 11:20:28,646 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591350 2023-11-29 11:20:34,426 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:20:47,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3942406.6666666665, ans=0.125 2023-11-29 11:20:56,548 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2200, loss[loss=0.0701, simple_loss=0.09616, pruned_loss=0.01431, audio_tagging_loss=0.007713, over 15723.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08995, pruned_loss=0.0121, audio_tagging_loss=0.008377, over 3055252.47 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:20:58,337 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.94 vs. limit=22.5 2023-11-29 11:21:02,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3942473.3333333335, ans=0.0 2023-11-29 11:21:09,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3942540.0, ans=0.05 2023-11-29 11:21:21,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3942606.6666666665, ans=0.125 2023-11-29 11:21:24,083 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=22.5 2023-11-29 11:21:30,750 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591400 2023-11-29 11:21:33,262 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.050e+01 9.112e+01 9.556e+01 1.057e+02 1.343e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-29 11:21:58,253 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2250, loss[loss=0.05294, simple_loss=0.06011, pruned_loss=0.008405, audio_tagging_loss=0.01448, over 14694.00 frames. ], tot_loss[loss=0.06568, simple_loss=0.0904, pruned_loss=0.0121, audio_tagging_loss=0.008384, over 3050332.96 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:22:04,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.58 vs. 
limit=10.0 2023-11-29 11:22:32,837 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591450 2023-11-29 11:22:35,306 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:22:35,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3943006.6666666665, ans=0.09899494936611666 2023-11-29 11:22:46,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3943006.6666666665, ans=0.0 2023-11-29 11:23:01,187 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2300, loss[loss=0.05731, simple_loss=0.07441, pruned_loss=0.01186, audio_tagging_loss=0.008238, over 15084.00 frames. ], tot_loss[loss=0.06561, simple_loss=0.08998, pruned_loss=0.01211, audio_tagging_loss=0.008506, over 3052273.67 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:23:07,449 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2023-11-29 11:23:08,952 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.05 vs. limit=22.5 2023-11-29 11:23:13,516 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2023-11-29 11:23:23,556 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2023-11-29 11:23:30,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3943273.3333333335, ans=0.125 2023-11-29 11:23:33,649 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591500 2023-11-29 11:23:36,386 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 9.045e+01 9.649e+01 1.036e+02 1.193e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 11:23:52,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3943406.6666666665, ans=0.1 2023-11-29 11:23:58,487 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:24:03,225 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2350, loss[loss=0.06012, simple_loss=0.08175, pruned_loss=0.01085, audio_tagging_loss=0.008397, over 14242.00 frames. ], tot_loss[loss=0.06577, simple_loss=0.09046, pruned_loss=0.01208, audio_tagging_loss=0.008464, over 3054848.11 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:24:06,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3943473.3333333335, ans=0.2 2023-11-29 11:24:13,362 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.87 vs. 
limit=10.0 2023-11-29 11:24:26,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3943606.6666666665, ans=0.5 2023-11-29 11:24:27,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3943606.6666666665, ans=0.125 2023-11-29 11:24:32,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0 2023-11-29 11:24:36,567 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591550 2023-11-29 11:24:48,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3943673.3333333335, ans=0.125 2023-11-29 11:24:58,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3943740.0, ans=0.1 2023-11-29 11:25:01,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3943740.0, ans=0.125 2023-11-29 11:25:04,304 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2400, loss[loss=0.08526, simple_loss=0.1205, pruned_loss=0.0152, audio_tagging_loss=0.0098, over 15163.00 frames. ], tot_loss[loss=0.06566, simple_loss=0.09037, pruned_loss=0.01195, audio_tagging_loss=0.008527, over 3054244.71 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:25:08,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3943806.6666666665, ans=0.1 2023-11-29 11:25:09,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3943806.6666666665, ans=0.1 2023-11-29 11:25:17,547 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.12 vs. limit=10.0 2023-11-29 11:25:38,194 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591600 2023-11-29 11:25:40,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.183e+01 9.372e+01 9.981e+01 1.068e+02 1.267e+02, threshold=1.996e+02, percent-clipped=0.0 2023-11-29 11:25:41,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3944006.6666666665, ans=0.2 2023-11-29 11:25:58,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3944073.3333333335, ans=0.04949747468305833 2023-11-29 11:26:04,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3944073.3333333335, ans=0.04949747468305833 2023-11-29 11:26:05,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3944140.0, ans=0.1 2023-11-29 11:26:06,103 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2450, loss[loss=0.08343, simple_loss=0.1139, pruned_loss=0.01917, audio_tagging_loss=0.007327, over 15791.00 frames. ], tot_loss[loss=0.06594, simple_loss=0.09072, pruned_loss=0.01202, audio_tagging_loss=0.008558, over 3056990.37 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:26:20,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3944206.6666666665, ans=0.1 2023-11-29 11:26:22,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.13 vs. limit=15.0 2023-11-29 11:26:33,845 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.11 vs. limit=15.0 2023-11-29 11:26:39,237 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591650 2023-11-29 11:26:42,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3944340.0, ans=0.125 2023-11-29 11:26:51,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3944340.0, ans=0.0 2023-11-29 11:26:54,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3944406.6666666665, ans=0.0 2023-11-29 11:27:08,534 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2500, loss[loss=0.06837, simple_loss=0.09228, pruned_loss=0.01148, audio_tagging_loss=0.01075, over 15326.00 frames. ], tot_loss[loss=0.06603, simple_loss=0.09068, pruned_loss=0.01203, audio_tagging_loss=0.008664, over 3053626.86 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:27:19,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=15.0 2023-11-29 11:27:27,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3944540.0, ans=0.0 2023-11-29 11:27:40,761 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591700 2023-11-29 11:27:42,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3944606.6666666665, ans=0.125 2023-11-29 11:27:44,875 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 9.155e+01 9.688e+01 1.051e+02 1.449e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 11:27:48,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3944673.3333333335, ans=0.2 2023-11-29 11:28:01,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3944740.0, ans=0.125 2023-11-29 11:28:08,701 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2550, loss[loss=0.06008, simple_loss=0.08545, pruned_loss=0.01055, audio_tagging_loss=0.006802, over 16300.00 frames. ], tot_loss[loss=0.06545, simple_loss=0.08985, pruned_loss=0.01187, audio_tagging_loss=0.008664, over 3049706.62 frames. 
], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:28:11,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3944806.6666666665, ans=0.0 2023-11-29 11:28:12,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3944806.6666666665, ans=0.125 2023-11-29 11:28:17,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3944806.6666666665, ans=0.125 2023-11-29 11:28:25,855 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:28:42,510 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591750 2023-11-29 11:28:42,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3944940.0, ans=0.125 2023-11-29 11:29:08,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3945073.3333333335, ans=0.07 2023-11-29 11:29:10,114 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2600, loss[loss=0.04819, simple_loss=0.0674, pruned_loss=0.006979, audio_tagging_loss=0.007516, over 15528.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.08927, pruned_loss=0.01183, audio_tagging_loss=0.008581, over 3046103.95 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:29:21,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3945140.0, ans=0.025 2023-11-29 11:29:22,501 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:29:31,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3945206.6666666665, ans=10.0 2023-11-29 11:29:43,988 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591800 2023-11-29 11:29:44,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3945273.3333333335, ans=0.125 2023-11-29 11:29:46,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3945340.0, ans=0.125 2023-11-29 11:29:47,796 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 8.895e+01 9.478e+01 1.021e+02 1.360e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-29 11:30:13,343 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2650, loss[loss=0.05924, simple_loss=0.07377, pruned_loss=0.01323, audio_tagging_loss=0.009121, over 15752.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08865, pruned_loss=0.01174, audio_tagging_loss=0.008555, over 3047342.93 frames. 
], batch size: 60, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:30:13,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3945473.3333333335, ans=0.09899494936611666 2023-11-29 11:30:32,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3945540.0, ans=0.0 2023-11-29 11:30:35,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3945606.6666666665, ans=0.125 2023-11-29 11:30:45,884 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591850 2023-11-29 11:31:01,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.52 vs. limit=15.0 2023-11-29 11:31:14,818 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2700, loss[loss=0.06041, simple_loss=0.09707, pruned_loss=0.007206, audio_tagging_loss=0.004675, over 15495.00 frames. ], tot_loss[loss=0.06402, simple_loss=0.08776, pruned_loss=0.0116, audio_tagging_loss=0.008541, over 3046019.19 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:31:16,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3945806.6666666665, ans=0.0 2023-11-29 11:31:27,929 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2023-11-29 11:31:31,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3945873.3333333335, ans=0.2 2023-11-29 11:31:31,368 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2023-11-29 11:31:49,041 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591900 2023-11-29 11:31:50,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3945940.0, ans=0.2 2023-11-29 11:31:53,618 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 9.168e+01 9.953e+01 1.095e+02 1.462e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-29 11:32:16,460 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2750, loss[loss=0.0737, simple_loss=0.1017, pruned_loss=0.01544, audio_tagging_loss=0.00743, over 15068.00 frames. ], tot_loss[loss=0.06353, simple_loss=0.08723, pruned_loss=0.01139, audio_tagging_loss=0.00853, over 3041090.23 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 8.0 2023-11-29 11:32:37,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3946206.6666666665, ans=0.0 2023-11-29 11:32:40,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3946273.3333333335, ans=0.2 2023-11-29 11:32:48,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3946273.3333333335, ans=0.0 2023-11-29 11:32:49,785 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 591950 2023-11-29 11:32:57,310 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.78 vs. 
limit=15.0 2023-11-29 11:33:09,995 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:33:13,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3946406.6666666665, ans=0.1 2023-11-29 11:33:18,152 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2800, loss[loss=0.0573, simple_loss=0.08176, pruned_loss=0.0101, audio_tagging_loss=0.00632, over 14856.00 frames. ], tot_loss[loss=0.06345, simple_loss=0.08706, pruned_loss=0.01141, audio_tagging_loss=0.00851, over 3049436.79 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:33:37,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3946540.0, ans=0.5 2023-11-29 11:33:40,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3946540.0, ans=0.07 2023-11-29 11:33:41,776 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.71 vs. limit=15.0 2023-11-29 11:33:51,405 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592000 2023-11-29 11:33:58,929 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.825e+01 9.130e+01 9.870e+01 1.066e+02 1.963e+02, threshold=1.974e+02, percent-clipped=0.0 2023-11-29 11:33:59,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.whiten.whitening_limit, batch_count=3946673.3333333335, ans=12.0 2023-11-29 11:34:23,002 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2850, loss[loss=0.06761, simple_loss=0.09216, pruned_loss=0.01309, audio_tagging_loss=0.00845, over 15266.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08847, pruned_loss=0.01166, audio_tagging_loss=0.008399, over 3048232.32 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:34:29,505 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.58 vs. limit=12.0 2023-11-29 11:34:33,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3946873.3333333335, ans=0.2 2023-11-29 11:34:36,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3946873.3333333335, ans=0.05 2023-11-29 11:34:40,827 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2023-11-29 11:34:47,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3946940.0, ans=0.0 2023-11-29 11:34:56,190 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592050 2023-11-29 11:35:11,653 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.82 vs. 
limit=22.5 2023-11-29 11:35:24,261 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2900, loss[loss=0.08622, simple_loss=0.1173, pruned_loss=0.02129, audio_tagging_loss=0.006269, over 14947.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08912, pruned_loss=0.01182, audio_tagging_loss=0.008384, over 3050826.27 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:35:30,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3947140.0, ans=0.0 2023-11-29 11:35:30,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3947140.0, ans=0.1 2023-11-29 11:35:45,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3947206.6666666665, ans=0.0 2023-11-29 11:35:55,441 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.25 vs. limit=8.0 2023-11-29 11:35:58,241 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592100 2023-11-29 11:36:02,775 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 9.122e+01 9.763e+01 1.061e+02 1.440e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-29 11:36:26,624 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 2950, loss[loss=0.06891, simple_loss=0.08796, pruned_loss=0.0154, audio_tagging_loss=0.009536, over 15147.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08953, pruned_loss=0.01187, audio_tagging_loss=0.00838, over 3057439.52 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:36:53,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3947606.6666666665, ans=0.2 2023-11-29 11:36:54,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3947606.6666666665, ans=0.0 2023-11-29 11:36:59,530 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592150 2023-11-29 11:37:14,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3947740.0, ans=0.2 2023-11-29 11:37:14,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3947740.0, ans=0.125 2023-11-29 11:37:20,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3947740.0, ans=0.0 2023-11-29 11:37:27,895 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3000, loss[loss=0.06811, simple_loss=0.09863, pruned_loss=0.01106, audio_tagging_loss=0.007728, over 16101.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08931, pruned_loss=0.01175, audio_tagging_loss=0.008416, over 3057374.98 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:37:27,896 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 11:37:57,589 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8049, 5.8301, 5.9022, 5.8413], device='cuda:2') 2023-11-29 11:38:07,432 INFO [train_asr.py:1267] (2/4) Epoch 50, validation: loss=0.05782, simple_loss=0.05046, pruned_loss=0.005473, audio_tagging_loss=0.02712, over 4681554.00 frames. 
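A note on the loss fields in the entries above: with simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0 from this run's configuration, the logged loss is consistent with loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. The epoch-50 validation entry just above checks out exactly: 0.5 * 0.05046 + 0.005473 + 0.02712 = 0.05782, and the per-batch tot_loss entries follow the same identity (e.g. batch 1000: 0.5 * 0.08957 + 0.01189 + 0.008643 = 0.06532).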
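The optim.py clipping lines admit a similar sanity check: the five grad-norm numbers appear to be quantiles (min, 25%, median, 75%, max) of recent gradient norms, and in every entry in this section the logged threshold equals clipping_scale times the middle value, e.g. 2.0 * 9.754e+01 = 1.951e+02 and 2.0 * 9.556e+01 = 1.911e+02. So threshold = clipping_scale * median(grad-norm), and percent-clipped reports how often that threshold was exceeded.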
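The recurring "Exclude cut with ID unbalanced/..." WARNING lines all describe the same situation: a 1-second AudioSet clip whose placeholder transcript tokenizes to more BPE tokens (24) than the encoder produces frames after subsampling (23), which the pruned-transducer setup here rejects. Below is a minimal sketch of that filter, assuming the standard icefall convolutional front end that maps T input frames to ((T - 7) // 2 + 1) // 2 output frames; the helper names are illustrative, not the exact train_asr.py code.

def frames_after_subsampling(num_frames: int) -> int:
    # Front-end output length for the assumed Conv2dSubsampling stack:
    # 100 input frames -> 23 output frames, matching the warnings above.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Drop cuts with fewer post-subsampling frames than BPE tokens,
    # since the transducer loss used here needs at least one encoder
    # frame per token to form a valid alignment.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert keep_cut(100, 24) is False  # 23 frames < 24 tokens -> excluded

Applied to the warnings in this section, every excluded cut has 100 frames before subsampling and 24 tokens, so the check fires each time.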
2023-11-29 11:38:07,433 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 11:38:12,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3947806.6666666665, ans=0.125 2023-11-29 11:38:20,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=12.0 2023-11-29 11:38:40,741 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592200 2023-11-29 11:38:45,629 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.881e+01 9.188e+01 9.766e+01 1.056e+02 1.297e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-29 11:38:52,304 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=12.0 2023-11-29 11:39:00,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3948073.3333333335, ans=0.2 2023-11-29 11:39:09,582 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3050, loss[loss=0.0627, simple_loss=0.08284, pruned_loss=0.01187, audio_tagging_loss=0.009412, over 14060.00 frames. ], tot_loss[loss=0.06582, simple_loss=0.09093, pruned_loss=0.01207, audio_tagging_loss=0.008282, over 3053938.35 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:39:09,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3948140.0, ans=0.125 2023-11-29 11:39:14,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3948140.0, ans=0.125 2023-11-29 11:39:19,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3948140.0, ans=0.1 2023-11-29 11:39:31,089 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:39:32,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3948273.3333333335, ans=0.125 2023-11-29 11:39:42,119 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592250 2023-11-29 11:39:45,502 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:39:45,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3948340.0, ans=0.125 2023-11-29 11:39:56,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3948340.0, ans=0.125 2023-11-29 11:39:57,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3948406.6666666665, ans=0.1 2023-11-29 11:40:11,007 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3100, loss[loss=0.06902, simple_loss=0.09225, pruned_loss=0.01026, audio_tagging_loss=0.01264, over 14790.00 frames. 
], tot_loss[loss=0.06574, simple_loss=0.09081, pruned_loss=0.01195, audio_tagging_loss=0.008388, over 3051378.45 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:40:21,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3948540.0, ans=0.0 2023-11-29 11:40:27,593 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2023-11-29 11:40:35,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3948606.6666666665, ans=0.125 2023-11-29 11:40:43,938 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592300 2023-11-29 11:40:48,581 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.753e+01 9.181e+01 9.927e+01 1.074e+02 1.864e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-29 11:41:05,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3948740.0, ans=0.125 2023-11-29 11:41:12,049 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3150, loss[loss=0.07692, simple_loss=0.1088, pruned_loss=0.01445, audio_tagging_loss=0.008078, over 15869.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.09057, pruned_loss=0.01193, audio_tagging_loss=0.008422, over 3049533.59 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:41:15,970 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=22.5 2023-11-29 11:41:18,573 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=15.0 2023-11-29 11:41:20,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3948806.6666666665, ans=0.0 2023-11-29 11:41:30,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3948873.3333333335, ans=0.0 2023-11-29 11:41:45,103 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592350 2023-11-29 11:41:45,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3948940.0, ans=0.2 2023-11-29 11:41:54,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=12.0 2023-11-29 11:41:56,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3949006.6666666665, ans=0.1 2023-11-29 11:42:12,835 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3200, loss[loss=0.06833, simple_loss=0.09083, pruned_loss=0.01386, audio_tagging_loss=0.009059, over 15171.00 frames. ], tot_loss[loss=0.06592, simple_loss=0.09068, pruned_loss=0.01203, audio_tagging_loss=0.008552, over 3052178.10 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:42:21,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3949140.0, ans=0.125 2023-11-29 11:42:24,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3949206.6666666665, ans=0.0 2023-11-29 11:42:43,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3949273.3333333335, ans=0.0 2023-11-29 11:42:46,838 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592400 2023-11-29 11:42:48,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3949273.3333333335, ans=0.0 2023-11-29 11:42:50,170 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=22.5 2023-11-29 11:42:52,064 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 8.966e+01 9.651e+01 1.039e+02 1.549e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 11:43:12,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3949406.6666666665, ans=0.0 2023-11-29 11:43:14,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3949473.3333333335, ans=0.1 2023-11-29 11:43:15,833 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3250, loss[loss=0.0725, simple_loss=0.1055, pruned_loss=0.01292, audio_tagging_loss=0.006809, over 16738.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.08963, pruned_loss=0.01172, audio_tagging_loss=0.00865, over 3052845.94 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:43:34,184 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=15.0 2023-11-29 11:43:40,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3949606.6666666665, ans=0.125 2023-11-29 11:43:49,395 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592450 2023-11-29 11:44:08,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3949740.0, ans=0.2 2023-11-29 11:44:15,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=3949740.0, ans=0.2 2023-11-29 11:44:17,906 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3300, loss[loss=0.06777, simple_loss=0.09557, pruned_loss=0.01277, audio_tagging_loss=0.007213, over 14790.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08979, pruned_loss=0.01172, audio_tagging_loss=0.008651, over 3053366.60 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:44:27,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3949806.6666666665, ans=0.125 2023-11-29 11:44:45,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3949940.0, ans=0.1 2023-11-29 11:44:51,419 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592500 2023-11-29 11:44:56,112 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.509e+01 9.045e+01 9.733e+01 1.044e+02 1.292e+02, threshold=1.947e+02, percent-clipped=0.0 2023-11-29 11:44:56,364 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:44:56,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3950006.6666666665, ans=0.1 2023-11-29 11:44:56,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3950006.6666666665, ans=0.1 2023-11-29 11:44:57,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3950006.6666666665, ans=0.5 2023-11-29 11:45:03,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.04 vs. limit=22.5 2023-11-29 11:45:14,833 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.49 vs. limit=22.5 2023-11-29 11:45:17,188 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.79 vs. limit=22.5 2023-11-29 11:45:20,722 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3350, loss[loss=0.0532, simple_loss=0.06915, pruned_loss=0.008223, audio_tagging_loss=0.0104, over 16299.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08916, pruned_loss=0.01157, audio_tagging_loss=0.008666, over 3050893.55 frames. ], batch size: 62, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:45:53,651 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592550 2023-11-29 11:46:05,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3950340.0, ans=0.1 2023-11-29 11:46:18,847 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.03 vs. limit=15.0 2023-11-29 11:46:22,686 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3400, loss[loss=0.05694, simple_loss=0.07231, pruned_loss=0.01166, audio_tagging_loss=0.009132, over 15557.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08964, pruned_loss=0.01172, audio_tagging_loss=0.008523, over 3052395.04 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:46:49,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3950606.6666666665, ans=0.2 2023-11-29 11:46:56,805 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592600 2023-11-29 11:46:57,285 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.12 vs. 
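limit=15.0

The Whitening records compare a measured statistic against a limit: the metric summarizes how far the covariance of a module's activations is from a multiple of the identity (1.0 means perfectly white), and the module only pushes back on the activations when the metric exceeds the limit, so a reading like metric=4.12 vs. limit=15.0 is comfortably in bounds. A sketch of one way such a metric can be computed, assuming the eigenvalue-spread definition used in zipformer-style scaling code (function name hypothetical):

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Sketch: returns a value >= 1.0 that equals 1.0 exactly when the
    per-group covariance of x is proportional to the identity, and grows
    when a few directions dominate. x: (num_frames, num_channels)."""
    n, c = x.shape
    d = c // num_groups
    xg = x.reshape(n, num_groups, d).transpose(0, 1)       # (groups, n, d)
    cov = xg.transpose(1, 2) @ xg / n                      # (groups, d, d)
    # For eigenvalues l_i of cov: d * sum(l_i^2) / (sum l_i)^2 >= 1.
    sum_sq = (cov * cov).sum(dim=(1, 2))                   # tr(C^2) for symmetric C
    sq_sum = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2  # tr(C)^2
    return (d * sum_sq / sq_sum).mean().item()

x = torch.randn(10000, 512)
print(whitening_metric(x))                                       # ~1.05: white input
print(whitening_metric(x * torch.tensor([10.0] + [1.0] * 511)))  # ~14: one channel dominates
```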
2023-11-29 11:47:00,293 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.23 vs. limit=12.0 2023-11-29 11:47:01,268 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.09 vs. limit=22.5 2023-11-29 11:47:01,890 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.883e+01 9.007e+01 9.772e+01 1.033e+02 1.333e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 11:47:10,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3950673.3333333335, ans=0.0 2023-11-29 11:47:10,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3950673.3333333335, ans=0.09899494936611666 2023-11-29 11:47:24,647 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3450, loss[loss=0.07561, simple_loss=0.1066, pruned_loss=0.01516, audio_tagging_loss=0.007132, over 14691.00 frames. ], tot_loss[loss=0.06477, simple_loss=0.08939, pruned_loss=0.01163, audio_tagging_loss=0.008449, over 3051240.06 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:47:24,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3950806.6666666665, ans=0.0 2023-11-29 11:47:36,782 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2023-11-29 11:47:42,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3950873.3333333335, ans=15.0 2023-11-29 11:47:43,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3950873.3333333335, ans=0.2 2023-11-29 11:47:47,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3950873.3333333335, ans=0.125 2023-11-29 11:47:52,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0 2023-11-29 11:47:58,763 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592650 2023-11-29 11:48:05,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3951006.6666666665, ans=0.1 2023-11-29 11:48:16,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3951073.3333333335, ans=0.2 2023-11-29 11:48:27,045 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3500, loss[loss=0.07142, simple_loss=0.09383, pruned_loss=0.01766, audio_tagging_loss=0.00685, over 15197.00 frames. ], tot_loss[loss=0.06472, simple_loss=0.08926, pruned_loss=0.01165, audio_tagging_loss=0.008445, over 3048939.47 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:48:32,455 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.52 vs.
limit=12.0 2023-11-29 11:48:34,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3951140.0, ans=0.125 2023-11-29 11:48:47,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3951206.6666666665, ans=0.0 2023-11-29 11:48:58,843 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:49:00,095 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592700 2023-11-29 11:49:05,880 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 9.064e+01 9.893e+01 1.052e+02 1.385e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 11:49:22,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3951406.6666666665, ans=0.0 2023-11-29 11:49:29,221 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3550, loss[loss=0.05178, simple_loss=0.06913, pruned_loss=0.008004, audio_tagging_loss=0.00921, over 14908.00 frames. ], tot_loss[loss=0.06375, simple_loss=0.08763, pruned_loss=0.01143, audio_tagging_loss=0.008503, over 3046563.10 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:49:37,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3951473.3333333335, ans=0.125 2023-11-29 11:50:01,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3951606.6666666665, ans=0.0 2023-11-29 11:50:02,796 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592750 2023-11-29 11:50:30,379 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3600, loss[loss=0.06404, simple_loss=0.08034, pruned_loss=0.01353, audio_tagging_loss=0.01034, over 15900.00 frames. ], tot_loss[loss=0.06396, simple_loss=0.08765, pruned_loss=0.01158, audio_tagging_loss=0.008551, over 3047208.63 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:51:04,612 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592800 2023-11-29 11:51:09,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.101e+01 9.681e+01 1.023e+02 1.277e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 11:51:09,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3952006.6666666665, ans=0.0 2023-11-29 11:51:10,186 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.59 vs. limit=10.0 2023-11-29 11:51:33,128 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3650, loss[loss=0.07765, simple_loss=0.1187, pruned_loss=0.01413, audio_tagging_loss=0.004194, over 16093.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08842, pruned_loss=0.01174, audio_tagging_loss=0.008436, over 3049778.13 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:51:44,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3952140.0, ans=0.2 2023-11-29 11:51:46,854 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.74 vs. limit=22.5 2023-11-29 11:52:06,067 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592850 2023-11-29 11:52:09,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3952340.0, ans=0.125 2023-11-29 11:52:10,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3952340.0, ans=0.0 2023-11-29 11:52:13,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3952340.0, ans=0.125 2023-11-29 11:52:14,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3952340.0, ans=0.0 2023-11-29 11:52:20,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3952340.0, ans=0.0 2023-11-29 11:52:34,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3952473.3333333335, ans=0.125 2023-11-29 11:52:34,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3952473.3333333335, ans=0.1 2023-11-29 11:52:35,270 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3700, loss[loss=0.04892, simple_loss=0.06315, pruned_loss=0.008635, audio_tagging_loss=0.008711, over 15369.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08926, pruned_loss=0.01194, audio_tagging_loss=0.008369, over 3054383.75 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:52:37,854 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:52:40,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3952473.3333333335, ans=0.0 2023-11-29 11:53:00,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3952606.6666666665, ans=0.125 2023-11-29 11:53:08,676 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592900 2023-11-29 11:53:13,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3952673.3333333335, ans=0.09899494936611666 2023-11-29 11:53:14,444 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.712e+01 9.192e+01 9.964e+01 1.058e+02 1.278e+02, threshold=1.993e+02, percent-clipped=0.0 2023-11-29 11:53:36,601 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3750, loss[loss=0.07885, simple_loss=0.1101, pruned_loss=0.01837, audio_tagging_loss=0.00544, over 15612.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.0894, pruned_loss=0.01197, audio_tagging_loss=0.008381, over 3055079.09 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:53:39,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.10 vs. 
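limit=15.0

The ScheduledFloat entries that dominate this log each report a scalar hyperparameter (dropout probabilities, skip and bypass rates, balancer targets) whose current value, printed as ans, is a function of batch_count, typically piecewise-linear between breakpoints. This deep into training (batch_count around 3.95e6) every schedule sits on its final breakpoint, which is why the same values keep repeating. A simplified sketch under that assumption (the breakpoints below are illustrative, and the real class in scaling.py does more):

```python
# Sketch of a batch-count-driven scalar schedule like the ScheduledFloat
# values above: piecewise-linear interpolation between (batch_count, value)
# breakpoints, clamped at both ends. Breakpoints here are illustrative.

class ScheduledFloatSketch:
    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return pts[-1][1]

dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(10000.0))     # 0.2: halfway down the ramp
print(dropout_p.value(3952473.0))   # 0.1: matches the ans=0.1 dropout_p entries above
```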
2023-11-29 11:53:40,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3952806.6666666665, ans=0.125 2023-11-29 11:53:48,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3952873.3333333335, ans=0.125 2023-11-29 11:54:05,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3952940.0, ans=0.125 2023-11-29 11:54:06,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3952940.0, ans=0.09899494936611666 2023-11-29 11:54:10,846 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 592950 2023-11-29 11:54:20,494 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 11:54:38,442 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3800, loss[loss=0.05373, simple_loss=0.07199, pruned_loss=0.005918, audio_tagging_loss=0.01182, over 14873.00 frames. ], tot_loss[loss=0.06543, simple_loss=0.08969, pruned_loss=0.01203, audio_tagging_loss=0.008552, over 3051235.57 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:54:41,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3953140.0, ans=0.0 2023-11-29 11:54:42,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3953140.0, ans=0.125 2023-11-29 11:54:53,232 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 11:55:12,168 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593000 2023-11-29 11:55:18,273 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.426e+01 9.089e+01 9.885e+01 1.067e+02 1.488e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-29 11:55:33,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3953406.6666666665, ans=0.1 2023-11-29 11:55:41,800 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3850, loss[loss=0.05474, simple_loss=0.07, pruned_loss=0.01032, audio_tagging_loss=0.009424, over 14016.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08972, pruned_loss=0.01206, audio_tagging_loss=0.008657, over 3052074.52 frames. ], batch size: 52, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:55:55,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3953540.0, ans=0.125 2023-11-29 11:55:57,840 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs.
limit=6.0 2023-11-29 11:56:01,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3953540.0, ans=0.2 2023-11-29 11:56:11,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3953606.6666666665, ans=0.0 2023-11-29 11:56:14,538 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593050 2023-11-29 11:56:20,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.86 vs. limit=15.0 2023-11-29 11:56:37,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3953740.0, ans=0.125 2023-11-29 11:56:43,360 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3900, loss[loss=0.08112, simple_loss=0.1097, pruned_loss=0.01624, audio_tagging_loss=0.01003, over 15527.00 frames. ], tot_loss[loss=0.06554, simple_loss=0.08962, pruned_loss=0.01206, audio_tagging_loss=0.008667, over 3049352.60 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:57:04,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3953873.3333333335, ans=0.0 2023-11-29 11:57:08,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3953940.0, ans=0.2 2023-11-29 11:57:14,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3953940.0, ans=0.1 2023-11-29 11:57:17,569 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593100 2023-11-29 11:57:18,196 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.54 vs. limit=15.0 2023-11-29 11:57:23,340 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 8.927e+01 9.561e+01 1.012e+02 1.625e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-29 11:57:33,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3954073.3333333335, ans=0.0 2023-11-29 11:57:41,552 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.76 vs. limit=10.0 2023-11-29 11:57:45,083 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 3950, loss[loss=0.08329, simple_loss=0.1151, pruned_loss=0.01689, audio_tagging_loss=0.008855, over 15913.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08943, pruned_loss=0.01203, audio_tagging_loss=0.008731, over 3049379.64 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:57:56,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3954206.6666666665, ans=0.125 2023-11-29 11:58:04,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3954206.6666666665, ans=0.125 2023-11-29 11:58:18,307 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593150 2023-11-29 11:58:27,018 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. 
limit=6.0 2023-11-29 11:58:27,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3954340.0, ans=0.125 2023-11-29 11:58:29,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3954340.0, ans=0.0 2023-11-29 11:58:35,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3954406.6666666665, ans=0.04949747468305833 2023-11-29 11:58:47,957 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4000, loss[loss=0.06658, simple_loss=0.09901, pruned_loss=0.009929, audio_tagging_loss=0.007147, over 15502.00 frames. ], tot_loss[loss=0.06571, simple_loss=0.08942, pruned_loss=0.01219, audio_tagging_loss=0.008805, over 3046964.39 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 11:58:50,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3954473.3333333335, ans=0.125 2023-11-29 11:59:01,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3954540.0, ans=0.0 2023-11-29 11:59:20,397 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593200 2023-11-29 11:59:26,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.870e+01 8.877e+01 9.527e+01 1.031e+02 1.352e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-29 11:59:29,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3954673.3333333335, ans=0.125 2023-11-29 11:59:39,491 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=15.0 2023-11-29 11:59:44,063 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.69 vs. limit=12.0 2023-11-29 11:59:49,335 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4050, loss[loss=0.05849, simple_loss=0.07312, pruned_loss=0.009752, audio_tagging_loss=0.01217, over 14412.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08904, pruned_loss=0.01217, audio_tagging_loss=0.008871, over 3045309.10 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 11:59:54,008 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
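Number of tokens: 24

These recurring WARNINGs all reject the same kind of cut: a 1-second AudioSet clip whose transcript is the fixed dummy sentence. Its 100 feature frames shrink to 23 after the front-end subsampling, one fewer than its 24 BPE tokens, and the filter evidently requires at least one encoder frame per output token, so the cut is dropped from training. A sketch of such a validity filter, assuming the subsampled length follows the usual icefall convolutional front-end arithmetic (helper names hypothetical):

```python
# Hypothetical filter matching the WARNING above. The length arithmetic
# assumes a Conv2dSubsampling-style front end that maps T input frames to
# ((T - 7) // 2 + 1) // 2 output frames, which reproduces the logged 100 -> 23.

def frames_after_subsampling(t: int) -> int:
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Require at least one encoder frame per output token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))   # 23, as logged
print(keep_cut(100, 24))               # False -> "Exclude cut ... from training"
print(keep_cut(1500, 24))              # True for a normal ~15 s utterance
```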
2023-11-29 12:00:18,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3954940.0, ans=10.0 2023-11-29 12:00:22,988 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593250 2023-11-29 12:00:27,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3955006.6666666665, ans=0.125 2023-11-29 12:00:51,360 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4100, loss[loss=0.06687, simple_loss=0.09192, pruned_loss=0.01317, audio_tagging_loss=0.007736, over 15811.00 frames. ], tot_loss[loss=0.0653, simple_loss=0.08889, pruned_loss=0.01209, audio_tagging_loss=0.008765, over 3047013.72 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:01:03,217 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.77 vs. limit=10.0 2023-11-29 12:01:07,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3955206.6666666665, ans=0.0 2023-11-29 12:01:21,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3955273.3333333335, ans=0.0 2023-11-29 12:01:24,883 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593300 2023-11-29 12:01:29,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3955340.0, ans=0.125 2023-11-29 12:01:31,714 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.450e+01 9.221e+01 9.823e+01 1.065e+02 1.481e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-29 12:01:32,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3955340.0, ans=0.0 2023-11-29 12:01:52,926 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4150, loss[loss=0.04118, simple_loss=0.05795, pruned_loss=0.003033, audio_tagging_loss=0.009175, over 14523.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08811, pruned_loss=0.01186, audio_tagging_loss=0.008631, over 3047799.89 frames.
], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:01:58,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3955473.3333333335, ans=0.125 2023-11-29 12:02:00,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3955473.3333333335, ans=0.125 2023-11-29 12:02:08,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3955540.0, ans=0.125 2023-11-29 12:02:09,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3955540.0, ans=0.1 2023-11-29 12:02:11,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3955540.0, ans=0.1 2023-11-29 12:02:17,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3955606.6666666665, ans=0.1 2023-11-29 12:02:21,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3955606.6666666665, ans=0.125 2023-11-29 12:02:22,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3955606.6666666665, ans=0.2 2023-11-29 12:02:26,200 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593350 2023-11-29 12:02:28,806 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:02:38,412 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 12:02:53,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3955806.6666666665, ans=0.125 2023-11-29 12:02:54,775 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4200, loss[loss=0.07003, simple_loss=0.09818, pruned_loss=0.01401, audio_tagging_loss=0.006932, over 15224.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08832, pruned_loss=0.01186, audio_tagging_loss=0.008567, over 3048891.66 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:03:03,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3955806.6666666665, ans=0.125 2023-11-29 12:03:04,294 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:03:13,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3955873.3333333335, ans=0.125 2023-11-29 12:03:19,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3955940.0, ans=0.125 2023-11-29 12:03:28,469 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593400 2023-11-29 12:03:35,591 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.100e+01 9.099e+01 9.882e+01 1.051e+02 1.333e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 12:03:38,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3956006.6666666665, ans=0.09899494936611666 2023-11-29 12:03:39,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3956006.6666666665, ans=0.125 2023-11-29 12:03:56,515 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4250, loss[loss=0.07858, simple_loss=0.1086, pruned_loss=0.01608, audio_tagging_loss=0.008192, over 14126.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08835, pruned_loss=0.0118, audio_tagging_loss=0.008468, over 3051924.15 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:04:16,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3956206.6666666665, ans=0.1 2023-11-29 12:04:23,563 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2023-11-29 12:04:26,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3956273.3333333335, ans=0.125 2023-11-29 12:04:30,816 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593450 2023-11-29 12:04:32,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3956273.3333333335, ans=0.125 2023-11-29 12:04:50,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3956406.6666666665, ans=0.0 2023-11-29 12:04:52,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3956406.6666666665, ans=0.125 2023-11-29 12:04:56,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3956406.6666666665, ans=0.0 2023-11-29 12:04:58,921 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4300, loss[loss=0.07911, simple_loss=0.1194, pruned_loss=0.01385, audio_tagging_loss=0.00556, over 15345.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08932, pruned_loss=0.01181, audio_tagging_loss=0.008392, over 3057056.05 frames. 
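], batch size: 56, lr: 1.35e-03, grad_scale: 8.0

The grad_scale field is the dynamic loss scale of fp16 mixed-precision training rather than a model quantity: the scaler halves it when a scaled gradient overflows (it has dropped to 8.0 by this batch) and grows it back after a run of finite steps (it returns to 16.0 and then 32.0 in later records). A minimal sketch of that mechanism with plain torch.cuda.amp; the training script's own scaler may differ in detail, and the model, optimizer and batch here are illustrative stand-ins:

```python
import torch

model = torch.nn.Linear(80, 500).cuda()          # illustrative stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1.35e-3)
scaler = torch.cuda.amp.GradScaler()             # dynamic loss scaling for fp16

def train_step(features: torch.Tensor, targets: torch.Tensor) -> None:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(features), targets)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips the step on inf/nan
    scaler.update()                 # halves the scale on overflow, regrows it later
    print(f"grad_scale: {scaler.get_scale():.1f}")   # the value logged above

train_step(torch.randn(56, 80).cuda(), torch.randint(0, 500, (56,)).cuda())
```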
2023-11-29 12:05:01,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=12.0 2023-11-29 12:05:03,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3956473.3333333335, ans=0.95 2023-11-29 12:05:07,153 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5 2023-11-29 12:05:09,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3956473.3333333335, ans=0.125 2023-11-29 12:05:23,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3956606.6666666665, ans=0.125 2023-11-29 12:05:31,826 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593500 2023-11-29 12:05:40,632 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.001e+01 9.077e+01 9.622e+01 1.047e+02 1.414e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 12:05:53,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3956740.0, ans=0.125 2023-11-29 12:06:00,180 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4350, loss[loss=0.05711, simple_loss=0.07555, pruned_loss=0.01147, audio_tagging_loss=0.007861, over 15097.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08907, pruned_loss=0.01188, audio_tagging_loss=0.008502, over 3051163.99 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 8.0 2023-11-29 12:06:03,196 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0 2023-11-29 12:06:31,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3956940.0, ans=0.0 2023-11-29 12:06:33,027 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593550 2023-11-29 12:06:33,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3956940.0, ans=0.125 2023-11-29 12:06:57,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3957073.3333333335, ans=0.0 2023-11-29 12:07:02,027 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4400, loss[loss=0.0569, simple_loss=0.07988, pruned_loss=0.009099, audio_tagging_loss=0.007859, over 15660.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08898, pruned_loss=0.01185, audio_tagging_loss=0.008482, over 3051342.44 frames.
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:07:18,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3957206.6666666665, ans=0.125 2023-11-29 12:07:35,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3957273.3333333335, ans=0.125 2023-11-29 12:07:36,519 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593600 2023-11-29 12:07:45,796 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.856e+01 9.188e+01 9.758e+01 1.053e+02 1.476e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-29 12:08:05,321 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4450, loss[loss=0.07367, simple_loss=0.1022, pruned_loss=0.01532, audio_tagging_loss=0.007233, over 15195.00 frames. ], tot_loss[loss=0.0648, simple_loss=0.08888, pruned_loss=0.01191, audio_tagging_loss=0.00845, over 3048478.83 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:08:29,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3957606.6666666665, ans=0.1 2023-11-29 12:08:31,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3957606.6666666665, ans=0.0 2023-11-29 12:08:38,792 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593650 2023-11-29 12:08:48,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3957673.3333333335, ans=0.2 2023-11-29 12:09:06,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3957806.6666666665, ans=0.1 2023-11-29 12:09:07,771 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4500, loss[loss=0.06867, simple_loss=0.09551, pruned_loss=0.01513, audio_tagging_loss=0.005783, over 14001.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08937, pruned_loss=0.01193, audio_tagging_loss=0.008373, over 3043370.54 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:09:27,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3957873.3333333335, ans=0.125 2023-11-29 12:09:33,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3957940.0, ans=0.0 2023-11-29 12:09:41,370 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593700 2023-11-29 12:09:43,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3958006.6666666665, ans=0.125 2023-11-29 12:09:50,094 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 9.149e+01 9.833e+01 1.069e+02 1.731e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-29 12:09:56,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3958073.3333333335, ans=0.2 2023-11-29 12:10:05,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3958073.3333333335, ans=0.0 2023-11-29 12:10:08,731 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4550, loss[loss=0.07171, simple_loss=0.09814, pruned_loss=0.01389, audio_tagging_loss=0.008757, over 15390.00 frames. 
], tot_loss[loss=0.06448, simple_loss=0.08853, pruned_loss=0.01183, audio_tagging_loss=0.008384, over 3043371.46 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:10:15,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.25 vs. limit=12.0 2023-11-29 12:10:25,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3958206.6666666665, ans=0.125 2023-11-29 12:10:32,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3958206.6666666665, ans=0.0 2023-11-29 12:10:38,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3958273.3333333335, ans=0.125 2023-11-29 12:10:43,116 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593750 2023-11-29 12:10:50,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3958340.0, ans=0.125 2023-11-29 12:10:51,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3958340.0, ans=0.125 2023-11-29 12:10:56,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3958340.0, ans=0.1 2023-11-29 12:10:56,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3958340.0, ans=0.125 2023-11-29 12:10:57,162 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 12:11:10,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3958473.3333333335, ans=0.125 2023-11-29 12:11:11,235 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4600, loss[loss=0.08923, simple_loss=0.1174, pruned_loss=0.02123, audio_tagging_loss=0.009312, over 15029.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08902, pruned_loss=0.01188, audio_tagging_loss=0.008455, over 3042270.62 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:11:25,603 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:11:41,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3958606.6666666665, ans=0.0 2023-11-29 12:11:41,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3958606.6666666665, ans=0.0 2023-11-29 12:11:44,160 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593800 2023-11-29 12:11:53,845 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.850e+01 9.081e+01 9.672e+01 1.036e+02 1.224e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 12:12:09,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3958740.0, ans=0.0 2023-11-29 12:12:13,804 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4650, loss[loss=0.0613, simple_loss=0.0812, pruned_loss=0.01058, audio_tagging_loss=0.01012, over 14886.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08901, pruned_loss=0.01191, audio_tagging_loss=0.008529, over 3039015.24 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:12:19,366 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=22.5 2023-11-29 12:12:20,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3958806.6666666665, ans=0.09899494936611666 2023-11-29 12:12:36,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3958940.0, ans=0.0 2023-11-29 12:12:46,248 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593850 2023-11-29 12:13:09,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3959073.3333333335, ans=0.0 2023-11-29 12:13:13,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3959140.0, ans=0.125 2023-11-29 12:13:14,372 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4700, loss[loss=0.07295, simple_loss=0.0938, pruned_loss=0.01627, audio_tagging_loss=0.009778, over 16071.00 frames. ], tot_loss[loss=0.06486, simple_loss=0.08879, pruned_loss=0.01183, audio_tagging_loss=0.008641, over 3047512.16 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:13:15,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3959140.0, ans=0.1 2023-11-29 12:13:45,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.10 vs. 
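limit=22.5

The optim.py records summarize the optimizer's adaptive gradient clipping: the five numbers are the 0/25/50/75/100th percentiles of recently observed gradient norms, and in every such record here the threshold equals Clipping_scale times the median (just above: 2.0 * 9.672e+01 = 1.934e+02), while percent-clipped reports how often the threshold actually bit (0.0 in most records; a later record shows 1.0 where the max norm 2.380e+02 exceeded it). A standalone sketch of that bookkeeping, not the actual optimizer code:

```python
import torch

def clipping_summary(grad_norms: torch.Tensor, clipping_scale: float = 2.0) -> None:
    """Summarize recent gradient norms the way the optim.py records above do."""
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()                # scale the median norm
    pct = 100.0 * (grad_norms > threshold).float().mean().item()
    quartiles = " ".join(f"{v.item():.3e}" for v in q)
    print(f"Clipping_scale={clipping_scale}, grad-norm quartiles {quartiles}, "
          f"threshold={threshold:.3e}, percent-clipped={pct:.1f}")

# Toy norms in the logged range: median ~1.0e+02 gives threshold ~2.0e+02.
clipping_summary(78.0 + 45.0 * torch.rand(10000))
```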
2023-11-29 12:13:48,588 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593900 2023-11-29 12:13:49,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3959273.3333333335, ans=0.2 2023-11-29 12:13:56,716 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 9.196e+01 9.820e+01 1.091e+02 1.389e+02, threshold=1.964e+02, percent-clipped=0.0 2023-11-29 12:14:16,793 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4750, loss[loss=0.06137, simple_loss=0.0843, pruned_loss=0.0102, audio_tagging_loss=0.009014, over 14319.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08814, pruned_loss=0.01179, audio_tagging_loss=0.008654, over 3045422.23 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:14:41,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3959606.6666666665, ans=0.125 2023-11-29 12:14:49,587 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 593950 2023-11-29 12:14:52,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3959673.3333333335, ans=0.2 2023-11-29 12:15:14,001 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=15.0 2023-11-29 12:15:19,318 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4800, loss[loss=0.05992, simple_loss=0.08332, pruned_loss=0.01026, audio_tagging_loss=0.007998, over 15642.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08865, pruned_loss=0.01168, audio_tagging_loss=0.008732, over 3053437.61 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:15:20,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3959806.6666666665, ans=0.0 2023-11-29 12:15:32,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3959873.3333333335, ans=0.0 2023-11-29 12:15:48,681 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.63 vs. limit=10.0 2023-11-29 12:15:52,361 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594000 2023-11-29 12:15:52,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3959940.0, ans=0.125 2023-11-29 12:15:56,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3960006.6666666665, ans=0.125 2023-11-29 12:16:01,114 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.99 vs. limit=22.5 2023-11-29 12:16:01,790 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 9.011e+01 9.691e+01 1.047e+02 1.422e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 12:16:04,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3960006.6666666665, ans=0.0 2023-11-29 12:16:20,296 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4850, loss[loss=0.04301, simple_loss=0.05391, pruned_loss=0.006605, audio_tagging_loss=0.009448, over 15367.00 frames.
], tot_loss[loss=0.06445, simple_loss=0.08827, pruned_loss=0.01157, audio_tagging_loss=0.008741, over 3055131.68 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:16:54,294 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594050 2023-11-29 12:16:57,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3960340.0, ans=0.09899494936611666 2023-11-29 12:17:14,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3960406.6666666665, ans=0.125 2023-11-29 12:17:21,445 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4900, loss[loss=0.07014, simple_loss=0.08125, pruned_loss=0.01727, audio_tagging_loss=0.01225, over 14455.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08846, pruned_loss=0.0117, audio_tagging_loss=0.008735, over 3049907.03 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:17:22,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3960473.3333333335, ans=0.0 2023-11-29 12:17:23,124 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0 2023-11-29 12:17:27,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3960473.3333333335, ans=0.1 2023-11-29 12:17:55,228 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594100 2023-11-29 12:18:04,650 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 9.116e+01 9.769e+01 1.041e+02 2.380e+02, threshold=1.954e+02, percent-clipped=1.0 2023-11-29 12:18:24,985 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 4950, loss[loss=0.05625, simple_loss=0.07298, pruned_loss=0.01039, audio_tagging_loss=0.009373, over 15833.00 frames. ], tot_loss[loss=0.06481, simple_loss=0.08888, pruned_loss=0.01178, audio_tagging_loss=0.008589, over 3050684.46 frames. ], batch size: 63, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:18:50,532 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2023-11-29 12:18:57,357 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594150 2023-11-29 12:19:02,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3961006.6666666665, ans=0.2 2023-11-29 12:19:05,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2023-11-29 12:19:19,690 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=15.0 2023-11-29 12:19:26,298 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5000, loss[loss=0.07873, simple_loss=0.1133, pruned_loss=0.01397, audio_tagging_loss=0.008111, over 15637.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08933, pruned_loss=0.01186, audio_tagging_loss=0.008492, over 3060175.30 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:19:59,597 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594200 2023-11-29 12:20:02,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3961340.0, ans=0.125 2023-11-29 12:20:08,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3961340.0, ans=0.2 2023-11-29 12:20:09,318 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.899e+01 8.950e+01 9.411e+01 1.015e+02 1.285e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-29 12:20:13,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3961340.0, ans=0.1 2023-11-29 12:20:20,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3961406.6666666665, ans=0.125 2023-11-29 12:20:25,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3961406.6666666665, ans=0.1 2023-11-29 12:20:27,666 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5050, loss[loss=0.05507, simple_loss=0.0662, pruned_loss=0.01296, audio_tagging_loss=0.009014, over 15417.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08961, pruned_loss=0.01196, audio_tagging_loss=0.008393, over 3060792.63 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:21:01,466 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594250 2023-11-29 12:21:20,061 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2023-11-29 12:21:30,065 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5100, loss[loss=0.06181, simple_loss=0.08572, pruned_loss=0.008534, audio_tagging_loss=0.01042, over 13993.00 frames. ], tot_loss[loss=0.06536, simple_loss=0.08995, pruned_loss=0.01196, audio_tagging_loss=0.008425, over 3053590.85 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:21:30,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3961806.6666666665, ans=0.125 2023-11-29 12:21:30,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3961806.6666666665, ans=0.0 2023-11-29 12:22:03,842 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594300 2023-11-29 12:22:13,770 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.335e+01 8.985e+01 9.588e+01 1.015e+02 1.337e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-29 12:22:16,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3962006.6666666665, ans=0.07 2023-11-29 12:22:26,252 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2023-11-29 12:22:30,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3962073.3333333335, ans=0.2 2023-11-29 12:22:32,634 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5150, loss[loss=0.0675, simple_loss=0.09547, pruned_loss=0.01325, audio_tagging_loss=0.006513, over 14917.00 frames. 
], tot_loss[loss=0.06487, simple_loss=0.08891, pruned_loss=0.01189, audio_tagging_loss=0.008527, over 3052262.59 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:22:36,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3962140.0, ans=0.125 2023-11-29 12:22:41,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3962140.0, ans=0.025 2023-11-29 12:23:00,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3962273.3333333335, ans=0.125 2023-11-29 12:23:06,599 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594350 2023-11-29 12:23:13,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3962340.0, ans=0.125 2023-11-29 12:23:18,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3962340.0, ans=0.0 2023-11-29 12:23:24,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3962406.6666666665, ans=0.1 2023-11-29 12:23:34,647 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5200, loss[loss=0.07152, simple_loss=0.1051, pruned_loss=0.01305, audio_tagging_loss=0.005909, over 15008.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08919, pruned_loss=0.01174, audio_tagging_loss=0.008413, over 3049345.75 frames. ], batch size: 55, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:23:39,714 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.96 vs. limit=15.0 2023-11-29 12:23:46,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.30 vs. limit=22.5 2023-11-29 12:23:52,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3962540.0, ans=0.2 2023-11-29 12:23:56,144 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.42 vs. limit=5.0 2023-11-29 12:24:08,848 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594400 2023-11-29 12:24:16,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3962673.3333333335, ans=0.125 2023-11-29 12:24:16,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3962673.3333333335, ans=0.025 2023-11-29 12:24:18,556 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.794e+01 9.283e+01 9.729e+01 1.049e+02 1.320e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 12:24:29,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3962740.0, ans=0.125 2023-11-29 12:24:35,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3962740.0, ans=0.07 2023-11-29 12:24:37,190 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5250, loss[loss=0.06876, simple_loss=0.09753, pruned_loss=0.01066, audio_tagging_loss=0.009334, over 16517.00 frames. 
], tot_loss[loss=0.06458, simple_loss=0.08896, pruned_loss=0.0117, audio_tagging_loss=0.008399, over 3039909.64 frames. ], batch size: 62, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:24:37,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3962806.6666666665, ans=0.125 2023-11-29 12:24:39,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3962806.6666666665, ans=0.1 2023-11-29 12:24:39,491 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.12 vs. limit=10.0 2023-11-29 12:24:52,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3962873.3333333335, ans=0.2 2023-11-29 12:24:57,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3962873.3333333335, ans=0.1 2023-11-29 12:25:06,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3962940.0, ans=0.2 2023-11-29 12:25:10,170 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594450 2023-11-29 12:25:39,443 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5300, loss[loss=0.06796, simple_loss=0.08333, pruned_loss=0.01736, audio_tagging_loss=0.00893, over 14456.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08985, pruned_loss=0.01192, audio_tagging_loss=0.008285, over 3037559.51 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:26:10,441 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:26:13,221 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594500 2023-11-29 12:26:22,669 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.843e+01 9.148e+01 9.632e+01 1.017e+02 1.264e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-29 12:26:41,276 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5350, loss[loss=0.05445, simple_loss=0.07457, pruned_loss=0.008614, audio_tagging_loss=0.008549, over 15229.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08959, pruned_loss=0.01193, audio_tagging_loss=0.008337, over 3036926.30 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:27:11,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3963606.6666666665, ans=0.0 2023-11-29 12:27:15,416 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594550 2023-11-29 12:27:19,679 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.49 vs. limit=15.0 2023-11-29 12:27:20,569 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.43 vs. 
limit=22.5 2023-11-29 12:27:23,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3963673.3333333335, ans=0.05 2023-11-29 12:27:29,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3963740.0, ans=0.0 2023-11-29 12:27:40,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3963740.0, ans=0.2 2023-11-29 12:27:43,662 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5400, loss[loss=0.07204, simple_loss=0.1012, pruned_loss=0.01366, audio_tagging_loss=0.007803, over 15169.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08995, pruned_loss=0.01191, audio_tagging_loss=0.008354, over 3042547.13 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:28:00,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3963873.3333333335, ans=0.0 2023-11-29 12:28:16,330 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594600 2023-11-29 12:28:17,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3963940.0, ans=0.0 2023-11-29 12:28:24,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3964006.6666666665, ans=0.09899494936611666 2023-11-29 12:28:25,819 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:28:26,689 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.274e+01 9.096e+01 9.650e+01 1.029e+02 1.446e+02, threshold=1.930e+02, percent-clipped=0.0 2023-11-29 12:28:36,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3964073.3333333335, ans=0.0 2023-11-29 12:28:37,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3964073.3333333335, ans=0.09899494936611666 2023-11-29 12:28:40,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3964073.3333333335, ans=0.07 2023-11-29 12:28:45,295 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5450, loss[loss=0.07247, simple_loss=0.1104, pruned_loss=0.01236, audio_tagging_loss=0.004895, over 15008.00 frames. ], tot_loss[loss=0.06551, simple_loss=0.09025, pruned_loss=0.01207, audio_tagging_loss=0.008313, over 3036619.03 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:28:59,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3964206.6666666665, ans=0.95 2023-11-29 12:29:03,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3964206.6666666665, ans=0.125 2023-11-29 12:29:15,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3964273.3333333335, ans=0.0 2023-11-29 12:29:19,081 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594650 2023-11-29 12:29:32,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3964340.0, ans=0.2 2023-11-29 12:29:46,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3964473.3333333335, ans=0.0 2023-11-29 12:29:47,554 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5500, loss[loss=0.06099, simple_loss=0.07727, pruned_loss=0.01262, audio_tagging_loss=0.009743, over 15774.00 frames. ], tot_loss[loss=0.06497, simple_loss=0.08929, pruned_loss=0.01197, audio_tagging_loss=0.008347, over 3037101.91 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:29:51,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3964473.3333333335, ans=0.0 2023-11-29 12:30:11,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3964606.6666666665, ans=0.125 2023-11-29 12:30:21,168 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594700 2023-11-29 12:30:32,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.770e+01 9.260e+01 9.828e+01 1.052e+02 2.145e+02, threshold=1.966e+02, percent-clipped=1.0 2023-11-29 12:30:34,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3964673.3333333335, ans=0.0 2023-11-29 12:30:37,692 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.05 vs. limit=22.5 2023-11-29 12:30:40,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3964740.0, ans=0.125 2023-11-29 12:30:49,514 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5550, loss[loss=0.05094, simple_loss=0.06507, pruned_loss=0.009451, audio_tagging_loss=0.008953, over 14615.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08926, pruned_loss=0.01202, audio_tagging_loss=0.00849, over 3034974.05 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:31:03,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3964873.3333333335, ans=0.04949747468305833 2023-11-29 12:31:04,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3964873.3333333335, ans=0.125 2023-11-29 12:31:06,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3964873.3333333335, ans=0.0 2023-11-29 12:31:07,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0 2023-11-29 12:31:22,562 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594750 2023-11-29 12:31:22,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3964940.0, ans=0.2 2023-11-29 12:31:36,097 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.17 vs. limit=6.0 2023-11-29 12:31:40,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3965073.3333333335, ans=0.1 2023-11-29 12:31:52,110 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5600, loss[loss=0.06067, simple_loss=0.08105, pruned_loss=0.01134, audio_tagging_loss=0.008804, over 16554.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08982, pruned_loss=0.01216, audio_tagging_loss=0.008562, over 3040269.45 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:32:24,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3965273.3333333335, ans=0.0 2023-11-29 12:32:25,806 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594800 2023-11-29 12:32:29,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3965340.0, ans=0.0 2023-11-29 12:32:37,340 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.718e+01 9.303e+01 9.793e+01 1.041e+02 1.252e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-29 12:32:38,533 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 12:32:42,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3965406.6666666665, ans=0.015 2023-11-29 12:32:43,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3965406.6666666665, ans=0.07 2023-11-29 12:32:53,612 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5650, loss[loss=0.06619, simple_loss=0.08421, pruned_loss=0.01375, audio_tagging_loss=0.01033, over 15456.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09055, pruned_loss=0.0123, audio_tagging_loss=0.008549, over 3046901.56 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:33:28,105 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594850 2023-11-29 12:33:33,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3965673.3333333335, ans=0.125 2023-11-29 12:33:37,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3965673.3333333335, ans=0.0 2023-11-29 12:33:53,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3965740.0, ans=0.125 2023-11-29 12:33:56,396 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5700, loss[loss=0.06121, simple_loss=0.08304, pruned_loss=0.009674, audio_tagging_loss=0.01002, over 15525.00 frames. ], tot_loss[loss=0.06601, simple_loss=0.09055, pruned_loss=0.01221, audio_tagging_loss=0.008534, over 3048516.79 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:33:59,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3965806.6666666665, ans=0.125 2023-11-29 12:34:00,525 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2023-11-29 12:34:03,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3965806.6666666665, ans=0.2 2023-11-29 12:34:29,210 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594900 2023-11-29 12:34:40,778 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.829e+01 8.907e+01 9.442e+01 9.916e+01 1.221e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-29 12:34:48,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3966073.3333333335, ans=0.125 2023-11-29 12:34:53,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3966073.3333333335, ans=0.1 2023-11-29 12:34:55,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3966073.3333333335, ans=0.1 2023-11-29 12:34:58,507 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5750, loss[loss=0.0845, simple_loss=0.1087, pruned_loss=0.01971, audio_tagging_loss=0.01042, over 15188.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08942, pruned_loss=0.01202, audio_tagging_loss=0.008448, over 3051664.31 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:35:05,004 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.05 vs. 
limit=22.5 2023-11-29 12:35:09,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3966206.6666666665, ans=0.0 2023-11-29 12:35:31,950 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 594950 2023-11-29 12:35:37,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3966340.0, ans=0.0 2023-11-29 12:35:37,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3966340.0, ans=0.0 2023-11-29 12:35:39,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3966340.0, ans=0.035 2023-11-29 12:36:00,175 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5800, loss[loss=0.05524, simple_loss=0.07502, pruned_loss=0.009455, audio_tagging_loss=0.008273, over 15084.00 frames. ], tot_loss[loss=0.06514, simple_loss=0.08943, pruned_loss=0.012, audio_tagging_loss=0.008427, over 3044152.17 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:36:08,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3966473.3333333335, ans=0.1 2023-11-29 12:36:17,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3966540.0, ans=0.125 2023-11-29 12:36:34,110 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595000 2023-11-29 12:36:44,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3966673.3333333335, ans=15.0 2023-11-29 12:36:45,837 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 9.166e+01 9.851e+01 1.059e+02 1.504e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 12:36:47,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3966673.3333333335, ans=0.125 2023-11-29 12:37:01,718 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5850, loss[loss=0.08777, simple_loss=0.1288, pruned_loss=0.01712, audio_tagging_loss=0.00626, over 15585.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08887, pruned_loss=0.01187, audio_tagging_loss=0.008352, over 3038176.45 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:37:26,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3966940.0, ans=0.1 2023-11-29 12:37:34,413 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595050 2023-11-29 12:37:48,668 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2023-11-29 12:37:50,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3967073.3333333335, ans=0.1 2023-11-29 12:38:03,786 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5900, loss[loss=0.07274, simple_loss=0.107, pruned_loss=0.01345, audio_tagging_loss=0.005795, over 16243.00 frames. ], tot_loss[loss=0.06553, simple_loss=0.09015, pruned_loss=0.0122, audio_tagging_loss=0.008258, over 3040273.32 frames. 
], batch size: 63, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:38:09,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3967140.0, ans=0.125 2023-11-29 12:38:26,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3967273.3333333335, ans=0.125 2023-11-29 12:38:34,033 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:38:36,957 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595100 2023-11-29 12:38:47,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3967340.0, ans=0.0 2023-11-29 12:38:49,678 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 9.143e+01 9.896e+01 1.087e+02 1.374e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-29 12:39:04,751 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 5950, loss[loss=0.06922, simple_loss=0.09888, pruned_loss=0.01179, audio_tagging_loss=0.007988, over 14984.00 frames. ], tot_loss[loss=0.06584, simple_loss=0.09042, pruned_loss=0.0123, audio_tagging_loss=0.008326, over 3048360.67 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:39:06,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3967473.3333333335, ans=0.125 2023-11-29 12:39:38,907 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595150 2023-11-29 12:39:40,507 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0 2023-11-29 12:39:46,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3967673.3333333335, ans=0.1 2023-11-29 12:40:06,620 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6000, loss[loss=0.08133, simple_loss=0.1132, pruned_loss=0.01644, audio_tagging_loss=0.00829, over 15882.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08925, pruned_loss=0.01214, audio_tagging_loss=0.008346, over 3043013.08 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:40:06,621 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 12:40:24,948 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.1390, 3.0415, 3.4719, 2.8706, 3.4544, 3.1668, 3.1334, 3.2071], device='cuda:2') 2023-11-29 12:40:25,311 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.2186, 5.0700, 4.3488, 4.8923], device='cuda:2') 2023-11-29 12:40:46,474 INFO [train_asr.py:1267] (2/4) Epoch 50, validation: loss=0.05775, simple_loss=0.05043, pruned_loss=0.005339, audio_tagging_loss=0.0272, over 4681554.00 frames. 
2023-11-29 12:40:46,475 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 12:40:59,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3967873.3333333335, ans=0.1 2023-11-29 12:41:10,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3967940.0, ans=0.125 2023-11-29 12:41:12,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3967940.0, ans=0.2 2023-11-29 12:41:13,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3967940.0, ans=0.125 2023-11-29 12:41:18,945 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595200 2023-11-29 12:41:26,106 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:41:32,836 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.848e+01 9.055e+01 9.788e+01 1.026e+02 1.358e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-29 12:41:32,914 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 12:41:34,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3968006.6666666665, ans=0.125 2023-11-29 12:41:35,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3968073.3333333335, ans=0.0 2023-11-29 12:41:39,380 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2023-11-29 12:41:39,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.92 vs. limit=22.5 2023-11-29 12:41:42,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3968073.3333333335, ans=0.0 2023-11-29 12:41:45,284 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2023-11-29 12:41:47,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3968140.0, ans=0.125 2023-11-29 12:41:48,265 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6050, loss[loss=0.07941, simple_loss=0.1066, pruned_loss=0.01837, audio_tagging_loss=0.007746, over 15722.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08888, pruned_loss=0.01203, audio_tagging_loss=0.008259, over 3046099.63 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:41:52,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3968140.0, ans=0.07 2023-11-29 12:42:04,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3968206.6666666665, ans=0.125 2023-11-29 12:42:08,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=3968206.6666666665, ans=0.1 2023-11-29 12:42:20,159 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=15.0 2023-11-29 12:42:21,712 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595250 2023-11-29 12:42:37,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3968406.6666666665, ans=0.125 2023-11-29 12:42:39,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3968406.6666666665, ans=0.125 2023-11-29 12:42:47,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3968406.6666666665, ans=0.0 2023-11-29 12:42:49,446 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6100, loss[loss=0.06975, simple_loss=0.09276, pruned_loss=0.01729, audio_tagging_loss=0.006089, over 14501.00 frames. ], tot_loss[loss=0.06483, simple_loss=0.08922, pruned_loss=0.01197, audio_tagging_loss=0.008251, over 3047533.00 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:42:50,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3968473.3333333335, ans=0.125 2023-11-29 12:43:12,354 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:43:17,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.09 vs. limit=15.0 2023-11-29 12:43:22,758 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595300 2023-11-29 12:43:24,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3968606.6666666665, ans=0.125 2023-11-29 12:43:35,519 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.840e+01 9.140e+01 9.748e+01 1.061e+02 1.283e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 12:43:35,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3968673.3333333335, ans=0.1 2023-11-29 12:43:52,112 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6150, loss[loss=0.07446, simple_loss=0.1088, pruned_loss=0.01475, audio_tagging_loss=0.00533, over 15491.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08971, pruned_loss=0.01208, audio_tagging_loss=0.008141, over 3047430.74 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:44:21,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3968940.0, ans=0.1 2023-11-29 12:44:24,718 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595350 2023-11-29 12:44:26,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3968940.0, ans=0.125 2023-11-29 12:44:40,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3969073.3333333335, ans=0.0 2023-11-29 12:44:53,562 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6200, loss[loss=0.0676, simple_loss=0.08955, pruned_loss=0.01173, audio_tagging_loss=0.01109, over 15438.00 frames. ], tot_loss[loss=0.06445, simple_loss=0.08836, pruned_loss=0.01191, audio_tagging_loss=0.008361, over 3047874.89 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:45:00,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3969140.0, ans=0.125 2023-11-29 12:45:09,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3969206.6666666665, ans=0.125 2023-11-29 12:45:14,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3969206.6666666665, ans=0.125 2023-11-29 12:45:15,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3969206.6666666665, ans=0.125 2023-11-29 12:45:27,181 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595400 2023-11-29 12:45:34,402 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0 2023-11-29 12:45:39,701 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.103e+01 9.032e+01 9.591e+01 1.015e+02 1.293e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-29 12:45:42,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3969406.6666666665, ans=0.125 2023-11-29 12:45:55,750 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6250, loss[loss=0.06559, simple_loss=0.09289, pruned_loss=0.009255, audio_tagging_loss=0.009892, over 15757.00 frames. ], tot_loss[loss=0.06423, simple_loss=0.08794, pruned_loss=0.01177, audio_tagging_loss=0.008488, over 3044457.52 frames. 
], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:46:12,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3969540.0, ans=0.0 2023-11-29 12:46:25,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3969606.6666666665, ans=10.0 2023-11-29 12:46:29,242 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595450 2023-11-29 12:46:33,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3969673.3333333335, ans=0.125 2023-11-29 12:46:48,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3969740.0, ans=0.125 2023-11-29 12:46:57,360 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6300, loss[loss=0.07283, simple_loss=0.08938, pruned_loss=0.01785, audio_tagging_loss=0.01029, over 15450.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08839, pruned_loss=0.01202, audio_tagging_loss=0.008567, over 3045493.07 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:47:03,152 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2023-11-29 12:47:31,211 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595500 2023-11-29 12:47:44,500 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 8.915e+01 9.519e+01 1.033e+02 1.401e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-29 12:47:51,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3970073.3333333335, ans=0.125 2023-11-29 12:47:59,900 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6350, loss[loss=0.06209, simple_loss=0.08948, pruned_loss=0.01028, audio_tagging_loss=0.007067, over 15682.00 frames. ], tot_loss[loss=0.06492, simple_loss=0.08865, pruned_loss=0.01199, audio_tagging_loss=0.008606, over 3045396.02 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:48:28,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3970273.3333333335, ans=0.125 2023-11-29 12:48:32,924 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595550 2023-11-29 12:48:33,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3970273.3333333335, ans=0.07 2023-11-29 12:48:47,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3970340.0, ans=0.125 2023-11-29 12:49:01,853 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6400, loss[loss=0.0748, simple_loss=0.1061, pruned_loss=0.01219, audio_tagging_loss=0.00955, over 15024.00 frames. ], tot_loss[loss=0.06505, simple_loss=0.0887, pruned_loss=0.01203, audio_tagging_loss=0.008673, over 3047659.52 frames. 
], batch size: 55, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:49:27,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3970606.6666666665, ans=0.2 2023-11-29 12:49:30,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3970606.6666666665, ans=0.0 2023-11-29 12:49:32,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3970606.6666666665, ans=0.0 2023-11-29 12:49:35,436 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595600 2023-11-29 12:49:50,260 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.542e+01 9.115e+01 9.887e+01 1.069e+02 1.285e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-29 12:50:03,035 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6450, loss[loss=0.08073, simple_loss=0.1111, pruned_loss=0.01755, audio_tagging_loss=0.007643, over 15904.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08907, pruned_loss=0.01196, audio_tagging_loss=0.008681, over 3041303.14 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:50:18,957 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=15.0 2023-11-29 12:50:26,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.53 vs. limit=10.0 2023-11-29 12:50:37,704 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595650 2023-11-29 12:50:42,947 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.70 vs. limit=10.0 2023-11-29 12:50:43,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3971006.6666666665, ans=0.125 2023-11-29 12:50:45,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3971006.6666666665, ans=0.125 2023-11-29 12:50:56,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3971073.3333333335, ans=0.5 2023-11-29 12:51:03,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3971073.3333333335, ans=0.025 2023-11-29 12:51:05,779 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6500, loss[loss=0.05502, simple_loss=0.07148, pruned_loss=0.008786, audio_tagging_loss=0.01049, over 15815.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08807, pruned_loss=0.01188, audio_tagging_loss=0.008668, over 3038621.48 frames. 
], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:51:17,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3971206.6666666665, ans=0.125 2023-11-29 12:51:18,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3971206.6666666665, ans=0.125 2023-11-29 12:51:30,984 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:51:39,720 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595700 2023-11-29 12:51:39,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3971273.3333333335, ans=0.1 2023-11-29 12:51:42,633 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.54 vs. limit=15.0 2023-11-29 12:51:52,387 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2023-11-29 12:51:54,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.470e+01 9.181e+01 9.755e+01 1.057e+02 1.258e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 12:52:07,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3971473.3333333335, ans=0.0 2023-11-29 12:52:07,907 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6550, loss[loss=0.04854, simple_loss=0.06083, pruned_loss=0.008454, audio_tagging_loss=0.009677, over 15777.00 frames. ], tot_loss[loss=0.06516, simple_loss=0.08893, pruned_loss=0.01214, audio_tagging_loss=0.008561, over 3048315.41 frames. ], batch size: 61, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:52:19,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3971540.0, ans=0.04949747468305833 2023-11-29 12:52:29,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=3971540.0, ans=0.02 2023-11-29 12:52:30,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3971540.0, ans=0.125 2023-11-29 12:52:41,540 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595750 2023-11-29 12:52:54,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3971673.3333333335, ans=0.125 2023-11-29 12:53:04,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.33 vs. limit=10.0 2023-11-29 12:53:09,513 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6600, loss[loss=0.09067, simple_loss=0.129, pruned_loss=0.0194, audio_tagging_loss=0.00676, over 15687.00 frames. ], tot_loss[loss=0.06496, simple_loss=0.08906, pruned_loss=0.01203, audio_tagging_loss=0.008397, over 3046224.24 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:53:12,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3971806.6666666665, ans=0.125 2023-11-29 12:53:23,066 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=12.0 2023-11-29 12:53:31,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3971873.3333333335, ans=0.125 2023-11-29 12:53:39,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3971940.0, ans=0.125 2023-11-29 12:53:42,792 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595800 2023-11-29 12:53:52,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3972006.6666666665, ans=0.05 2023-11-29 12:53:55,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3972006.6666666665, ans=0.0 2023-11-29 12:53:56,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3972006.6666666665, ans=0.0 2023-11-29 12:53:57,537 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.899e+01 9.360e+01 1.006e+02 1.174e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-29 12:54:02,430 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.39 vs. limit=12.0 2023-11-29 12:54:09,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3972073.3333333335, ans=0.0 2023-11-29 12:54:11,713 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6650, loss[loss=0.06885, simple_loss=0.09446, pruned_loss=0.01408, audio_tagging_loss=0.007537, over 15928.00 frames. ], tot_loss[loss=0.06495, simple_loss=0.08894, pruned_loss=0.01206, audio_tagging_loss=0.008414, over 3041728.01 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:54:35,695 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:54:44,985 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595850 2023-11-29 12:54:59,569 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.18 vs. limit=22.5 2023-11-29 12:55:13,859 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6700, loss[loss=0.06833, simple_loss=0.08987, pruned_loss=0.01368, audio_tagging_loss=0.009707, over 15376.00 frames. ], tot_loss[loss=0.06504, simple_loss=0.0892, pruned_loss=0.012, audio_tagging_loss=0.008434, over 3042523.55 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:55:16,919 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.15 vs. 
limit=15.0 2023-11-29 12:55:28,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3972540.0, ans=0.125 2023-11-29 12:55:46,609 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595900 2023-11-29 12:56:01,560 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.562e+01 9.149e+01 9.695e+01 1.030e+02 1.289e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-29 12:56:07,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3972740.0, ans=0.125 2023-11-29 12:56:15,546 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6750, loss[loss=0.056, simple_loss=0.0762, pruned_loss=0.01008, audio_tagging_loss=0.007819, over 14127.00 frames. ], tot_loss[loss=0.06452, simple_loss=0.08841, pruned_loss=0.0118, audio_tagging_loss=0.008505, over 3032535.26 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 12:56:22,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3972806.6666666665, ans=0.0 2023-11-29 12:56:42,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3972940.0, ans=0.2 2023-11-29 12:56:49,835 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 595950 2023-11-29 12:57:18,147 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6800, loss[loss=0.07138, simple_loss=0.09248, pruned_loss=0.01469, audio_tagging_loss=0.01045, over 14887.00 frames. ], tot_loss[loss=0.06415, simple_loss=0.08784, pruned_loss=0.01168, audio_tagging_loss=0.008547, over 3039682.51 frames. ], batch size: 53, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:57:28,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3973140.0, ans=0.125 2023-11-29 12:57:29,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3973206.6666666665, ans=0.0 2023-11-29 12:57:31,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3973206.6666666665, ans=0.025 2023-11-29 12:57:42,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3973273.3333333335, ans=0.0 2023-11-29 12:57:51,583 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596000 2023-11-29 12:58:00,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3973340.0, ans=0.125 2023-11-29 12:58:02,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3973340.0, ans=0.125 2023-11-29 12:58:09,395 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.052e+01 8.975e+01 9.560e+01 1.019e+02 1.968e+02, threshold=1.912e+02, percent-clipped=1.0 2023-11-29 12:58:09,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3973406.6666666665, ans=0.125 2023-11-29 12:58:14,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3973406.6666666665, ans=0.0 2023-11-29 12:58:22,174 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6850, loss[loss=0.06819, simple_loss=0.09606, 
pruned_loss=0.01007, audio_tagging_loss=0.01009, over 15320.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08895, pruned_loss=0.01179, audio_tagging_loss=0.00839, over 3048012.86 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:58:56,500 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596050 2023-11-29 12:58:56,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3973606.6666666665, ans=0.125 2023-11-29 12:59:10,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3973740.0, ans=0.0 2023-11-29 12:59:20,504 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 12:59:24,850 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6900, loss[loss=0.07841, simple_loss=0.11, pruned_loss=0.01691, audio_tagging_loss=0.006477, over 16188.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08869, pruned_loss=0.01177, audio_tagging_loss=0.008385, over 3048104.70 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 12:59:35,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3973806.6666666665, ans=0.5 2023-11-29 12:59:37,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3973873.3333333335, ans=0.125 2023-11-29 12:59:38,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3973873.3333333335, ans=0.0 2023-11-29 12:59:42,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3973873.3333333335, ans=0.04949747468305833 2023-11-29 12:59:54,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3973940.0, ans=0.125 2023-11-29 12:59:57,992 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596100 2023-11-29 12:59:59,629 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.41 vs. limit=15.0 2023-11-29 13:00:02,010 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0 2023-11-29 13:00:13,495 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 8.949e+01 9.796e+01 1.035e+02 1.230e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-29 13:00:13,541 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 13:00:25,756 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 6950, loss[loss=0.0727, simple_loss=0.1029, pruned_loss=0.01318, audio_tagging_loss=0.008066, over 14672.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.08869, pruned_loss=0.01176, audio_tagging_loss=0.008454, over 3052491.14 frames. 
], batch size: 55, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:00:31,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3974140.0, ans=0.125 2023-11-29 13:00:37,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3974206.6666666665, ans=0.2 2023-11-29 13:00:48,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3974206.6666666665, ans=0.0 2023-11-29 13:00:48,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3974206.6666666665, ans=0.125 2023-11-29 13:00:59,241 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596150 2023-11-29 13:01:04,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3974340.0, ans=0.0 2023-11-29 13:01:20,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3974406.6666666665, ans=0.1 2023-11-29 13:01:27,347 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7000, loss[loss=0.06944, simple_loss=0.09823, pruned_loss=0.01411, audio_tagging_loss=0.006216, over 15297.00 frames. ], tot_loss[loss=0.06482, simple_loss=0.08911, pruned_loss=0.01184, audio_tagging_loss=0.00842, over 3047206.83 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:01:44,509 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2023-11-29 13:01:56,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3974606.6666666665, ans=0.125 2023-11-29 13:02:01,298 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596200 2023-11-29 13:02:16,535 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 8.861e+01 9.690e+01 1.060e+02 1.703e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-29 13:02:29,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=12.0 2023-11-29 13:02:29,620 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7050, loss[loss=0.06511, simple_loss=0.09262, pruned_loss=0.01153, audio_tagging_loss=0.007265, over 15786.00 frames. ], tot_loss[loss=0.0641, simple_loss=0.08807, pruned_loss=0.0116, audio_tagging_loss=0.008463, over 3040971.57 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:02:34,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3974806.6666666665, ans=0.1 2023-11-29 13:02:38,682 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0 2023-11-29 13:02:47,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3974873.3333333335, ans=0.2 2023-11-29 13:02:47,569 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.84 vs. 
limit=15.0 2023-11-29 13:03:01,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3974940.0, ans=0.125 2023-11-29 13:03:02,686 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596250 2023-11-29 13:03:13,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3975006.6666666665, ans=0.125 2023-11-29 13:03:31,614 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7100, loss[loss=0.05852, simple_loss=0.06884, pruned_loss=0.01425, audio_tagging_loss=0.009845, over 15313.00 frames. ], tot_loss[loss=0.06395, simple_loss=0.08762, pruned_loss=0.01159, audio_tagging_loss=0.008556, over 3044889.15 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:03:31,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3975140.0, ans=10.0 2023-11-29 13:04:05,067 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596300 2023-11-29 13:04:08,218 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.58 vs. limit=10.0 2023-11-29 13:04:19,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3975406.6666666665, ans=0.125 2023-11-29 13:04:20,794 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.508e+01 9.433e+01 1.002e+02 1.073e+02 1.406e+02, threshold=2.003e+02, percent-clipped=0.0 2023-11-29 13:04:26,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3975406.6666666665, ans=0.2 2023-11-29 13:04:32,918 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7150, loss[loss=0.07709, simple_loss=0.1154, pruned_loss=0.01327, audio_tagging_loss=0.006144, over 15943.00 frames. ], tot_loss[loss=0.06442, simple_loss=0.08818, pruned_loss=0.0117, audio_tagging_loss=0.008631, over 3049003.21 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:04:51,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3975540.0, ans=0.0 2023-11-29 13:05:06,662 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596350 2023-11-29 13:05:13,804 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.16 vs. limit=15.0 2023-11-29 13:05:30,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3975740.0, ans=0.125 2023-11-29 13:05:34,656 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7200, loss[loss=0.0532, simple_loss=0.07175, pruned_loss=0.007923, audio_tagging_loss=0.009403, over 14693.00 frames. ], tot_loss[loss=0.06467, simple_loss=0.08837, pruned_loss=0.01179, audio_tagging_loss=0.00869, over 3043654.74 frames. 
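The many "ScheduledFloat: name=..., batch_count=..., ans=..." lines report regularizer hyperparameters (balancer probabilities, skip rates, dropout) scheduled as piecewise-linear functions of the global batch count; "ans" is the value in effect at that batch. A hedged sketch of the idea (the real class lives in icefall's scaling.py; the breakpoints below are illustrative, and values past the last breakpoint stay constant, which is why this long run logs flat values such as 0.125):

    import bisect

    class ScheduledFloatSketch:
        """A float that interpolates linearly between (batch_count, value) points."""

        def __init__(self, *points):
            pts = sorted(points)
            self.x = [p[0] for p in pts]
            self.y = [p[1] for p in pts]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.x[0]:
                return self.y[0]
            if batch_count >= self.x[-1]:
                return self.y[-1]
            i = bisect.bisect_right(self.x, batch_count)
            x0, x1, y0, y1 = self.x[i - 1], self.x[i], self.y[i - 1], self.y[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    prob = ScheduledFloatSketch((0.0, 0.3), (8000.0, 0.125))  # illustrative breakpoints
    assert prob.value(3974140.0) == 0.125  # far past the schedule: constant, as logged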
], batch size: 57, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:06:08,358 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596400 2023-11-29 13:06:18,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3976006.6666666665, ans=0.0 2023-11-29 13:06:24,308 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.018e+01 9.297e+01 9.851e+01 1.057e+02 1.501e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 13:06:36,942 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7250, loss[loss=0.08896, simple_loss=0.1278, pruned_loss=0.01717, audio_tagging_loss=0.007906, over 15955.00 frames. ], tot_loss[loss=0.06487, simple_loss=0.08858, pruned_loss=0.01175, audio_tagging_loss=0.008825, over 3042552.66 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:06:37,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3976140.0, ans=0.0 2023-11-29 13:06:48,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3976206.6666666665, ans=0.125 2023-11-29 13:06:54,390 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.84 vs. limit=22.5 2023-11-29 13:07:09,938 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596450 2023-11-29 13:07:36,745 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=15.0 2023-11-29 13:07:38,437 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7300, loss[loss=0.06876, simple_loss=0.08466, pruned_loss=0.01478, audio_tagging_loss=0.01165, over 14632.00 frames. ], tot_loss[loss=0.06527, simple_loss=0.08929, pruned_loss=0.01193, audio_tagging_loss=0.008698, over 3039152.61 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:08:03,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3976606.6666666665, ans=0.0 2023-11-29 13:08:11,879 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596500 2023-11-29 13:08:13,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3976606.6666666665, ans=0.1 2023-11-29 13:08:18,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3976673.3333333335, ans=0.035 2023-11-29 13:08:28,802 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.133e+01 9.755e+01 1.029e+02 1.413e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-29 13:08:31,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3976740.0, ans=0.125 2023-11-29 13:08:36,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3976740.0, ans=0.1 2023-11-29 13:08:39,149 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7350, loss[loss=0.04387, simple_loss=0.0546, pruned_loss=0.006369, audio_tagging_loss=0.01019, over 16697.00 frames. ], tot_loss[loss=0.06443, simple_loss=0.08834, pruned_loss=0.01172, audio_tagging_loss=0.00854, over 3042345.73 frames. 
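The "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." lines are consistent with a simple rule: the five numbers read as min/25%/median/75%/max of recently observed gradient norms, and the clipping threshold is clipping_scale times the median (e.g. 2.0 * 9.690e+01 = 1.938e+02 in the entry above); "percent-clipped" is the fraction of recent batches whose norm exceeded the threshold. A sketch of that bookkeeping, assumed rather than copied from optim.py:

    from collections import deque
    import torch

    recent_norms = deque(maxlen=200)  # per-batch gradient norms seen recently

    def clipping_stats(clipping_scale: float = 2.0):
        t = torch.tensor(list(recent_norms))
        q = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2].item()  # scale * median
        percent_clipped = 100.0 * (t > threshold).float().mean().item()
        return q.tolist(), threshold, percent_clipped

    recent_norms.extend([75.3, 88.6, 96.9, 106.0, 170.3])
    print(clipping_stats())  # quartiles, threshold = 2 * median, percent clipped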
], batch size: 65, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:09:13,172 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596550 2023-11-29 13:09:24,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3977006.6666666665, ans=0.2 2023-11-29 13:09:26,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=3977006.6666666665, ans=15.0 2023-11-29 13:09:27,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3977073.3333333335, ans=0.0 2023-11-29 13:09:27,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.88 vs. limit=15.0 2023-11-29 13:09:33,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3977073.3333333335, ans=0.125 2023-11-29 13:09:40,670 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7400, loss[loss=0.05894, simple_loss=0.08487, pruned_loss=0.008655, audio_tagging_loss=0.007855, over 15420.00 frames. ], tot_loss[loss=0.0642, simple_loss=0.08813, pruned_loss=0.01166, audio_tagging_loss=0.008473, over 3048494.73 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:09:59,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3977206.6666666665, ans=0.125 2023-11-29 13:10:14,020 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596600 2023-11-29 13:10:17,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3977340.0, ans=0.07 2023-11-29 13:10:23,538 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=22.5 2023-11-29 13:10:31,695 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.135e+01 9.143e+01 9.729e+01 1.063e+02 1.656e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 13:10:43,578 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7450, loss[loss=0.03947, simple_loss=0.04852, pruned_loss=0.007358, audio_tagging_loss=0.00785, over 15703.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.08895, pruned_loss=0.01171, audio_tagging_loss=0.008438, over 3047029.63 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:11:13,820 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.22 vs. limit=22.5 2023-11-29 13:11:16,219 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596650 2023-11-29 13:11:20,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. 
limit=15.0 2023-11-29 13:11:32,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3977740.0, ans=0.125 2023-11-29 13:11:35,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3977740.0, ans=0.125 2023-11-29 13:11:44,379 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7500, loss[loss=0.05421, simple_loss=0.06832, pruned_loss=0.008465, audio_tagging_loss=0.01159, over 15720.00 frames. ], tot_loss[loss=0.06421, simple_loss=0.08828, pruned_loss=0.01168, audio_tagging_loss=0.00839, over 3048525.18 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:11:45,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3977806.6666666665, ans=0.1 2023-11-29 13:11:49,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3977806.6666666665, ans=0.125 2023-11-29 13:12:18,361 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596700 2023-11-29 13:12:34,563 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.003e+01 9.233e+01 9.870e+01 1.058e+02 1.396e+02, threshold=1.974e+02, percent-clipped=0.0 2023-11-29 13:12:38,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3978073.3333333335, ans=0.125 2023-11-29 13:12:45,789 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7550, loss[loss=0.05735, simple_loss=0.08176, pruned_loss=0.009542, audio_tagging_loss=0.006929, over 15070.00 frames. ], tot_loss[loss=0.06425, simple_loss=0.08845, pruned_loss=0.0117, audio_tagging_loss=0.008328, over 3043536.51 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:12:47,538 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.53 vs. 
limit=15.0 2023-11-29 13:12:50,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3978140.0, ans=0.07 2023-11-29 13:12:51,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3978140.0, ans=0.0 2023-11-29 13:12:52,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3978140.0, ans=0.05 2023-11-29 13:13:04,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3978206.6666666665, ans=0.2 2023-11-29 13:13:18,392 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596750 2023-11-29 13:13:19,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3978273.3333333335, ans=0.0 2023-11-29 13:13:27,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3978340.0, ans=0.0 2023-11-29 13:13:33,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3978406.6666666665, ans=0.04949747468305833 2023-11-29 13:13:41,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3978406.6666666665, ans=0.0 2023-11-29 13:13:44,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3978406.6666666665, ans=0.0 2023-11-29 13:13:48,009 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7600, loss[loss=0.06271, simple_loss=0.08494, pruned_loss=0.009609, audio_tagging_loss=0.01063, over 15500.00 frames. ], tot_loss[loss=0.06377, simple_loss=0.0875, pruned_loss=0.01163, audio_tagging_loss=0.00839, over 3045795.15 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:13:48,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3978473.3333333335, ans=0.125 2023-11-29 13:13:51,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3978473.3333333335, ans=0.125 2023-11-29 13:13:53,160 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.93 vs. limit=10.0 2023-11-29 13:13:59,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3978540.0, ans=0.04949747468305833 2023-11-29 13:14:03,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3978540.0, ans=0.0 2023-11-29 13:14:19,840 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596800 2023-11-29 13:14:22,037 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.13 vs. 
limit=15.0 2023-11-29 13:14:22,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3978673.3333333335, ans=0.1 2023-11-29 13:14:37,837 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.643e+01 8.734e+01 9.698e+01 1.076e+02 1.664e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 13:14:40,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.85 vs. limit=15.0 2023-11-29 13:14:43,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3978740.0, ans=0.125 2023-11-29 13:14:48,321 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7650, loss[loss=0.07257, simple_loss=0.1128, pruned_loss=0.0123, audio_tagging_loss=0.003882, over 16183.00 frames. ], tot_loss[loss=0.06402, simple_loss=0.08778, pruned_loss=0.01181, audio_tagging_loss=0.008317, over 3046457.32 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:14:54,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3978806.6666666665, ans=0.0 2023-11-29 13:15:04,876 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0 2023-11-29 13:15:21,920 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596850 2023-11-29 13:15:36,118 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.36 vs. limit=22.5 2023-11-29 13:15:50,306 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7700, loss[loss=0.04632, simple_loss=0.06269, pruned_loss=0.00469, audio_tagging_loss=0.01029, over 14574.00 frames. ], tot_loss[loss=0.06403, simple_loss=0.08796, pruned_loss=0.0117, audio_tagging_loss=0.008351, over 3044574.87 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:15:53,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3979140.0, ans=0.0 2023-11-29 13:15:59,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3979140.0, ans=0.0 2023-11-29 13:16:18,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3979273.3333333335, ans=0.0 2023-11-29 13:16:22,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3979273.3333333335, ans=0.125 2023-11-29 13:16:24,032 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596900 2023-11-29 13:16:40,898 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.740e+01 9.160e+01 9.812e+01 1.042e+02 1.331e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 13:16:52,059 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7750, loss[loss=0.07755, simple_loss=0.1052, pruned_loss=0.01754, audio_tagging_loss=0.007423, over 15970.00 frames. ], tot_loss[loss=0.06438, simple_loss=0.08846, pruned_loss=0.01177, audio_tagging_loss=0.008389, over 3049552.29 frames. 
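The "Whitening: name=..., metric=X vs. limit=Y" lines track how non-white (anisotropic) the covariance of a module's activations has become; the Whiten modules only push back on the activations once the metric crosses its limit, so entries tend to be logged as the metric approaches the limit. As an illustration only (not the exact scaling.py formula), one such whiteness measure is mean(lambda^2)/mean(lambda)^2 over the covariance eigenvalues lambda, which equals 1.0 for a perfectly white spectrum and grows as the spectrum spreads:

    import torch

    def whiteness_metric(feats: torch.Tensor) -> float:
        # feats: (num_frames, num_channels); returns 1.0 for an isotropic covariance.
        x = feats - feats.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

    print(whiteness_metric(torch.randn(10000, 64)))  # close to 1: nearly white noise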
], batch size: 61, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:17:25,951 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 596950 2023-11-29 13:17:54,652 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7800, loss[loss=0.04948, simple_loss=0.0697, pruned_loss=0.004134, audio_tagging_loss=0.0105, over 14282.00 frames. ], tot_loss[loss=0.06448, simple_loss=0.08898, pruned_loss=0.01162, audio_tagging_loss=0.008376, over 3050863.73 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:17:59,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3979806.6666666665, ans=0.0 2023-11-29 13:18:19,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3979940.0, ans=0.125 2023-11-29 13:18:27,682 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597000 2023-11-29 13:18:27,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3979940.0, ans=0.2 2023-11-29 13:18:46,267 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 9.044e+01 9.654e+01 1.045e+02 1.431e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-29 13:18:57,701 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7850, loss[loss=0.07116, simple_loss=0.09885, pruned_loss=0.01362, audio_tagging_loss=0.008117, over 15967.00 frames. ], tot_loss[loss=0.06458, simple_loss=0.08873, pruned_loss=0.01166, audio_tagging_loss=0.008554, over 3046132.90 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:19:06,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3980140.0, ans=0.125 2023-11-29 13:19:08,305 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.91 vs. limit=15.0 2023-11-29 13:19:31,693 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597050 2023-11-29 13:19:33,290 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.83 vs. limit=22.5 2023-11-29 13:19:56,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3980406.6666666665, ans=0.0 2023-11-29 13:19:59,728 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7900, loss[loss=0.06174, simple_loss=0.07766, pruned_loss=0.01168, audio_tagging_loss=0.01122, over 15018.00 frames. ], tot_loss[loss=0.06411, simple_loss=0.08802, pruned_loss=0.01151, audio_tagging_loss=0.008593, over 3045296.28 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:20:08,459 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.75 vs. 
limit=10.0 2023-11-29 13:20:17,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3980540.0, ans=0.125 2023-11-29 13:20:18,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3980540.0, ans=0.125 2023-11-29 13:20:32,861 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597100 2023-11-29 13:20:43,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3980673.3333333335, ans=0.05 2023-11-29 13:20:44,980 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.98 vs. limit=15.0 2023-11-29 13:20:52,315 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.586e+01 9.084e+01 9.995e+01 1.068e+02 1.283e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-29 13:20:57,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3980740.0, ans=0.125 2023-11-29 13:20:59,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3980740.0, ans=0.125 2023-11-29 13:21:01,681 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 7950, loss[loss=0.07442, simple_loss=0.1033, pruned_loss=0.01382, audio_tagging_loss=0.008936, over 16697.00 frames. ], tot_loss[loss=0.06406, simple_loss=0.08761, pruned_loss=0.01152, audio_tagging_loss=0.008743, over 3046610.15 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:21:20,009 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 13:21:35,900 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597150 2023-11-29 13:21:47,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3981006.6666666665, ans=0.1 2023-11-29 13:22:04,444 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8000, loss[loss=0.06413, simple_loss=0.08427, pruned_loss=0.01157, audio_tagging_loss=0.01042, over 16884.00 frames. ], tot_loss[loss=0.06411, simple_loss=0.08762, pruned_loss=0.0115, audio_tagging_loss=0.0088, over 3052853.94 frames. ], batch size: 63, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:22:08,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3981140.0, ans=0.125 2023-11-29 13:22:35,664 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.37 vs. 
limit=15.0 2023-11-29 13:22:37,420 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597200 2023-11-29 13:22:42,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3981340.0, ans=0.125 2023-11-29 13:22:44,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3981340.0, ans=0.0 2023-11-29 13:22:44,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3981340.0, ans=0.2 2023-11-29 13:22:57,926 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.949e+01 8.831e+01 9.452e+01 1.032e+02 1.267e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-29 13:23:06,829 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8050, loss[loss=0.06628, simple_loss=0.08821, pruned_loss=0.01219, audio_tagging_loss=0.009984, over 14144.00 frames. ], tot_loss[loss=0.06367, simple_loss=0.08693, pruned_loss=0.01131, audio_tagging_loss=0.008892, over 3034052.20 frames. ], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:23:08,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3981473.3333333335, ans=0.125 2023-11-29 13:23:24,449 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0 2023-11-29 13:23:35,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3981606.6666666665, ans=0.125 2023-11-29 13:23:40,529 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597250 2023-11-29 13:24:03,092 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0 2023-11-29 13:24:07,707 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2023-11-29 13:24:08,358 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8100, loss[loss=0.08139, simple_loss=0.1083, pruned_loss=0.01995, audio_tagging_loss=0.007295, over 16213.00 frames. ], tot_loss[loss=0.06409, simple_loss=0.08775, pruned_loss=0.01148, audio_tagging_loss=0.008732, over 3039280.36 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:24:41,033 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597300 2023-11-29 13:24:54,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3982006.6666666665, ans=0.125 2023-11-29 13:24:54,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3982006.6666666665, ans=0.05 2023-11-29 13:24:59,883 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 9.024e+01 9.571e+01 1.071e+02 1.276e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-29 13:25:01,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3982073.3333333335, ans=0.0 2023-11-29 13:25:08,617 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8150, loss[loss=0.07514, simple_loss=0.1084, pruned_loss=0.01521, audio_tagging_loss=0.005742, over 15787.00 frames. 
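The grad_scale value in each batch line (toggling between 16.0 and 32.0 through this stretch) is PyTorch AMP's dynamic loss scale: the run trains in fp16, so losses are scaled up to keep gradients from underflowing, and the scale is halved on overflow and grown back afterwards. A minimal sketch of the standard pattern, with model/optimizer/loss_fn left abstract:

    import torch

    scaler = torch.cuda.amp.GradScaler()

    def train_step(model, optimizer, loss_fn, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(batch))
        scaler.scale(loss).backward()  # scale the loss so fp16 grads don't underflow
        scaler.step(optimizer)         # unscales first; skips the step on inf/nan grads
        scaler.update()                # shrinks the scale after overflow, grows it otherwise
        return scaler.get_scale()      # the number logged as grad_scale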
], tot_loss[loss=0.06496, simple_loss=0.08926, pruned_loss=0.01173, audio_tagging_loss=0.008593, over 3045387.05 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:25:25,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3982206.6666666665, ans=0.1 2023-11-29 13:25:41,894 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597350 2023-11-29 13:25:58,406 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.06 vs. limit=15.0 2023-11-29 13:26:01,262 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0 2023-11-29 13:26:08,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3982473.3333333335, ans=0.125 2023-11-29 13:26:10,379 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8200, loss[loss=0.07382, simple_loss=0.1115, pruned_loss=0.01237, audio_tagging_loss=0.0057, over 15797.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.09002, pruned_loss=0.01195, audio_tagging_loss=0.008427, over 3041926.57 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:26:13,905 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 13:26:14,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3982473.3333333335, ans=0.125 2023-11-29 13:26:43,579 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597400 2023-11-29 13:26:52,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3982673.3333333335, ans=0.0 2023-11-29 13:26:55,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3982673.3333333335, ans=0.125 2023-11-29 13:27:03,617 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.841e+01 9.247e+01 9.743e+01 1.041e+02 1.327e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-29 13:27:10,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3982740.0, ans=0.2 2023-11-29 13:27:12,554 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8250, loss[loss=0.0758, simple_loss=0.1194, pruned_loss=0.009491, audio_tagging_loss=0.006601, over 16454.00 frames. ], tot_loss[loss=0.06466, simple_loss=0.08896, pruned_loss=0.01179, audio_tagging_loss=0.008392, over 3040201.84 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:27:16,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3982806.6666666665, ans=0.125 2023-11-29 13:27:16,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.67 vs. 
limit=10.0 2023-11-29 13:27:20,188 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=15.0 2023-11-29 13:27:24,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3982873.3333333335, ans=0.0 2023-11-29 13:27:24,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3982873.3333333335, ans=0.0 2023-11-29 13:27:29,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3982873.3333333335, ans=0.125 2023-11-29 13:27:45,876 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597450 2023-11-29 13:28:04,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3983073.3333333335, ans=0.0 2023-11-29 13:28:09,916 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=15.0 2023-11-29 13:28:12,930 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8300, loss[loss=0.06732, simple_loss=0.09539, pruned_loss=0.01148, audio_tagging_loss=0.008145, over 16229.00 frames. ], tot_loss[loss=0.06417, simple_loss=0.0882, pruned_loss=0.01166, audio_tagging_loss=0.008405, over 3044459.66 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:28:30,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3983206.6666666665, ans=0.0 2023-11-29 13:28:36,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-29 13:28:36,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3983273.3333333335, ans=0.125 2023-11-29 13:28:46,605 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597500 2023-11-29 13:28:51,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3983340.0, ans=10.0 2023-11-29 13:28:52,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3983340.0, ans=0.125 2023-11-29 13:28:59,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3983340.0, ans=0.125 2023-11-29 13:29:05,740 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.020e+01 9.254e+01 9.916e+01 1.057e+02 1.310e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-29 13:29:07,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3983406.6666666665, ans=0.125 2023-11-29 13:29:14,576 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8350, loss[loss=0.05837, simple_loss=0.07808, pruned_loss=0.008178, audio_tagging_loss=0.01115, over 16256.00 frames. ], tot_loss[loss=0.06397, simple_loss=0.08807, pruned_loss=0.01155, audio_tagging_loss=0.008382, over 3041121.00 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:29:34,833 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.53 vs. 
limit=10.0 2023-11-29 13:29:40,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.63 vs. limit=22.5 2023-11-29 13:29:47,134 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597550 2023-11-29 13:29:56,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3983673.3333333335, ans=0.125 2023-11-29 13:30:00,270 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.70 vs. limit=15.0 2023-11-29 13:30:05,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3983740.0, ans=0.125 2023-11-29 13:30:13,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3983740.0, ans=0.2 2023-11-29 13:30:16,372 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8400, loss[loss=0.05486, simple_loss=0.07266, pruned_loss=0.0101, audio_tagging_loss=0.008433, over 14293.00 frames. ], tot_loss[loss=0.0638, simple_loss=0.08779, pruned_loss=0.01151, audio_tagging_loss=0.008394, over 3041470.51 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:30:28,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3983873.3333333335, ans=0.125 2023-11-29 13:30:49,601 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597600 2023-11-29 13:31:00,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3984006.6666666665, ans=0.125 2023-11-29 13:31:10,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 9.074e+01 9.912e+01 1.068e+02 1.273e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-29 13:31:11,296 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.58 vs. limit=22.5 2023-11-29 13:31:17,399 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8450, loss[loss=0.06662, simple_loss=0.09953, pruned_loss=0.01136, audio_tagging_loss=0.005499, over 14567.00 frames. ], tot_loss[loss=0.06362, simple_loss=0.08745, pruned_loss=0.01149, audio_tagging_loss=0.008408, over 3038669.47 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:31:19,483 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=22.5 2023-11-29 13:31:21,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3984140.0, ans=0.1 2023-11-29 13:31:29,400 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2023-11-29 13:31:35,017 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0 2023-11-29 13:31:43,155 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.56 vs. 
limit=15.0 2023-11-29 13:31:51,342 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597650 2023-11-29 13:31:58,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3984340.0, ans=0.2 2023-11-29 13:32:14,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3984406.6666666665, ans=0.125 2023-11-29 13:32:18,960 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8500, loss[loss=0.06932, simple_loss=0.09905, pruned_loss=0.01141, audio_tagging_loss=0.008375, over 16272.00 frames. ], tot_loss[loss=0.06379, simple_loss=0.08763, pruned_loss=0.01153, audio_tagging_loss=0.008444, over 3041643.34 frames. ], batch size: 59, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:32:21,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3984473.3333333335, ans=0.125 2023-11-29 13:32:36,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3984540.0, ans=0.2 2023-11-29 13:32:51,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3984606.6666666665, ans=6.0 2023-11-29 13:32:53,064 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597700 2023-11-29 13:33:13,724 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.159e+01 8.994e+01 9.750e+01 1.041e+02 1.321e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-29 13:33:14,284 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=12.0 2023-11-29 13:33:18,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3984740.0, ans=0.125 2023-11-29 13:33:20,783 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.49 vs. limit=10.0 2023-11-29 13:33:21,337 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8550, loss[loss=0.06477, simple_loss=0.08758, pruned_loss=0.01294, audio_tagging_loss=0.008046, over 14943.00 frames. ], tot_loss[loss=0.06432, simple_loss=0.08832, pruned_loss=0.01162, audio_tagging_loss=0.008541, over 3047515.55 frames. ], batch size: 54, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:33:29,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3984806.6666666665, ans=0.125 2023-11-29 13:33:53,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3984940.0, ans=0.2 2023-11-29 13:33:54,576 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597750 2023-11-29 13:34:02,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3985006.6666666665, ans=0.1 2023-11-29 13:34:22,925 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8600, loss[loss=0.08772, simple_loss=0.1358, pruned_loss=0.01466, audio_tagging_loss=0.005149, over 15930.00 frames. ], tot_loss[loss=0.06462, simple_loss=0.0888, pruned_loss=0.01162, audio_tagging_loss=0.008597, over 3044811.16 frames. 
], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:34:48,366 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.31 vs. limit=15.0 2023-11-29 13:34:57,437 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597800 2023-11-29 13:35:03,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3985340.0, ans=0.125 2023-11-29 13:35:18,149 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 8.848e+01 9.575e+01 1.022e+02 1.291e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-29 13:35:25,278 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8650, loss[loss=0.055, simple_loss=0.07534, pruned_loss=0.009531, audio_tagging_loss=0.007797, over 14834.00 frames. ], tot_loss[loss=0.06479, simple_loss=0.08879, pruned_loss=0.01172, audio_tagging_loss=0.008683, over 3043069.16 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:35:30,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3985473.3333333335, ans=0.2 2023-11-29 13:35:44,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3985540.0, ans=0.0 2023-11-29 13:35:48,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3985540.0, ans=0.125 2023-11-29 13:35:54,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3985606.6666666665, ans=0.125 2023-11-29 13:35:58,854 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597850 2023-11-29 13:36:26,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3985806.6666666665, ans=0.2 2023-11-29 13:36:27,071 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8700, loss[loss=0.09082, simple_loss=0.1248, pruned_loss=0.02188, audio_tagging_loss=0.006553, over 16121.00 frames. ], tot_loss[loss=0.06559, simple_loss=0.09002, pruned_loss=0.01196, audio_tagging_loss=0.00862, over 3051701.13 frames. ], batch size: 57, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:36:59,824 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597900 2023-11-29 13:37:14,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3986073.3333333335, ans=0.125 2023-11-29 13:37:21,271 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.738e+01 9.317e+01 9.883e+01 1.072e+02 1.210e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-29 13:37:23,185 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0 2023-11-29 13:37:28,334 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8750, loss[loss=0.06183, simple_loss=0.08626, pruned_loss=0.009477, audio_tagging_loss=0.00922, over 14934.00 frames. ], tot_loss[loss=0.06581, simple_loss=0.09037, pruned_loss=0.01195, audio_tagging_loss=0.00867, over 3049126.50 frames. 
], batch size: 56, lr: 1.35e-03, grad_scale: 16.0 2023-11-29 13:37:35,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3986140.0, ans=0.125 2023-11-29 13:37:51,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3986273.3333333335, ans=0.0 2023-11-29 13:38:01,095 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 597950 2023-11-29 13:38:13,463 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2023-11-29 13:38:17,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3986406.6666666665, ans=0.0 2023-11-29 13:38:20,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3986406.6666666665, ans=0.125 2023-11-29 13:38:29,591 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8800, loss[loss=0.068, simple_loss=0.0957, pruned_loss=0.01183, audio_tagging_loss=0.008317, over 15733.00 frames. ], tot_loss[loss=0.06612, simple_loss=0.09058, pruned_loss=0.01206, audio_tagging_loss=0.008769, over 3046692.36 frames. ], batch size: 60, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:38:34,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3986473.3333333335, ans=0.1 2023-11-29 13:38:35,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3986473.3333333335, ans=0.125 2023-11-29 13:38:42,722 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.66 vs. limit=15.0 2023-11-29 13:38:59,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3986606.6666666665, ans=0.125 2023-11-29 13:39:02,945 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598000 2023-11-29 13:39:03,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3986606.6666666665, ans=0.125 2023-11-29 13:39:23,605 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.845e+01 9.465e+01 1.025e+02 1.121e+02 1.304e+02, threshold=2.051e+02, percent-clipped=0.0 2023-11-29 13:39:31,204 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8850, loss[loss=0.06298, simple_loss=0.08044, pruned_loss=0.01093, audio_tagging_loss=0.01183, over 15243.00 frames. ], tot_loss[loss=0.06622, simple_loss=0.09092, pruned_loss=0.01202, audio_tagging_loss=0.008741, over 3052283.10 frames. ], batch size: 58, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:39:42,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3986873.3333333335, ans=0.125 2023-11-29 13:39:46,120 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 13:39:48,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3986873.3333333335, ans=0.0 2023-11-29 13:39:48,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3986873.3333333335, ans=0.125 2023-11-29 13:39:49,076 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.22 vs. limit=15.0 2023-11-29 13:39:50,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3986873.3333333335, ans=0.04949747468305833 2023-11-29 13:40:00,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3986940.0, ans=0.125 2023-11-29 13:40:04,284 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598050 2023-11-29 13:40:04,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3986940.0, ans=0.0 2023-11-29 13:40:10,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3987006.6666666665, ans=0.125 2023-11-29 13:40:25,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3987073.3333333335, ans=0.125 2023-11-29 13:40:32,892 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8900, loss[loss=0.04097, simple_loss=0.05063, pruned_loss=0.004951, audio_tagging_loss=0.01071, over 15589.00 frames. ], tot_loss[loss=0.06522, simple_loss=0.08979, pruned_loss=0.01173, audio_tagging_loss=0.008595, over 3048785.06 frames. ], batch size: 62, lr: 1.35e-03, grad_scale: 32.0 2023-11-29 13:40:55,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3987273.3333333335, ans=0.125 2023-11-29 13:41:05,589 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598100 2023-11-29 13:41:08,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3987340.0, ans=0.125 2023-11-29 13:41:09,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.01 vs. limit=15.0 2023-11-29 13:41:26,647 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 9.187e+01 9.849e+01 1.048e+02 1.202e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-29 13:41:32,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3987406.6666666665, ans=0.0 2023-11-29 13:41:34,327 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 8950, loss[loss=0.06948, simple_loss=0.1039, pruned_loss=0.01218, audio_tagging_loss=0.005364, over 14570.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08927, pruned_loss=0.01172, audio_tagging_loss=0.008484, over 3049954.79 frames. 
], batch size: 54, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 13:41:41,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3987473.3333333335, ans=0.125 2023-11-29 13:41:52,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3987540.0, ans=0.07 2023-11-29 13:42:07,473 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598150 2023-11-29 13:42:09,033 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=22.5 2023-11-29 13:42:15,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3987673.3333333335, ans=0.0 2023-11-29 13:42:25,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3987740.0, ans=0.2 2023-11-29 13:42:26,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3987740.0, ans=0.2 2023-11-29 13:42:30,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3987740.0, ans=0.125 2023-11-29 13:42:35,624 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9000, loss[loss=0.05538, simple_loss=0.06785, pruned_loss=0.01052, audio_tagging_loss=0.01093, over 15067.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08973, pruned_loss=0.01178, audio_tagging_loss=0.008376, over 3054930.61 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:42:35,625 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 13:43:16,194 INFO [train_asr.py:1267] (2/4) Epoch 50, validation: loss=0.05899, simple_loss=0.05036, pruned_loss=0.005383, audio_tagging_loss=0.02843, over 4681554.00 frames. 2023-11-29 13:43:16,194 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 13:43:34,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3987873.3333333335, ans=0.1 2023-11-29 13:43:39,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3987940.0, ans=0.2 2023-11-29 13:43:40,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3987940.0, ans=0.0 2023-11-29 13:43:40,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3987940.0, ans=0.125 2023-11-29 13:43:43,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3987940.0, ans=0.1 2023-11-29 13:43:49,533 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598200 2023-11-29 13:44:04,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3988073.3333333335, ans=0.125 2023-11-29 13:44:05,277 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.45 vs. 
limit=15.0 2023-11-29 13:44:12,248 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 9.277e+01 9.847e+01 1.044e+02 1.354e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-29 13:44:12,737 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=12.0 2023-11-29 13:44:18,115 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9050, loss[loss=0.05721, simple_loss=0.07797, pruned_loss=0.01037, audio_tagging_loss=0.007865, over 16142.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08947, pruned_loss=0.01163, audio_tagging_loss=0.008337, over 3051358.39 frames. ], batch size: 61, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:44:41,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3988273.3333333335, ans=0.0 2023-11-29 13:44:50,783 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598250 2023-11-29 13:45:20,157 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9100, loss[loss=0.07538, simple_loss=0.1072, pruned_loss=0.01511, audio_tagging_loss=0.006668, over 15362.00 frames. ], tot_loss[loss=0.06471, simple_loss=0.08932, pruned_loss=0.01175, audio_tagging_loss=0.008296, over 3056965.49 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:45:34,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3988540.0, ans=0.2 2023-11-29 13:45:54,248 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598300 2023-11-29 13:45:54,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3988606.6666666665, ans=0.125 2023-11-29 13:46:14,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3988740.0, ans=0.1 2023-11-29 13:46:16,761 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.247e+01 9.830e+01 1.081e+02 1.321e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-29 13:46:21,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3988806.6666666665, ans=0.125 2023-11-29 13:46:22,625 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9150, loss[loss=0.05146, simple_loss=0.07157, pruned_loss=0.008002, audio_tagging_loss=0.007673, over 15048.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.08881, pruned_loss=0.01175, audio_tagging_loss=0.008338, over 3053220.44 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:46:56,375 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598350 2023-11-29 13:47:25,257 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9200, loss[loss=0.0602, simple_loss=0.07717, pruned_loss=0.01416, audio_tagging_loss=0.007455, over 14750.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08863, pruned_loss=0.01179, audio_tagging_loss=0.008391, over 3057526.04 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 13:47:31,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3989140.0, ans=0.1 2023-11-29 13:47:49,333 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.58 vs. 
limit=15.0 2023-11-29 13:47:57,967 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598400 2023-11-29 13:48:11,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3989340.0, ans=0.125 2023-11-29 13:48:18,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3989406.6666666665, ans=0.05 2023-11-29 13:48:21,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.113e+01 8.841e+01 9.563e+01 1.025e+02 1.500e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 13:48:25,990 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9250, loss[loss=0.05436, simple_loss=0.08219, pruned_loss=0.004355, audio_tagging_loss=0.008906, over 15724.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08899, pruned_loss=0.0118, audio_tagging_loss=0.00836, over 3055358.48 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:48:26,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3989473.3333333335, ans=0.0 2023-11-29 13:48:34,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3989473.3333333335, ans=0.0 2023-11-29 13:48:40,947 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=12.0 2023-11-29 13:48:53,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3989606.6666666665, ans=0.125 2023-11-29 13:48:58,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3989606.6666666665, ans=0.125 2023-11-29 13:49:00,758 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598450 2023-11-29 13:49:28,970 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9300, loss[loss=0.06934, simple_loss=0.09666, pruned_loss=0.01283, audio_tagging_loss=0.008175, over 14066.00 frames. ], tot_loss[loss=0.0647, simple_loss=0.08891, pruned_loss=0.01183, audio_tagging_loss=0.00841, over 3048510.34 frames. 
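
The optim.py:476 lines summarize the recent distribution of gradient norms as five quantiles (they read as min, lower quartile, median, upper quartile, max) plus a clipping threshold. In every entry in this stretch the threshold equals Clipping_scale times the median, e.g. 2.0 * 9.563e+01 = 1.913e+02 just above, so the cutoff appears to track the running gradient-norm statistics rather than being a fixed constant. A small sketch of that rule; the function name and norm buffer are illustrative:

```python
import torch

def clipping_threshold(grad_norms: torch.Tensor, clipping_scale: float = 2.0) -> float:
    # Quantiles of the recent gradient-norm buffer, matching the
    # "grad-norm quartiles" printout (min / 25% / median / 75% / max).
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    print("grad-norm quartiles:", " ".join(f"{v.item():.3e}" for v in q))
    # Threshold assumed to be clipping_scale * median, as the log suggests.
    return clipping_scale * q[2].item()

norms = torch.tensor([71.13, 88.41, 95.63, 102.5, 150.0])  # values from the entry above
print(f"threshold={clipping_threshold(norms):.3e}")  # 1.913e+02
```

percent-clipped=0.0 then means no gradient norm in the window exceeded that threshold.
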
], batch size: 53, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:49:31,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3989806.6666666665, ans=0.125 2023-11-29 13:49:33,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3989806.6666666665, ans=0.05 2023-11-29 13:49:35,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3989806.6666666665, ans=0.04949747468305833 2023-11-29 13:49:36,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3989806.6666666665, ans=0.04949747468305833 2023-11-29 13:49:38,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3989806.6666666665, ans=0.125 2023-11-29 13:49:44,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3989873.3333333335, ans=0.125 2023-11-29 13:49:47,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3989873.3333333335, ans=0.04949747468305833 2023-11-29 13:50:02,593 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598500 2023-11-29 13:50:13,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3990006.6666666665, ans=0.125 2023-11-29 13:50:15,071 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 13:50:18,592 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 13:50:21,102 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 13:50:25,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.873e+01 9.186e+01 9.880e+01 1.044e+02 1.365e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-29 13:50:30,897 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9350, loss[loss=0.0675, simple_loss=0.08132, pruned_loss=0.01438, audio_tagging_loss=0.01246, over 14544.00 frames. ], tot_loss[loss=0.06431, simple_loss=0.08827, pruned_loss=0.0117, audio_tagging_loss=0.008478, over 3043189.10 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:50:36,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3990140.0, ans=0.125 2023-11-29 13:50:36,858 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=15.0 2023-11-29 13:50:38,084 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.38 vs. 
limit=22.5 2023-11-29 13:50:44,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3990206.6666666665, ans=0.125 2023-11-29 13:50:44,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3990206.6666666665, ans=0.125 2023-11-29 13:51:04,910 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598550 2023-11-29 13:51:33,151 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2023-11-29 13:51:33,505 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9400, loss[loss=0.09299, simple_loss=0.1286, pruned_loss=0.01944, audio_tagging_loss=0.009254, over 16667.00 frames. ], tot_loss[loss=0.06506, simple_loss=0.08933, pruned_loss=0.01187, audio_tagging_loss=0.008527, over 3044094.82 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:51:51,796 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.38 vs. limit=12.0 2023-11-29 13:51:52,887 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=15.0 2023-11-29 13:51:53,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3990540.0, ans=0.125 2023-11-29 13:51:59,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3990606.6666666665, ans=0.0 2023-11-29 13:52:06,956 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598600 2023-11-29 13:52:28,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3990740.0, ans=0.0 2023-11-29 13:52:30,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3990740.0, ans=0.125 2023-11-29 13:52:31,032 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.737e+01 9.140e+01 9.680e+01 1.042e+02 1.691e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-29 13:52:35,710 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9450, loss[loss=0.04865, simple_loss=0.07037, pruned_loss=0.006358, audio_tagging_loss=0.007106, over 15611.00 frames. ], tot_loss[loss=0.06524, simple_loss=0.08964, pruned_loss=0.01183, audio_tagging_loss=0.008581, over 3054774.89 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:52:36,904 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 13:52:54,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3990873.3333333335, ans=0.04949747468305833 2023-11-29 13:52:55,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3990873.3333333335, ans=0.1 2023-11-29 13:53:08,746 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598650 2023-11-29 13:53:25,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3991073.3333333335, ans=0.0 2023-11-29 13:53:37,316 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9500, loss[loss=0.05945, simple_loss=0.08565, pruned_loss=0.009736, audio_tagging_loss=0.006884, over 15175.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08946, pruned_loss=0.01183, audio_tagging_loss=0.008574, over 3050931.76 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:53:42,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3991140.0, ans=0.0 2023-11-29 13:53:58,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3991206.6666666665, ans=0.125 2023-11-29 13:54:10,569 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598700 2023-11-29 13:54:33,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.161e+01 9.773e+01 1.049e+02 1.216e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-29 13:54:34,175 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=22.5 2023-11-29 13:54:34,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3991406.6666666665, ans=0.0 2023-11-29 13:54:38,475 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9550, loss[loss=0.06963, simple_loss=0.09537, pruned_loss=0.01255, audio_tagging_loss=0.009393, over 15832.00 frames. ], tot_loss[loss=0.06548, simple_loss=0.08983, pruned_loss=0.01188, audio_tagging_loss=0.008687, over 3048913.37 frames. ], batch size: 60, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:54:42,229 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 13:54:56,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3991540.0, ans=0.125 2023-11-29 13:55:12,024 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598750 2023-11-29 13:55:16,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2023-11-29 13:55:17,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3991673.3333333335, ans=0.2 2023-11-29 13:55:40,324 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9600, loss[loss=0.07379, simple_loss=0.1075, pruned_loss=0.01181, audio_tagging_loss=0.008234, over 15090.00 frames. ], tot_loss[loss=0.06564, simple_loss=0.08982, pruned_loss=0.01196, audio_tagging_loss=0.008777, over 3052808.32 frames. 
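
The WARNING just above (and its repeats later in the log) comes from a sanity filter on AudioSet cuts that carry the fixed placeholder transcript: a 1-second cut yields 100 feature frames but only 23 encoder frames after the roughly 4x convolutional subsampling, which is fewer than its 24 BPE tokens, and a transducer loss needs at least one encoder frame per token. A sketch of such a filter; the subsampled-length formula is an assumption chosen to reproduce the logged 100 -> 23 mapping:

```python
def subsampled_length(num_frames: int) -> int:
    # Assumed Conv2d-subsampling output length; reproduces the logged
    # "before subsampling: 100 -> after subsampling: 23".
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Keep a cut only if the encoder emits at least one frame per token.
    return subsampled_length(num_frames) >= num_tokens

print(subsampled_length(100))  # 23
print(keep_cut(100, 24))       # False -> "Exclude cut ... from training."
```
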
], batch size: 53, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:56:12,507 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598800 2023-11-29 13:56:26,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3992006.6666666665, ans=0.125 2023-11-29 13:56:37,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.152e+01 9.142e+01 9.552e+01 1.023e+02 1.315e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-29 13:56:41,288 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9650, loss[loss=0.07594, simple_loss=0.1066, pruned_loss=0.01548, audio_tagging_loss=0.007184, over 15336.00 frames. ], tot_loss[loss=0.06563, simple_loss=0.08985, pruned_loss=0.012, audio_tagging_loss=0.008709, over 3049482.03 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 13:56:45,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3992140.0, ans=0.0 2023-11-29 13:56:48,389 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2023-11-29 13:57:00,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3992206.6666666665, ans=0.0 2023-11-29 13:57:06,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3992273.3333333335, ans=0.125 2023-11-29 13:57:15,238 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598850 2023-11-29 13:57:18,133 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2023-11-29 13:57:42,296 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9700, loss[loss=0.0609, simple_loss=0.08765, pruned_loss=0.0079, audio_tagging_loss=0.009173, over 15275.00 frames. ], tot_loss[loss=0.06521, simple_loss=0.08948, pruned_loss=0.01191, audio_tagging_loss=0.008566, over 3052395.88 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 13:58:02,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3992540.0, ans=0.125 2023-11-29 13:58:02,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.51 vs. limit=22.5 2023-11-29 13:58:04,770 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.05 vs. limit=22.5 2023-11-29 13:58:06,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3992606.6666666665, ans=0.1 2023-11-29 13:58:15,968 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598900 2023-11-29 13:58:17,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3992606.6666666665, ans=0.1 2023-11-29 13:58:23,285 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.51 vs. 
limit=22.5 2023-11-29 13:58:23,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3992673.3333333335, ans=0.125 2023-11-29 13:58:25,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3992673.3333333335, ans=0.2 2023-11-29 13:58:36,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3992740.0, ans=0.5 2023-11-29 13:58:41,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 9.286e+01 9.993e+01 1.057e+02 1.299e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-29 13:58:44,661 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9750, loss[loss=0.06143, simple_loss=0.08248, pruned_loss=0.01068, audio_tagging_loss=0.009505, over 14863.00 frames. ], tot_loss[loss=0.06478, simple_loss=0.08905, pruned_loss=0.0118, audio_tagging_loss=0.00845, over 3050839.15 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 13:58:46,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3992806.6666666665, ans=0.0 2023-11-29 13:58:53,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3992806.6666666665, ans=0.2 2023-11-29 13:58:59,193 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0 2023-11-29 13:59:01,587 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0 2023-11-29 13:59:05,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3992873.3333333335, ans=0.125 2023-11-29 13:59:11,590 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.04 vs. limit=15.0 2023-11-29 13:59:15,183 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.12 vs. limit=22.5 2023-11-29 13:59:16,803 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 598950 2023-11-29 13:59:26,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3993006.6666666665, ans=0.1 2023-11-29 13:59:44,646 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9800, loss[loss=0.04843, simple_loss=0.06915, pruned_loss=0.006985, audio_tagging_loss=0.006868, over 15248.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08889, pruned_loss=0.01179, audio_tagging_loss=0.008391, over 3046706.14 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 13:59:57,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3993206.6666666665, ans=0.125 2023-11-29 14:00:18,748 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599000 2023-11-29 14:00:24,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3993340.0, ans=0.125 2023-11-29 14:00:43,059 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:00:44,118 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.931e+01 9.016e+01 9.640e+01 1.052e+02 1.257e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-29 14:00:46,482 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9850, loss[loss=0.06819, simple_loss=0.09329, pruned_loss=0.01265, audio_tagging_loss=0.008894, over 15147.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.0889, pruned_loss=0.01174, audio_tagging_loss=0.008352, over 3047720.15 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:01:07,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3993540.0, ans=10.0 2023-11-29 14:01:16,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3993606.6666666665, ans=0.125 2023-11-29 14:01:17,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3993606.6666666665, ans=0.125 2023-11-29 14:01:20,031 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599050 2023-11-29 14:01:42,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3993740.0, ans=0.0 2023-11-29 14:01:47,699 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9900, loss[loss=0.05227, simple_loss=0.06968, pruned_loss=0.008916, audio_tagging_loss=0.00851, over 16957.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08905, pruned_loss=0.01167, audio_tagging_loss=0.008295, over 3052945.30 frames. ], batch size: 66, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:01:52,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3993806.6666666665, ans=0.125 2023-11-29 14:01:53,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3993806.6666666665, ans=0.2 2023-11-29 14:01:53,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.15 vs. 
limit=12.0 2023-11-29 14:02:01,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3993873.3333333335, ans=0.1 2023-11-29 14:02:19,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3993940.0, ans=0.2 2023-11-29 14:02:19,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3993940.0, ans=0.125 2023-11-29 14:02:20,693 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599100 2023-11-29 14:02:26,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3994006.6666666665, ans=0.035 2023-11-29 14:02:34,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3994006.6666666665, ans=0.0 2023-11-29 14:02:46,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.073e+01 9.233e+01 9.702e+01 1.026e+02 1.352e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-29 14:02:49,341 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 9950, loss[loss=0.05836, simple_loss=0.08907, pruned_loss=0.007709, audio_tagging_loss=0.006119, over 14273.00 frames. ], tot_loss[loss=0.06413, simple_loss=0.08854, pruned_loss=0.01153, audio_tagging_loss=0.008335, over 3052179.98 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:02:57,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3994140.0, ans=0.125 2023-11-29 14:03:22,168 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599150 2023-11-29 14:03:25,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3994340.0, ans=0.0 2023-11-29 14:03:28,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=15.0 2023-11-29 14:03:34,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3994340.0, ans=0.2 2023-11-29 14:03:51,248 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10000, loss[loss=0.07314, simple_loss=0.1095, pruned_loss=0.01149, audio_tagging_loss=0.006891, over 16566.00 frames. ], tot_loss[loss=0.06376, simple_loss=0.08786, pruned_loss=0.01151, audio_tagging_loss=0.008323, over 3042028.75 frames. ], batch size: 59, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:03:51,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3994473.3333333335, ans=0.95 2023-11-29 14:04:09,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3994540.0, ans=0.0 2023-11-29 14:04:15,445 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2023-11-29 14:04:24,860 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599200 2023-11-29 14:04:43,091 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.94 vs. 
limit=15.0 2023-11-29 14:04:45,434 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=15.0 2023-11-29 14:04:50,873 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.059e+01 9.244e+01 9.877e+01 1.049e+02 1.463e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-29 14:04:53,629 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10050, loss[loss=0.04833, simple_loss=0.05535, pruned_loss=0.00729, audio_tagging_loss=0.01336, over 13094.00 frames. ], tot_loss[loss=0.06349, simple_loss=0.0873, pruned_loss=0.01147, audio_tagging_loss=0.008378, over 3041689.32 frames. ], batch size: 53, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:05:01,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3994806.6666666665, ans=0.125 2023-11-29 14:05:22,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3994940.0, ans=0.125 2023-11-29 14:05:22,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3994940.0, ans=0.0 2023-11-29 14:05:27,127 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599250 2023-11-29 14:05:36,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3995006.6666666665, ans=0.0 2023-11-29 14:05:47,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3995073.3333333335, ans=0.125 2023-11-29 14:05:47,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3995073.3333333335, ans=15.0 2023-11-29 14:05:56,135 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10100, loss[loss=0.06382, simple_loss=0.08311, pruned_loss=0.0116, audio_tagging_loss=0.01066, over 14781.00 frames. ], tot_loss[loss=0.06419, simple_loss=0.08819, pruned_loss=0.01172, audio_tagging_loss=0.008374, over 3042138.81 frames. ], batch size: 59, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:05:57,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3995140.0, ans=0.0 2023-11-29 14:05:58,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3995140.0, ans=0.0 2023-11-29 14:06:09,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3995206.6666666665, ans=10.0 2023-11-29 14:06:29,034 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599300 2023-11-29 14:06:36,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3995340.0, ans=0.125 2023-11-29 14:06:45,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3995406.6666666665, ans=0.0 2023-11-29 14:06:48,872 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:06:49,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3995406.6666666665, ans=0.0 2023-11-29 14:06:54,612 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.067e+01 9.808e+01 1.052e+02 1.257e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 14:06:57,691 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10150, loss[loss=0.07361, simple_loss=0.1039, pruned_loss=0.01613, audio_tagging_loss=0.005521, over 14958.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08869, pruned_loss=0.01174, audio_tagging_loss=0.008457, over 3047001.08 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:07:00,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3995473.3333333335, ans=0.125 2023-11-29 14:07:00,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3995473.3333333335, ans=0.0 2023-11-29 14:07:20,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3995540.0, ans=0.2 2023-11-29 14:07:29,261 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:07:31,194 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599350 2023-11-29 14:07:43,825 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-29 14:07:57,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3995806.6666666665, ans=0.125 2023-11-29 14:07:58,621 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10200, loss[loss=0.07904, simple_loss=0.1095, pruned_loss=0.01686, audio_tagging_loss=0.007444, over 15524.00 frames. ], tot_loss[loss=0.0651, simple_loss=0.08969, pruned_loss=0.01183, audio_tagging_loss=0.008427, over 3048268.11 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:07:59,093 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.55 vs. 
limit=15.0 2023-11-29 14:08:00,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3995806.6666666665, ans=0.2 2023-11-29 14:08:03,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3995806.6666666665, ans=0.07 2023-11-29 14:08:05,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3995806.6666666665, ans=0.0 2023-11-29 14:08:10,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3995873.3333333335, ans=0.2 2023-11-29 14:08:25,165 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:08:30,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3995940.0, ans=0.125 2023-11-29 14:08:32,955 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599400 2023-11-29 14:08:43,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3996006.6666666665, ans=0.125 2023-11-29 14:08:59,151 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.884e+01 9.205e+01 9.737e+01 1.021e+02 2.393e+02, threshold=1.947e+02, percent-clipped=1.0 2023-11-29 14:09:01,481 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10250, loss[loss=0.04181, simple_loss=0.05475, pruned_loss=0.007762, audio_tagging_loss=0.006672, over 16699.00 frames. ], tot_loss[loss=0.06513, simple_loss=0.08934, pruned_loss=0.01197, audio_tagging_loss=0.008491, over 3049434.83 frames. ], batch size: 64, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:09:17,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3996206.6666666665, ans=0.125 2023-11-29 14:09:19,215 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.01 vs. limit=15.0 2023-11-29 14:09:33,761 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599450 2023-11-29 14:09:47,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3996340.0, ans=0.125 2023-11-29 14:09:55,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3996406.6666666665, ans=0.125 2023-11-29 14:10:03,466 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10300, loss[loss=0.07079, simple_loss=0.1031, pruned_loss=0.01299, audio_tagging_loss=0.006228, over 15300.00 frames. ], tot_loss[loss=0.06494, simple_loss=0.08923, pruned_loss=0.01172, audio_tagging_loss=0.0086, over 3054373.12 frames. 
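
The scaling.py:213 lines that dominate this log track ScheduledFloat parameters: module constants such as dropout probabilities, skip rates, and balancer probabilities whose value is scheduled against the global batch count, which is why every entry pairs a parameter name with batch_count and its current value (ans). A minimal sketch of a piecewise-linear schedule of that shape; the breakpoints below are invented for illustration and are not taken from this run:

```python
from bisect import bisect_right

def scheduled_float(batch_count: float, schedule: list[tuple[float, float]]) -> float:
    # Piecewise-linear interpolation over (batch_count, value) breakpoints,
    # clamped to the end values outside the schedule's range.
    xs = [x for x, _ in schedule]
    i = bisect_right(xs, batch_count)
    if i == 0:
        return schedule[0][1]
    if i == len(schedule):
        return schedule[-1][1]
    (x0, y0), (x1, y1) = schedule[i - 1], schedule[i]
    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

skip_rate = [(0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0)]  # hypothetical breakpoints
print(scheduled_float(3_995_806.67, skip_rate))  # 0.0: far past the final breakpoint
```

By this point in training (batch_count around 4e6) most such schedules have long since flattened out, which is why the logged ans values barely change between entries.
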
], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:10:25,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3996540.0, ans=0.1 2023-11-29 14:10:38,052 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599500 2023-11-29 14:10:55,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3996740.0, ans=0.0 2023-11-29 14:11:02,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3996740.0, ans=0.0 2023-11-29 14:11:03,592 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.634e+01 9.141e+01 9.728e+01 1.059e+02 1.776e+02, threshold=1.946e+02, percent-clipped=0.0 2023-11-29 14:11:06,027 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10350, loss[loss=0.07597, simple_loss=0.1059, pruned_loss=0.01629, audio_tagging_loss=0.006736, over 15909.00 frames. ], tot_loss[loss=0.06546, simple_loss=0.09, pruned_loss=0.0119, audio_tagging_loss=0.008554, over 3058310.41 frames. ], batch size: 59, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:11:26,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3996873.3333333335, ans=0.125 2023-11-29 14:11:27,981 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.67 vs. limit=22.5 2023-11-29 14:11:38,097 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2023-11-29 14:11:40,513 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599550 2023-11-29 14:11:58,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3997073.3333333335, ans=0.125 2023-11-29 14:12:08,461 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10400, loss[loss=0.04355, simple_loss=0.05745, pruned_loss=0.004692, audio_tagging_loss=0.01013, over 13945.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.08968, pruned_loss=0.01193, audio_tagging_loss=0.008723, over 3052067.65 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:12:10,099 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2023-11-29 14:12:20,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.12 vs. 
limit=15.0 2023-11-29 14:12:38,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3997273.3333333335, ans=0.0 2023-11-29 14:12:42,190 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599600 2023-11-29 14:12:50,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3997340.0, ans=0.1 2023-11-29 14:13:08,427 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.101e+01 9.809e+01 1.030e+02 1.340e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 14:13:10,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3997473.3333333335, ans=0.1 2023-11-29 14:13:10,935 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10450, loss[loss=0.05843, simple_loss=0.07519, pruned_loss=0.01014, audio_tagging_loss=0.0107, over 14818.00 frames. ], tot_loss[loss=0.06558, simple_loss=0.08995, pruned_loss=0.01192, audio_tagging_loss=0.008689, over 3048857.96 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:13:14,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=15.0 2023-11-29 14:13:18,379 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.24 vs. limit=10.0 2023-11-29 14:13:37,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3997606.6666666665, ans=0.0 2023-11-29 14:13:44,788 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599650 2023-11-29 14:13:46,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3997606.6666666665, ans=0.125 2023-11-29 14:13:47,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3997673.3333333335, ans=0.1 2023-11-29 14:13:47,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3997673.3333333335, ans=0.1 2023-11-29 14:13:53,579 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.83 vs. limit=22.5 2023-11-29 14:14:12,917 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10500, loss[loss=0.06875, simple_loss=0.0947, pruned_loss=0.01271, audio_tagging_loss=0.008688, over 15118.00 frames. ], tot_loss[loss=0.06538, simple_loss=0.08966, pruned_loss=0.01193, audio_tagging_loss=0.008628, over 3051714.98 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:14:39,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3997940.0, ans=0.125 2023-11-29 14:14:43,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3997940.0, ans=0.1 2023-11-29 14:14:45,941 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599700 2023-11-29 14:15:10,884 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.97 vs. 
limit=15.0 2023-11-29 14:15:14,109 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.802e+01 9.178e+01 9.890e+01 1.072e+02 1.434e+02, threshold=1.978e+02, percent-clipped=0.0 2023-11-29 14:15:15,315 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10550, loss[loss=0.05377, simple_loss=0.07089, pruned_loss=0.007991, audio_tagging_loss=0.01033, over 16254.00 frames. ], tot_loss[loss=0.06519, simple_loss=0.0897, pruned_loss=0.01187, audio_tagging_loss=0.008473, over 3052796.16 frames. ], batch size: 61, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:15:47,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3998273.3333333335, ans=0.125 2023-11-29 14:15:48,321 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599750 2023-11-29 14:16:16,364 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10600, loss[loss=0.06103, simple_loss=0.08894, pruned_loss=0.01028, audio_tagging_loss=0.00628, over 15091.00 frames. ], tot_loss[loss=0.06552, simple_loss=0.09025, pruned_loss=0.01198, audio_tagging_loss=0.008412, over 3050991.13 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:16:23,481 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.71 vs. limit=6.0 2023-11-29 14:16:27,001 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2023-11-29 14:16:45,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3998606.6666666665, ans=0.125 2023-11-29 14:16:50,354 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599800 2023-11-29 14:16:54,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3998673.3333333335, ans=0.0 2023-11-29 14:16:55,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3998673.3333333335, ans=0.0 2023-11-29 14:17:00,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3998673.3333333335, ans=0.1 2023-11-29 14:17:00,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3998673.3333333335, ans=0.125 2023-11-29 14:17:17,862 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.533e+01 8.938e+01 9.773e+01 1.042e+02 1.293e+02, threshold=1.955e+02, percent-clipped=0.0 2023-11-29 14:17:19,080 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10650, loss[loss=0.03685, simple_loss=0.04124, pruned_loss=0.003857, audio_tagging_loss=0.01238, over 15925.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08888, pruned_loss=0.01179, audio_tagging_loss=0.008496, over 3042761.74 frames. ], batch size: 65, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:17:21,943 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.78 vs. 
limit=15.0 2023-11-29 14:17:23,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=3998806.6666666665, ans=12.0 2023-11-29 14:17:41,628 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=22.5 2023-11-29 14:17:51,888 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599850 2023-11-29 14:18:07,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3999073.3333333335, ans=0.07 2023-11-29 14:18:18,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3999073.3333333335, ans=0.0 2023-11-29 14:18:20,950 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10700, loss[loss=0.06317, simple_loss=0.0871, pruned_loss=0.008905, audio_tagging_loss=0.01072, over 15773.00 frames. ], tot_loss[loss=0.06429, simple_loss=0.08824, pruned_loss=0.0116, audio_tagging_loss=0.008565, over 3043243.68 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:18:25,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3999140.0, ans=0.0 2023-11-29 14:18:29,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3999140.0, ans=0.1 2023-11-29 14:18:35,505 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.32 vs. limit=15.0 2023-11-29 14:18:48,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3999273.3333333335, ans=0.0 2023-11-29 14:18:49,557 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.26 vs. limit=15.0 2023-11-29 14:18:54,340 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599900 2023-11-29 14:18:55,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3999273.3333333335, ans=0.2 2023-11-29 14:19:07,409 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=22.5 2023-11-29 14:19:09,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3999406.6666666665, ans=0.2 2023-11-29 14:19:19,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3999406.6666666665, ans=0.1 2023-11-29 14:19:21,105 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.715e+01 9.059e+01 9.669e+01 1.032e+02 1.449e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-29 14:19:22,376 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10750, loss[loss=0.05841, simple_loss=0.08066, pruned_loss=0.01129, audio_tagging_loss=0.006782, over 16108.00 frames. ], tot_loss[loss=0.06455, simple_loss=0.08874, pruned_loss=0.01166, audio_tagging_loss=0.008523, over 3052071.26 frames. 
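
The scaling.py:1022 Whitening lines compare a per-module statistic against a limit (e.g. metric=8.26 vs. limit=15.0 above). The metric is 1.0 when the centered covariance of the module's activations has equal eigenvalues, i.e. the features are fully "white", and grows as the energy concentrates in fewer directions; a penalty only applies once the limit is exceeded, so these lines show how much headroom each module has. Below is a hedged reconstruction of a metric with exactly those properties, D * tr(C^2) / tr(C)^2, which is >= 1 by Cauchy-Schwarz with equality iff all eigenvalues match; the actual icefall formula may differ in details:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # D * tr(C^2) / tr(C)^2 for the centered covariance C of x
    # (rows = frames, columns = channels). 1.0 for white features,
    # up to D for rank-1 features.
    x = x - x.mean(dim=0, keepdim=True)
    c = (x.T @ x) / x.shape[0]          # (D, D) covariance estimate
    d = c.shape[0]
    return (d * (c * c).sum() / c.trace() ** 2).item()

torch.manual_seed(0)
print(whitening_metric(torch.randn(1000, 256)))                     # ~1.3, nearly white
print(whitening_metric(torch.randn(1000, 1) * torch.ones(1, 256)))  # ~256, rank-1
```
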
], batch size: 59, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:19:36,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3999540.0, ans=0.1 2023-11-29 14:19:40,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3999540.0, ans=0.2 2023-11-29 14:19:45,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.37 vs. limit=15.0 2023-11-29 14:19:47,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.98 vs. limit=15.0 2023-11-29 14:19:49,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3999606.6666666665, ans=0.125 2023-11-29 14:19:55,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3999606.6666666665, ans=0.0 2023-11-29 14:19:56,686 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 599950 2023-11-29 14:20:01,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0 2023-11-29 14:20:10,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3999740.0, ans=0.2 2023-11-29 14:20:15,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3999740.0, ans=0.1 2023-11-29 14:20:19,426 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:20:23,911 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10800, loss[loss=0.05876, simple_loss=0.0823, pruned_loss=0.009907, audio_tagging_loss=0.007701, over 13956.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08925, pruned_loss=0.0117, audio_tagging_loss=0.008413, over 3053211.90 frames. ], batch size: 53, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:20:26,841 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=12.0 2023-11-29 14:20:38,424 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.70 vs. 
limit=15.0 2023-11-29 14:20:40,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3999873.3333333335, ans=15.0 2023-11-29 14:20:42,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3999873.3333333335, ans=0.125 2023-11-29 14:20:45,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3999873.3333333335, ans=0.0 2023-11-29 14:20:46,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3999873.3333333335, ans=0.0 2023-11-29 14:20:51,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3999940.0, ans=0.2 2023-11-29 14:20:56,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3999940.0, ans=0.125 2023-11-29 14:20:57,293 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600000 2023-11-29 14:21:13,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4000006.6666666665, ans=0.035 2023-11-29 14:21:20,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4000073.3333333335, ans=0.125 2023-11-29 14:21:28,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 9.037e+01 9.839e+01 1.048e+02 1.354e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-29 14:21:28,928 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10850, loss[loss=0.05549, simple_loss=0.07625, pruned_loss=0.01016, audio_tagging_loss=0.0072, over 15462.00 frames. ], tot_loss[loss=0.06434, simple_loss=0.08873, pruned_loss=0.01161, audio_tagging_loss=0.008369, over 3055152.42 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:21:30,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4000140.0, ans=0.1 2023-11-29 14:21:40,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4000206.6666666665, ans=0.1 2023-11-29 14:22:02,288 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600050 2023-11-29 14:22:24,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4000406.6666666665, ans=0.0 2023-11-29 14:22:30,818 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10900, loss[loss=0.06123, simple_loss=0.08127, pruned_loss=0.0122, audio_tagging_loss=0.008391, over 14937.00 frames. ], tot_loss[loss=0.06515, simple_loss=0.08962, pruned_loss=0.01189, audio_tagging_loss=0.008452, over 3053932.95 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:22:30,856 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-29 14:22:44,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4000540.0, ans=0.2 2023-11-29 14:22:49,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4000540.0, ans=0.125 2023-11-29 14:22:57,644 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:23:03,933 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600100 2023-11-29 14:23:04,078 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:23:05,461 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2023-11-29 14:23:28,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4000740.0, ans=0.125 2023-11-29 14:23:32,232 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.967e+01 9.295e+01 9.770e+01 1.037e+02 1.464e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-29 14:23:32,263 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 10950, loss[loss=0.07444, simple_loss=0.1058, pruned_loss=0.01701, audio_tagging_loss=0.004552, over 15175.00 frames. ], tot_loss[loss=0.06523, simple_loss=0.08967, pruned_loss=0.01195, audio_tagging_loss=0.008444, over 3055682.35 frames. ], batch size: 55, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:23:37,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4000806.6666666665, ans=0.125 2023-11-29 14:24:01,406 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.88 vs. limit=10.0 2023-11-29 14:24:05,711 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600150 2023-11-29 14:24:34,447 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11000, loss[loss=0.05774, simple_loss=0.07544, pruned_loss=0.009019, audio_tagging_loss=0.011, over 16358.00 frames. ], tot_loss[loss=0.06518, simple_loss=0.08942, pruned_loss=0.01194, audio_tagging_loss=0.008536, over 3064012.21 frames. ], batch size: 62, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:24:48,120 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:24:51,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4001206.6666666665, ans=0.0 2023-11-29 14:25:07,583 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600200 2023-11-29 14:25:14,322 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.81 vs. 
limit=10.0 2023-11-29 14:25:36,585 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.589e+01 8.890e+01 9.536e+01 1.018e+02 1.298e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-29 14:25:36,616 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11050, loss[loss=0.08418, simple_loss=0.1223, pruned_loss=0.01679, audio_tagging_loss=0.006234, over 15569.00 frames. ], tot_loss[loss=0.06529, simple_loss=0.08962, pruned_loss=0.01188, audio_tagging_loss=0.008598, over 3067830.44 frames. ], batch size: 60, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:25:55,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4001540.0, ans=0.2 2023-11-29 14:26:04,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4001606.6666666665, ans=0.0 2023-11-29 14:26:07,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4001606.6666666665, ans=0.125 2023-11-29 14:26:09,882 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600250 2023-11-29 14:26:25,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=22.5 2023-11-29 14:26:27,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4001740.0, ans=0.125 2023-11-29 14:26:28,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4001740.0, ans=0.1 2023-11-29 14:26:30,099 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.38 vs. limit=10.0 2023-11-29 14:26:31,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4001740.0, ans=0.0 2023-11-29 14:26:33,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4001740.0, ans=0.1 2023-11-29 14:26:38,315 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11100, loss[loss=0.07886, simple_loss=0.1111, pruned_loss=0.01558, audio_tagging_loss=0.007743, over 16264.00 frames. ], tot_loss[loss=0.06539, simple_loss=0.08955, pruned_loss=0.01196, audio_tagging_loss=0.008657, over 3067477.98 frames. ], batch size: 58, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:26:38,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4001806.6666666665, ans=0.125 2023-11-29 14:26:40,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4001806.6666666665, ans=0.125 2023-11-29 14:27:00,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4001873.3333333335, ans=0.125 2023-11-29 14:27:11,829 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600300 2023-11-29 14:27:40,197 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 9.265e+01 9.785e+01 1.064e+02 1.288e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-29 14:27:40,227 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11150, loss[loss=0.06028, simple_loss=0.07738, pruned_loss=0.01139, audio_tagging_loss=0.0102, over 15678.00 frames. 
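[Editor's note] The per-batch loss records decompose consistently as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, which can be checked directly against the logged numbers; a quick check on the Epoch 50, batch 11050 record above, assuming those 0.5 / 1.0 / 1.0 weights:

```python
# Recompute the logged total from its components
# (Epoch 50, batch 11050: loss=0.08418).
simple_loss = 0.1223
pruned_loss = 0.01679
audio_tagging_loss = 0.006234
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
print(round(loss, 5))  # 0.08417, matching the logged 0.08418
                       # up to rounding of the displayed components
```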
], tot_loss[loss=0.06513, simple_loss=0.08888, pruned_loss=0.01192, audio_tagging_loss=0.008772, over 3059052.97 frames. ], batch size: 59, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:28:01,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4002206.6666666665, ans=0.0 2023-11-29 14:28:05,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4002273.3333333335, ans=0.0 2023-11-29 14:28:06,752 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.35 vs. limit=22.5 2023-11-29 14:28:13,819 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600350 2023-11-29 14:28:29,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4002406.6666666665, ans=0.125 2023-11-29 14:28:30,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4002406.6666666665, ans=0.2 2023-11-29 14:28:41,724 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11200, loss[loss=0.07647, simple_loss=0.1115, pruned_loss=0.01462, audio_tagging_loss=0.006086, over 17141.00 frames. ], tot_loss[loss=0.06549, simple_loss=0.0893, pruned_loss=0.012, audio_tagging_loss=0.008842, over 3050717.27 frames. ], batch size: 60, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:29:14,756 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600400 2023-11-29 14:29:16,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4002606.6666666665, ans=0.125 2023-11-29 14:29:18,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4002673.3333333335, ans=0.125 2023-11-29 14:29:43,047 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.595e+01 9.140e+01 9.616e+01 1.036e+02 1.306e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-29 14:29:43,077 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11250, loss[loss=0.05393, simple_loss=0.07915, pruned_loss=0.006272, audio_tagging_loss=0.008082, over 15176.00 frames. ], tot_loss[loss=0.06533, simple_loss=0.08919, pruned_loss=0.01194, audio_tagging_loss=0.008798, over 3055389.03 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 32.0 2023-11-29 14:29:46,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4002806.6666666665, ans=0.125 2023-11-29 14:29:56,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4002873.3333333335, ans=0.0 2023-11-29 14:30:03,553 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.65 vs. limit=12.0 2023-11-29 14:30:05,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4002873.3333333335, ans=0.125 2023-11-29 14:30:08,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.77 vs. 
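[Editor's note] In the optim.py records, the clipping threshold tracks the logged quartiles: threshold = Clipping_scale * median grad-norm (for the record above, 2.0 * 9.616e+01 = 1.923e+02, and the same relation holds for the other clipping records in this section), while percent-clipped reports how often recent batches exceeded it. A sketch of that diagnostic, assuming a sliding window of per-batch gradient norms; the actual bookkeeping lives in icefall's optim.py:

```python
import torch

def clipping_diagnostics(grad_norms: torch.Tensor,
                         clipping_scale: float = 2.0):
    # grad_norms: 1-D tensor of recent per-batch gradient norms.
    q = torch.quantile(grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # scale * median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped
```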
limit=22.5 2023-11-29 14:30:16,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4002940.0, ans=0.0 2023-11-29 14:30:17,312 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600450 2023-11-29 14:30:25,966 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.69 vs. limit=22.5 2023-11-29 14:30:28,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.54 vs. limit=15.0 2023-11-29 14:30:33,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4003073.3333333335, ans=0.0 2023-11-29 14:30:44,764 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11300, loss[loss=0.09043, simple_loss=0.1219, pruned_loss=0.0247, audio_tagging_loss=0.004756, over 14675.00 frames. ], tot_loss[loss=0.06507, simple_loss=0.08897, pruned_loss=0.01188, audio_tagging_loss=0.008702, over 3053129.88 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:30:50,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4003140.0, ans=0.125 2023-11-29 14:30:54,035 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-29 14:31:18,220 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600500 2023-11-29 14:31:30,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4003340.0, ans=0.0 2023-11-29 14:31:43,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4003406.6666666665, ans=0.0 2023-11-29 14:31:46,826 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11350, loss[loss=0.06323, simple_loss=0.08919, pruned_loss=0.01031, audio_tagging_loss=0.008329, over 16365.00 frames. ], tot_loss[loss=0.06484, simple_loss=0.08891, pruned_loss=0.01181, audio_tagging_loss=0.008576, over 3056030.08 frames. ], batch size: 61, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:31:49,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.915e+01 9.251e+01 9.881e+01 1.050e+02 2.034e+02, threshold=1.976e+02, percent-clipped=1.0 2023-11-29 14:31:55,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4003473.3333333335, ans=0.1 2023-11-29 14:32:19,812 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600550 2023-11-29 14:32:48,385 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11400, loss[loss=0.06765, simple_loss=0.08753, pruned_loss=0.01222, audio_tagging_loss=0.01167, over 14295.00 frames. ], tot_loss[loss=0.06511, simple_loss=0.08968, pruned_loss=0.0119, audio_tagging_loss=0.00837, over 3059318.07 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:33:00,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4003873.3333333335, ans=0.125 2023-11-29 14:33:01,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.87 vs. 
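[Editor's note] The Whitening records compare a per-module anisotropy metric against a limit (e.g. "metric=18.69 vs. limit=22.5" above); the metric is 1.0 when the activation covariance is proportional to the identity and grows as channels become correlated, and the module only nudges gradients toward whiter features once the limit is exceeded, so "metric vs. limit" lines are informational. A rough sketch of such a metric; the real one is in icefall's scaling.py and this is an approximation:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations for one group.
    # Returns 1.0 for "white" features (covariance ~ c * I) and
    # larger values as the covariance becomes anisotropic.
    x = x - x.mean(dim=0)
    num_channels = x.shape[1]
    covar = (x.t() @ x) / x.shape[0]
    return num_channels * (covar * covar).sum() / (covar.diag().sum() ** 2)
```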
limit=12.0 2023-11-29 14:33:08,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4003873.3333333335, ans=0.125 2023-11-29 14:33:12,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4003940.0, ans=0.125 2023-11-29 14:33:14,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4003940.0, ans=0.125 2023-11-29 14:33:18,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4003940.0, ans=0.125 2023-11-29 14:33:22,100 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600600 2023-11-29 14:33:38,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4004073.3333333335, ans=0.0 2023-11-29 14:33:38,733 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.21 vs. limit=6.0 2023-11-29 14:33:45,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4004073.3333333335, ans=0.2 2023-11-29 14:33:49,935 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11450, loss[loss=0.05918, simple_loss=0.07605, pruned_loss=0.0111, audio_tagging_loss=0.01005, over 14674.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08953, pruned_loss=0.01188, audio_tagging_loss=0.008267, over 3049525.01 frames. ], batch size: 59, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:33:52,181 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.736e+01 9.290e+01 9.810e+01 1.057e+02 1.472e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-29 14:34:24,134 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600650 2023-11-29 14:34:30,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2023-11-29 14:34:53,816 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11500, loss[loss=0.0442, simple_loss=0.05046, pruned_loss=0.009395, audio_tagging_loss=0.009575, over 16814.00 frames. ], tot_loss[loss=0.0645, simple_loss=0.08885, pruned_loss=0.01179, audio_tagging_loss=0.008279, over 3042066.26 frames. ], batch size: 65, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:35:10,248 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. 
limit=15.0 2023-11-29 14:35:10,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4004540.0, ans=0.125 2023-11-29 14:35:26,646 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600700 2023-11-29 14:35:31,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4004673.3333333335, ans=0.125 2023-11-29 14:35:31,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4004673.3333333335, ans=0.125 2023-11-29 14:35:47,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4004740.0, ans=0.125 2023-11-29 14:35:50,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4004740.0, ans=0.0 2023-11-29 14:35:55,400 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11550, loss[loss=0.05481, simple_loss=0.07971, pruned_loss=0.008154, audio_tagging_loss=0.006799, over 14075.00 frames. ], tot_loss[loss=0.06454, simple_loss=0.08903, pruned_loss=0.01181, audio_tagging_loss=0.008213, over 3036426.10 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:35:57,768 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.705e+01 9.007e+01 9.636e+01 1.040e+02 1.609e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-29 14:36:28,682 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600750 2023-11-29 14:36:33,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4005006.6666666665, ans=0.125 2023-11-29 14:36:33,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4005006.6666666665, ans=0.2 2023-11-29 14:36:36,873 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 14:36:55,928 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:36:56,770 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11600, loss[loss=0.05788, simple_loss=0.07858, pruned_loss=0.008787, audio_tagging_loss=0.009803, over 14954.00 frames. ], tot_loss[loss=0.06502, simple_loss=0.08955, pruned_loss=0.01195, audio_tagging_loss=0.008296, over 3042222.72 frames. 
], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:37:09,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4005206.6666666665, ans=0.025 2023-11-29 14:37:09,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4005206.6666666665, ans=0.0 2023-11-29 14:37:28,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4005273.3333333335, ans=0.125 2023-11-29 14:37:30,306 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600800 2023-11-29 14:37:30,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4005273.3333333335, ans=0.125 2023-11-29 14:37:45,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4005406.6666666665, ans=0.0 2023-11-29 14:37:58,588 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11650, loss[loss=0.06702, simple_loss=0.08163, pruned_loss=0.01419, audio_tagging_loss=0.01202, over 14761.00 frames. ], tot_loss[loss=0.06499, simple_loss=0.08927, pruned_loss=0.01193, audio_tagging_loss=0.008428, over 3045476.60 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:38:00,912 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.952e+01 9.263e+01 9.866e+01 1.051e+02 2.462e+02, threshold=1.973e+02, percent-clipped=1.0 2023-11-29 14:38:15,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4005540.0, ans=0.2 2023-11-29 14:38:15,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4005540.0, ans=0.0 2023-11-29 14:38:16,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4005540.0, ans=0.125 2023-11-29 14:38:18,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4005540.0, ans=0.125 2023-11-29 14:38:25,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4005606.6666666665, ans=0.09899494936611666 2023-11-29 14:38:28,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4005606.6666666665, ans=0.125 2023-11-29 14:38:32,161 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600850 2023-11-29 14:38:33,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4005606.6666666665, ans=0.125 2023-11-29 14:38:41,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4005673.3333333335, ans=0.1 2023-11-29 14:38:46,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4005740.0, ans=0.125 2023-11-29 14:38:59,898 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11700, loss[loss=0.06044, simple_loss=0.07469, pruned_loss=0.01144, audio_tagging_loss=0.01165, over 16133.00 frames. ], tot_loss[loss=0.06403, simple_loss=0.08777, pruned_loss=0.01164, audio_tagging_loss=0.008509, over 3056079.26 frames. 
], batch size: 61, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:39:11,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4005806.6666666665, ans=0.0 2023-11-29 14:39:11,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4005806.6666666665, ans=0.0 2023-11-29 14:39:18,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4005873.3333333335, ans=0.125 2023-11-29 14:39:33,591 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600900 2023-11-29 14:39:34,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4005940.0, ans=0.0 2023-11-29 14:39:52,179 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.42 vs. limit=15.0 2023-11-29 14:40:01,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4006140.0, ans=0.125 2023-11-29 14:40:02,085 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11750, loss[loss=0.0581, simple_loss=0.07535, pruned_loss=0.01184, audio_tagging_loss=0.008594, over 16386.00 frames. ], tot_loss[loss=0.06409, simple_loss=0.08798, pruned_loss=0.01162, audio_tagging_loss=0.008475, over 3053308.12 frames. ], batch size: 62, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:40:05,486 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.465e+01 9.067e+01 9.619e+01 1.045e+02 1.766e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-29 14:40:05,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4006140.0, ans=0.125 2023-11-29 14:40:25,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4006273.3333333335, ans=0.1 2023-11-29 14:40:32,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=4006273.3333333335, ans=0.05 2023-11-29 14:40:34,663 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 600950 2023-11-29 14:40:34,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4006273.3333333335, ans=0.0 2023-11-29 14:40:41,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4006340.0, ans=0.125 2023-11-29 14:40:54,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4006406.6666666665, ans=0.0 2023-11-29 14:41:02,835 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11800, loss[loss=0.06481, simple_loss=0.08688, pruned_loss=0.01127, audio_tagging_loss=0.0101, over 14585.00 frames. ], tot_loss[loss=0.06399, simple_loss=0.08788, pruned_loss=0.01152, audio_tagging_loss=0.008523, over 3050911.40 frames. ], batch size: 54, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:41:10,957 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.16 vs. 
limit=22.5 2023-11-29 14:41:16,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4006540.0, ans=0.1 2023-11-29 14:41:32,698 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:41:35,914 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601000 2023-11-29 14:42:04,180 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11850, loss[loss=0.05667, simple_loss=0.07212, pruned_loss=0.01063, audio_tagging_loss=0.009983, over 14877.00 frames. ], tot_loss[loss=0.06446, simple_loss=0.08863, pruned_loss=0.01163, audio_tagging_loss=0.008517, over 3051453.75 frames. ], batch size: 57, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:42:04,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4006806.6666666665, ans=0.125 2023-11-29 14:42:07,761 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.808e+01 9.068e+01 9.599e+01 1.030e+02 1.301e+02, threshold=1.920e+02, percent-clipped=0.0 2023-11-29 14:42:15,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4006873.3333333335, ans=0.1 2023-11-29 14:42:22,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4006873.3333333335, ans=0.0 2023-11-29 14:42:37,806 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601050 2023-11-29 14:42:38,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.51 vs. limit=15.0 2023-11-29 14:42:49,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4007006.6666666665, ans=0.0 2023-11-29 14:43:04,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4007140.0, ans=0.125 2023-11-29 14:43:05,832 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11900, loss[loss=0.0755, simple_loss=0.1028, pruned_loss=0.01323, audio_tagging_loss=0.01089, over 14861.00 frames. ], tot_loss[loss=0.06474, simple_loss=0.08879, pruned_loss=0.01168, audio_tagging_loss=0.008659, over 3051546.84 frames. ], batch size: 56, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:43:21,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.14 vs. limit=22.5 2023-11-29 14:43:38,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4007273.3333333335, ans=0.1 2023-11-29 14:43:39,235 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601100 2023-11-29 14:43:50,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4007340.0, ans=0.125 2023-11-29 14:43:56,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4007406.6666666665, ans=0.125 2023-11-29 14:44:07,740 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 11950, loss[loss=0.04057, simple_loss=0.05141, pruned_loss=0.006423, audio_tagging_loss=0.008437, over 14304.00 frames. ], tot_loss[loss=0.06473, simple_loss=0.08866, pruned_loss=0.01169, audio_tagging_loss=0.008708, over 3043401.90 frames. 
], batch size: 54, lr: 1.34e-03, grad_scale: 8.0 2023-11-29 14:44:11,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.065e+01 8.998e+01 9.725e+01 1.040e+02 1.716e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-29 14:44:20,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4007540.0, ans=0.1 2023-11-29 14:44:21,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4007540.0, ans=0.95 2023-11-29 14:44:29,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4007540.0, ans=0.125 2023-11-29 14:44:40,059 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.45 vs. limit=10.0 2023-11-29 14:44:40,722 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601150 2023-11-29 14:44:48,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4007673.3333333335, ans=0.0 2023-11-29 14:44:49,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4007673.3333333335, ans=0.1 2023-11-29 14:45:07,248 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.94 vs. limit=12.0 2023-11-29 14:45:07,573 INFO [train_asr.py:1235] (2/4) Epoch 50, batch 12000, loss[loss=0.05781, simple_loss=0.07899, pruned_loss=0.007839, audio_tagging_loss=0.01047, over 15134.00 frames. ], tot_loss[loss=0.06501, simple_loss=0.08886, pruned_loss=0.01174, audio_tagging_loss=0.008837, over 3048692.80 frames. ], batch size: 59, lr: 1.34e-03, grad_scale: 16.0 2023-11-29 14:45:07,573 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 14:45:47,784 INFO [train_asr.py:1267] (2/4) Epoch 50, validation: loss=0.05813, simple_loss=0.05044, pruned_loss=0.005399, audio_tagging_loss=0.02752, over 4681554.00 frames. 2023-11-29 14:45:47,785 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 14:45:49,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4007806.6666666665, ans=0.0 2023-11-29 14:46:01,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4007873.3333333335, ans=0.125 2023-11-29 14:46:03,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4007873.3333333335, ans=0.125 2023-11-29 14:46:34,483 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 0, loss[loss=0.07828, simple_loss=0.09906, pruned_loss=0.01007, audio_tagging_loss=0.01868, over 15666.00 frames. ], tot_loss[loss=0.07828, simple_loss=0.09906, pruned_loss=0.01007, audio_tagging_loss=0.01868, over 15666.00 frames. 
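[Editor's note] Each validation pass above is followed by a peak-CUDA-memory report ("Maximum memory allocated so far is 26096MB"). A sketch of how such a line is typically produced, using the standard torch.cuda API; the exact unit divisor used by the script is a guess:

```python
import logging
import torch

device = torch.device("cuda", 2)  # rank 2 of 4 in this log
mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
logging.info(f"Maximum memory allocated so far is {mem_mb}MB")
```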
], batch size: 57, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:46:34,484 INFO [train_asr.py:1258] (2/4) Computing validation loss 2023-11-29 14:46:56,147 INFO [zipformer.py:1877] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.4666, 3.8379, 4.3626, 3.4733], device='cuda:2') 2023-11-29 14:47:11,094 INFO [train_asr.py:1267] (2/4) Epoch 51, validation: loss=0.05803, simple_loss=0.05046, pruned_loss=0.005398, audio_tagging_loss=0.02741, over 4681554.00 frames. 2023-11-29 14:47:11,094 INFO [train_asr.py:1268] (2/4) Maximum memory allocated so far is 26096MB 2023-11-29 14:47:13,524 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601200 2023-11-29 14:47:30,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4008040.0, ans=0.2 2023-11-29 14:47:34,177 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=15.0 2023-11-29 14:47:45,161 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 9.507e+01 9.981e+01 1.081e+02 1.521e+02, threshold=1.996e+02, percent-clipped=0.0 2023-11-29 14:47:45,747 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.73 vs. limit=15.0 2023-11-29 14:47:53,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4008173.3333333335, ans=0.125 2023-11-29 14:47:58,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4008173.3333333335, ans=0.125 2023-11-29 14:48:13,502 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 50, loss[loss=0.06119, simple_loss=0.06939, pruned_loss=0.007576, audio_tagging_loss=0.01892, over 15293.00 frames. ], tot_loss[loss=0.07041, simple_loss=0.0862, pruned_loss=0.01108, audio_tagging_loss=0.01623, over 692056.87 frames. ], batch size: 57, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:48:16,020 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601250 2023-11-29 14:48:38,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.88 vs. limit=15.0 2023-11-29 14:48:46,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4008440.0, ans=0.07 2023-11-29 14:49:06,491 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=22.5 2023-11-29 14:49:07,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4008573.3333333335, ans=0.125 2023-11-29 14:49:15,444 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 100, loss[loss=0.06409, simple_loss=0.07359, pruned_loss=0.01103, audio_tagging_loss=0.01626, over 14774.00 frames. ], tot_loss[loss=0.07042, simple_loss=0.08661, pruned_loss=0.01149, audio_tagging_loss=0.01562, over 1210914.18 frames. 
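[Editor's note] The zipformer.py record above ("attn_weights_entropy = tensor([4.4666, 3.8379, 4.3626, 3.4733], ...)") is an occasional attention diagnostic: the entropy of the softmaxed attention weights, plausibly one value per attention head (four entries here), where larger values mean flatter attention and the maximum is ln(T) for a uniform distribution over T keys. A sketch under that reading, which is an interpretation rather than the confirmed zipformer code:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, batch, num_queries, num_keys) softmax weights.
    # Per-head entropy, averaged over batch and query positions.
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)
    return ent.mean(dim=(1, 2))
```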
], batch size: 57, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:49:17,845 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601300 2023-11-29 14:49:45,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4008773.3333333335, ans=0.0 2023-11-29 14:49:51,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.732e+01 9.896e+01 1.042e+02 1.115e+02 1.364e+02, threshold=2.085e+02, percent-clipped=0.0 2023-11-29 14:50:03,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4008906.6666666665, ans=0.125 2023-11-29 14:50:17,037 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 150, loss[loss=0.07612, simple_loss=0.1073, pruned_loss=0.01385, audio_tagging_loss=0.008646, over 15709.00 frames. ], tot_loss[loss=0.07005, simple_loss=0.08942, pruned_loss=0.0116, audio_tagging_loss=0.01374, over 1619070.64 frames. ], batch size: 57, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:50:19,535 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601350 2023-11-29 14:50:38,880 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.92 vs. limit=22.5 2023-11-29 14:50:54,177 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0 2023-11-29 14:51:07,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4009240.0, ans=0.125 2023-11-29 14:51:17,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4009240.0, ans=0.125 2023-11-29 14:51:19,926 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 200, loss[loss=0.06813, simple_loss=0.09027, pruned_loss=0.01195, audio_tagging_loss=0.01106, over 15941.00 frames. ], tot_loss[loss=0.06808, simple_loss=0.08859, pruned_loss=0.01147, audio_tagging_loss=0.01232, over 1940666.37 frames. ], batch size: 61, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:51:22,358 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601400 2023-11-29 14:51:55,783 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.803e+01 9.139e+01 9.906e+01 1.061e+02 1.460e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-29 14:52:02,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4009506.6666666665, ans=0.125 2023-11-29 14:52:04,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4009506.6666666665, ans=0.1 2023-11-29 14:52:08,271 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.19 vs. limit=22.5 2023-11-29 14:52:21,834 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 250, loss[loss=0.07231, simple_loss=0.1091, pruned_loss=0.009638, audio_tagging_loss=0.008134, over 14884.00 frames. ], tot_loss[loss=0.06763, simple_loss=0.08933, pruned_loss=0.01177, audio_tagging_loss=0.0112, over 2186196.44 frames. 
], batch size: 54, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:52:23,391 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2023-11-29 14:52:24,254 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601450 2023-11-29 14:52:46,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4009773.3333333335, ans=0.0 2023-11-29 14:52:48,490 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-29 14:53:23,652 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 300, loss[loss=0.07643, simple_loss=0.09272, pruned_loss=0.02118, audio_tagging_loss=0.00889, over 15797.00 frames. ], tot_loss[loss=0.06758, simple_loss=0.09032, pruned_loss=0.01201, audio_tagging_loss=0.01042, over 2376907.50 frames. ], batch size: 59, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:53:26,664 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601500 2023-11-29 14:53:39,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4010040.0, ans=0.025 2023-11-29 14:53:55,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=4010106.6666666665, ans=0.2 2023-11-29 14:53:59,443 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.022e+01 9.446e+01 1.010e+02 1.084e+02 1.415e+02, threshold=2.020e+02, percent-clipped=0.0 2023-11-29 14:54:01,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=12.0 2023-11-29 14:54:26,165 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 350, loss[loss=0.05742, simple_loss=0.08363, pruned_loss=0.00889, audio_tagging_loss=0.00672, over 15710.00 frames. ], tot_loss[loss=0.06671, simple_loss=0.08984, pruned_loss=0.01201, audio_tagging_loss=0.009772, over 2530252.07 frames. ], batch size: 59, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 14:54:29,237 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601550 2023-11-29 14:54:48,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4010373.3333333335, ans=0.0 2023-11-29 14:54:48,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4010373.3333333335, ans=0.05 2023-11-29 14:55:27,922 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 400, loss[loss=0.07319, simple_loss=0.1024, pruned_loss=0.01406, audio_tagging_loss=0.007944, over 14680.00 frames. ], tot_loss[loss=0.06617, simple_loss=0.08984, pruned_loss=0.01185, audio_tagging_loss=0.009405, over 2652907.19 frames. 
], batch size: 54, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:55:30,258 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601600 2023-11-29 14:55:30,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4010640.0, ans=0.125 2023-11-29 14:55:34,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4010640.0, ans=0.0 2023-11-29 14:55:59,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4010773.3333333335, ans=0.125 2023-11-29 14:56:01,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4010773.3333333335, ans=0.0 2023-11-29 14:56:04,472 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.022e+01 9.094e+01 9.565e+01 1.047e+02 1.359e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-29 14:56:11,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4010840.0, ans=0.125 2023-11-29 14:56:21,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4010906.6666666665, ans=0.0 2023-11-29 14:56:29,842 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 450, loss[loss=0.0811, simple_loss=0.1116, pruned_loss=0.01647, audio_tagging_loss=0.008818, over 14020.00 frames. ], tot_loss[loss=0.0665, simple_loss=0.09082, pruned_loss=0.01211, audio_tagging_loss=0.00898, over 2742905.00 frames. ], batch size: 54, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:56:30,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4010973.3333333335, ans=10.0 2023-11-29 14:56:32,874 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601650 2023-11-29 14:56:34,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4010973.3333333335, ans=0.125 2023-11-29 14:57:07,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4011173.3333333335, ans=0.125 2023-11-29 14:57:15,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4011173.3333333335, ans=0.1 2023-11-29 14:57:21,168 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.23 vs. limit=15.0 2023-11-29 14:57:30,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4011306.6666666665, ans=0.09899494936611666 2023-11-29 14:57:31,207 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 500, loss[loss=0.08157, simple_loss=0.1105, pruned_loss=0.01709, audio_tagging_loss=0.00925, over 15775.00 frames. ], tot_loss[loss=0.06556, simple_loss=0.08996, pruned_loss=0.01175, audio_tagging_loss=0.008824, over 2806281.95 frames. 
], batch size: 58, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:57:33,661 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601700 2023-11-29 14:57:58,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4011440.0, ans=0.0 2023-11-29 14:58:05,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4011440.0, ans=10.0 2023-11-29 14:58:07,336 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.832e+01 9.018e+01 9.718e+01 1.038e+02 1.323e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-29 14:58:09,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4011506.6666666665, ans=0.0 2023-11-29 14:58:19,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4011573.3333333335, ans=0.1 2023-11-29 14:58:32,338 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 550, loss[loss=0.0633, simple_loss=0.09313, pruned_loss=0.009605, audio_tagging_loss=0.007126, over 15124.00 frames. ], tot_loss[loss=0.06542, simple_loss=0.08995, pruned_loss=0.01178, audio_tagging_loss=0.008668, over 2862221.60 frames. ], batch size: 57, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:58:34,985 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601750 2023-11-29 14:58:43,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4011706.6666666665, ans=0.125 2023-11-29 14:59:10,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4011840.0, ans=0.2 2023-11-29 14:59:32,716 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.35 vs. limit=15.0 2023-11-29 14:59:35,614 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 600, loss[loss=0.0768, simple_loss=0.1022, pruned_loss=0.0131, audio_tagging_loss=0.01262, over 14746.00 frames. ], tot_loss[loss=0.06475, simple_loss=0.08899, pruned_loss=0.01162, audio_tagging_loss=0.00863, over 2905571.90 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 14:59:38,699 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601800 2023-11-29 14:59:41,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4011973.3333333335, ans=0.05 2023-11-29 15:00:11,507 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=15.0 2023-11-29 15:00:12,014 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.862e+01 9.089e+01 9.742e+01 1.033e+02 1.328e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-29 15:00:12,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4012173.3333333335, ans=0.0 2023-11-29 15:00:22,303 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. 
limit=6.0 2023-11-29 15:00:30,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4012240.0, ans=10.0 2023-11-29 15:00:33,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4012240.0, ans=0.2 2023-11-29 15:00:35,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4012240.0, ans=0.125 2023-11-29 15:00:38,273 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 650, loss[loss=0.09112, simple_loss=0.1222, pruned_loss=0.02032, audio_tagging_loss=0.009694, over 14368.00 frames. ], tot_loss[loss=0.06456, simple_loss=0.0882, pruned_loss=0.01168, audio_tagging_loss=0.008782, over 2937962.17 frames. ], batch size: 55, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 15:00:40,724 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601850 2023-11-29 15:00:40,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4012306.6666666665, ans=0.0 2023-11-29 15:01:06,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4012440.0, ans=0.125 2023-11-29 15:01:38,035 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0 2023-11-29 15:01:39,804 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 700, loss[loss=0.06822, simple_loss=0.09746, pruned_loss=0.01229, audio_tagging_loss=0.007197, over 15151.00 frames. ], tot_loss[loss=0.06463, simple_loss=0.08836, pruned_loss=0.01172, audio_tagging_loss=0.008737, over 2964247.67 frames. ], batch size: 55, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 15:01:42,863 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601900 2023-11-29 15:02:07,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=12.0 2023-11-29 15:02:10,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4012773.3333333335, ans=0.125 2023-11-29 15:02:11,716 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.64 vs. limit=15.0 2023-11-29 15:02:15,540 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.121e+01 8.992e+01 9.586e+01 1.033e+02 1.329e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-29 15:02:21,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4012840.0, ans=0.125 2023-11-29 15:02:27,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.89 vs. limit=15.0 2023-11-29 15:02:32,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4012906.6666666665, ans=0.125 2023-11-29 15:02:41,762 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 750, loss[loss=0.0681, simple_loss=0.09474, pruned_loss=0.01213, audio_tagging_loss=0.008602, over 14935.00 frames. ], tot_loss[loss=0.06459, simple_loss=0.08845, pruned_loss=0.01168, audio_tagging_loss=0.008691, over 2987258.76 frames. 
], batch size: 56, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:02:44,187 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 601950 2023-11-29 15:02:47,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4012973.3333333335, ans=0.04949747468305833 2023-11-29 15:02:59,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4013040.0, ans=0.125 2023-11-29 15:03:14,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4013106.6666666665, ans=0.1 2023-11-29 15:03:44,092 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 800, loss[loss=0.07477, simple_loss=0.09767, pruned_loss=0.01649, audio_tagging_loss=0.009445, over 14975.00 frames. ], tot_loss[loss=0.06532, simple_loss=0.08976, pruned_loss=0.0118, audio_tagging_loss=0.008645, over 3006123.34 frames. ], batch size: 57, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 15:03:46,527 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 602000 2023-11-29 15:03:49,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4013306.6666666665, ans=0.07 2023-11-29 15:04:05,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4013373.3333333335, ans=0.125 2023-11-29 15:04:07,881 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-29 15:04:14,305 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.19 vs. limit=15.0 2023-11-29 15:04:21,461 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.185e+01 9.281e+01 9.942e+01 1.059e+02 1.465e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-29 15:04:27,439 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.31 vs. limit=10.0 2023-11-29 15:04:30,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4013506.6666666665, ans=0.1 2023-11-29 15:04:46,294 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 850, loss[loss=0.06854, simple_loss=0.08861, pruned_loss=0.01501, audio_tagging_loss=0.00923, over 14595.00 frames. ], tot_loss[loss=0.06491, simple_loss=0.08906, pruned_loss=0.01158, audio_tagging_loss=0.008799, over 3015831.28 frames. ], batch size: 57, lr: 1.33e-03, grad_scale: 32.0 2023-11-29 15:04:47,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4013640.0, ans=0.125 2023-11-29 15:04:48,729 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 602050 2023-11-29 15:04:50,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4013640.0, ans=0.125 2023-11-29 15:05:24,105 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.48 vs. limit=22.5 2023-11-29 15:05:32,982 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.09 vs. 
limit=10.0 2023-11-29 15:05:43,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4013906.6666666665, ans=0.125 2023-11-29 15:05:48,322 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 900, loss[loss=0.06865, simple_loss=0.09263, pruned_loss=0.01268, audio_tagging_loss=0.009655, over 15692.00 frames. ], tot_loss[loss=0.06526, simple_loss=0.08926, pruned_loss=0.01173, audio_tagging_loss=0.0089, over 3025017.85 frames. ], batch size: 60, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:05:50,791 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 602100 2023-11-29 15:05:58,334 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.84 vs. limit=22.5 2023-11-29 15:06:26,399 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 9.010e+01 9.603e+01 1.027e+02 1.199e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-29 15:06:51,390 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 950, loss[loss=0.0477, simple_loss=0.05803, pruned_loss=0.009302, audio_tagging_loss=0.009385, over 14845.00 frames. ], tot_loss[loss=0.06465, simple_loss=0.08838, pruned_loss=0.01165, audio_tagging_loss=0.008817, over 3027752.70 frames. ], batch size: 56, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:06:53,937 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 602150 2023-11-29 15:06:59,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4014306.6666666665, ans=0.125 2023-11-29 15:07:24,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4014440.0, ans=0.125 2023-11-29 15:07:34,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=4014506.6666666665, ans=10.0 2023-11-29 15:07:51,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4014573.3333333335, ans=0.125 2023-11-29 15:07:51,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4014573.3333333335, ans=0.125 2023-11-29 15:07:53,965 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 1000, loss[loss=0.06199, simple_loss=0.08399, pruned_loss=0.01028, audio_tagging_loss=0.009715, over 15120.00 frames. ], tot_loss[loss=0.06464, simple_loss=0.08869, pruned_loss=0.01164, audio_tagging_loss=0.008646, over 3031726.66 frames. ], batch size: 57, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:07:56,591 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 602200 2023-11-29 15:07:59,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4014640.0, ans=0.125 2023-11-29 15:08:00,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4014640.0, ans=0.125 2023-11-29 15:08:24,027 WARNING [train_asr.py:1481] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-29 15:08:27,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4014773.3333333335, ans=0.125 2023-11-29 15:08:28,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4014773.3333333335, ans=0.125 2023-11-29 15:08:33,536 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.974e+01 9.222e+01 9.984e+01 1.098e+02 2.505e+02, threshold=1.997e+02, percent-clipped=1.0 2023-11-29 15:08:49,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4014906.6666666665, ans=0.125 2023-11-29 15:08:52,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4014906.6666666665, ans=0.125 2023-11-29 15:08:53,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4014906.6666666665, ans=0.125 2023-11-29 15:08:56,582 INFO [train_asr.py:1235] (2/4) Epoch 51, batch 1050, loss[loss=0.06005, simple_loss=0.08464, pruned_loss=0.008256, audio_tagging_loss=0.00948, over 14151.00 frames. ], tot_loss[loss=0.06449, simple_loss=0.0885, pruned_loss=0.0117, audio_tagging_loss=0.008535, over 3032375.85 frames. ], batch size: 54, lr: 1.33e-03, grad_scale: 16.0 2023-11-29 15:08:59,076 INFO [model.py:807] (2/4) Freeze_encoder: False; Current batch idx: 602250 2023-11-29 15:09:26,500 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.18 vs. limit=15.0 2023-11-29 15:09:28,705 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.83 vs. limit=10.0 2023-11-29 15:09:29,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4015106.6666666665, ans=0.125 2023-11-29 15:09:30,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4015106.6666666665, ans=0.025 2023-11-29 15:09:39,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=4015173.3333333335, ans=10.0 2023-11-29 15:09:44,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4015173.3333333335, ans=0.125 2023-11-29 15:09:51,221 INFO [checkpoint.py:75] (2/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/bad-model-2.pt 2023-11-29 15:09:53,002 INFO [train_asr.py:1596] (2/4) Saving batch to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/batch-8cd272e0-909e-4060-8425-07646bc9947a.pt 2023-11-29 15:09:53,075 INFO [train_asr.py:1602] (2/4) features shape: torch.Size([59, 1689, 80]) 2023-11-29 15:09:53,293 INFO [train_asr.py:1606] (2/4) num tokens: 2110
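[Editor's note] The section ends on the failure path: right after the Epoch 51, batch 1050 record, the loop saves a "bad-model-2.pt" checkpoint plus the offending batch (with its feature shape and total token count) and the log stops, a sequence typically triggered by a non-finite or exploding loss. A sketch of that dump, assuming icefall-style batch dicts; the helper name is hypothetical and the real code is in train_asr.py:

```python
import logging
import uuid

import torch

def dump_bad_state(model, batch, exp_dir: str, rank: int, sp) -> None:
    # Save the current weights and the batch that triggered the failure,
    # mirroring the "bad-model-2.pt" / "batch-<uuid>.pt" lines above.
    torch.save(model.state_dict(), f"{exp_dir}/bad-model-{rank}.pt")
    filename = f"{exp_dir}/batch-{uuid.uuid4()}.pt"
    logging.info(f"Saving batch to {filename}")
    torch.save(batch, filename)
    features = batch["inputs"]  # (N, T, C) fbank features, e.g. (59, 1689, 80)
    logging.info(f"features shape: {features.shape}")
    texts = batch["supervisions"]["text"]
    num_tokens = sum(len(ids) for ids in sp.encode(texts, out_type=int))
    logging.info(f"num tokens: {num_tokens}")
```

The saved batch file can later be loaded with torch.load and fed through the model under anomaly detection to reproduce the failure offline.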